May 17th AZORA Meetup – Last until September!

https://www.meetup.com/Arizona-Oracle-User-Group/events/260013419/

This is our last Meetup before our summer break. Come check out two great presentations.

Doug Hood from Oracle will talk about the Oracle In-Memory database feature. We appreciate Oracle providing us with this technical content to support AZORA.

AZORA's own Stephen Andert will be sharing a non-technical presentation on networking. He just gave the same talk at Collaborate 19, the national Oracle user group conference, so it will be great to have him share it with his local user group.

Looking forward to seeing you there.

Bobby

P.S. AZORA is the Arizona Oracle User Group, and we meet in the Phoenix, Arizona area.


Example of coe_xfr_sql_profile force_match TRUE

Monday, I used the coe_xfr_sql_profile.sql script from Oracle Support’s SQLT scripts to resolve a performance issue. I had to set the parameter force_match to TRUE so that the SQL Profile I created would apply to all SQL statements with the same FORCE_MATCHING_SIGNATURE value.

I had just gone off the on-call rotation at 8 am Monday, and around 4 pm that day a coworker came to me with a performance problem. A PeopleSoft Financials job was running longer than it normally did. Since it had been running for several hours, I got an AWR report of the last hour, looked at the SQL ordered by Elapsed Time section, and found a number of similar INSERT statements with different SQL_IDs.

The inserts were the same except for certain constant values. So, I used my fmsstat2.sql script with ss.sql_id = '60dp9r760ja88' to get the FORCE_MATCHING_SIGNATURE value for these inserts.
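fmsstat2.sql pulls statistics from the AWR views. This is not the real script, but a rough sketch of the kind of query it runs (the column list is illustrative):

select ss.sql_id,
       ss.plan_hash_value,
       sn.end_interval_time,
       ss.executions_delta,
       ss.elapsed_time_delta/(ss.executions_delta*1000) avg_elapsed_ms,
       ss.force_matching_signature
from dba_hist_sqlstat ss, dba_hist_snapshot sn
where ss.snap_id = sn.snap_id
and ss.instance_number = sn.instance_number
and ss.sql_id = '60dp9r760ja88'
and ss.executions_delta > 0
order by sn.end_interval_time;

Here is the output of the real script: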

Now that I had the FORCE_MATCHING_SIGNATURE value 5442820596869317879, I reran fmsstat2.sql with ss.FORCE_MATCHING_SIGNATURE = 5442820596869317879 instead of ss.sql_id = '60dp9r760ja88' and got all of the insert statements and their PLAN_HASH_VALUE values. I needed these to use coe_xfr_sql_profile.sql to generate a script to create a SQL Profile that would force a better plan onto the insert statements. Here is the beginning of the output of the fmsstat2.sql script:

The first few lines show the good plan that these inserts used on earlier runs. The good plan has PLAN_HASH_VALUE 1314604389 and runs in about 600 milliseconds. The bad plan has PLAN_HASH_VALUE 3334601 and runs in about 100 seconds. I took a look at the plans before creating the SQL Profile but did not really dig into why the plans changed. It was around 4:30 pm, and I was trying to get out the door since I was not on call and wanted to get home at a normal time and leave the problems to the on-call DBA. Here is the good plan:

Here is the bad plan:

Notice that in the bad plan the Rows column has 1 in it on many of the lines, but in the good plan it has larger numbers. Something about the statistics and the values in the where clause caused the optimizer to build the bad plan as if no rows would be accessed from these tables even though many rows would be accessed. So, it made a plan based on wrong information. But I had no time to dig further. I did ask my coworker if anything had changed about this job and nothing had.
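If you want to pull plans like these out of AWR yourself, dbms_xplan.display_awr works, something like this (using the SQL_ID I fed to coe_xfr_sql_profile below and the two plan hash values):

-- good plan
select * from table(dbms_xplan.display_awr('gwv75p0fyf9ys', 1314604389));

-- bad plan
select * from table(dbms_xplan.display_awr('gwv75p0fyf9ys', 3334601));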

So, I created a SQL Profile script by going to the utl subdirectory under the directory where SQLT was installed on the database server. I generated the script by running coe_xfr_sql_profile.sql gwv75p0fyf9ys 1314604389. I edited the generated script, coe_xfr_sql_profile_gwv75p0fyf9ys_1314604389.sql, to change force_match=>FALSE to force_match=>TRUE, and ran it. The long running job finished shortly thereafter, and no new incidents have occurred in subsequent runs.
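Spelled out, the steps look roughly like this (the SQLT install location varies; the edit is the one change described above):

cd sqlt/utl      # wherever SQLT is installed on the database server
sqlplus / as sysdba

-- generate the SQL Profile script for the good plan
@coe_xfr_sql_profile.sql gwv75p0fyf9ys 1314604389

-- edit coe_xfr_sql_profile_gwv75p0fyf9ys_1314604389.sql, changing
-- force_match => FALSE to force_match => TRUE, then run it:
@coe_xfr_sql_profile_gwv75p0fyf9ys_1314604389.sql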

The only thing that confuses me is that when I run fmsstat2.sql now with ss.FORCE_MATCHING_SIGNATURE = 5442820596869317879 I do not see any runs with the good plan. Maybe future runs of the job have a different FORCE_MATCHING_SIGNATURE and the SQL Profile only helped the one job. If that is true, the future runs may have had the correct statistics and run the good plan on their own. It could be that the INSERT statements are now running so fast that they are not recorded in the AWR as one of the top SQL statements.

I wanted to post this to give an example of using force_match=>TRUE with coe_xfr_sql_profile. I had an earlier post about this subject, but I thought another example could not hurt. I also wanted to show how I use fmsstat2.sql to find multiple SQL statements by their FORCE_MATCHING_SIGNATURE value. I realize that SQL Profiles are a kind of band-aid rather than a solution to the real problem. But I got out the door by 5 pm on Monday and did not get woken up in the middle of the night, so sometimes a quick fix is what you need.

Bobby


Check your hints carefully

Back in 2017 I wrote about how I had to disable the result cache after upgrading a database to 11.2.0.4. This week I found one of our top queries, and it looked like removing the result cache hints made it run 10 times faster. But this did not make sense, because I had disabled the result cache. Then I examined the hints more closely. They looked like this:

/*+ RESULT CACHE */

There should be an underscore between the two words. I looked up the hints in the manuals and found that CACHE is a real hint. So, I tried the query with these three other combinations:

 
/*+ RESULT */

/*+ CACHE */

/*+ RESULT_CACHE */

The query ran slow with the original hint and with just the CACHE hint, but not with the others. So, the moral of the story is to check your hints carefully, because they may not be what you think they are.
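A quick way to see the difference for yourself (the table and query here are made up for illustration):

-- Parsed as two hints: RESULT, which is not a real hint and is silently
-- ignored, and CACHE, which is a real hint
select /*+ RESULT CACHE */ sum(amount) from my_big_table;

-- The hint that was intended
select /*+ RESULT_CACHE */ sum(amount) from my_big_table;

-- With the correct hint, the plan shows a RESULT CACHE operation
select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL'));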

Bobby


Huge Pages Speed Up Startup of Web Servers

We had issues deploying changes to a large web farm a few weeks back. We had automated the startup of the 25 web servers using a DevOps tool, but when it tried to bring them all up at once, startup took forever and the web servers got timeout errors. We resolved this by configuring huge pages on the metadata database.

I mentioned the unexpectedly large impact of huge pages on login time in a previous post, but we had not realized that a small metadata database with a 5 gigabyte SGA would need huge pages. I should mention that this is Oracle 11.2.0.3 on Linux. The web farm has 25 servers with 25 database connections each. I think the number of connections from each server should be 5 or less, but I have not convinced others to reduce it. If you have a metadata database with 4 cores and 625 connections from 25 web servers, the web servers cannot use all of those connections. A 4-core database server can only process 10 or 20 SQL statements at once if they are I/O-centric; if they are all CPU-bound, you are looking at 4 concurrent active SQL statements. If you kick off 625 SQL statements at once, the load will go through the roof and everything will time out.

I thought about putting in shared servers on the metadata database to force the 625 sessions to funnel through 12 shared servers so the database server would not be overwhelmed should all 625 connections choose to run at once. But the only problem we had was with logins: all 625 were logging in at the same time when the web farm was started by our DevOps tool. Huge pages resolved this issue by making login time 10-20 times faster.
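For reference, the shared server setup I considered but did not implement would have been something like this (values illustrative):

alter system set shared_servers=12;
alter system set max_shared_servers=12;
alter system set dispatchers='(PROTOCOL=TCP)(DISPATCHERS=4)';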

The database was in a 16 gigabyte Linux VM with 4 cores and a 5 gigabyte SGA. Without huge pages each login took about 2.5 seconds; with huge pages, about .15 seconds. Without huge pages the load on the database server went over 100 when all 625 sessions started at once; with huge pages the load never exceeded 1. Without huge pages the web farm never came up cleanly; with huge pages it came up quickly without error. So, if you have a Linux VM with several gigabytes in your SGA, you might want to use huge pages.
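If you want to check and change this yourself on Linux, it looks roughly like this (2600 pages is sized for a 5 gigabyte SGA with 2 MB huge pages; size yours to your own SGA):

# see whether huge pages are configured and how many are in use
grep Huge /proc/meminfo

# reserve 2 MB huge pages for a 5 GB SGA (5120 MB / 2 MB = 2560, plus spares)
echo "vm.nr_hugepages = 2600" >> /etc/sysctl.conf
sysctl -p

# optionally set use_large_pages=ONLY (11.2.0.2+) in the instance so it
# will not start unless it gets huge pages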

Bobby

P.S. This may be an 11.2 bug. Ever since I noticed this behavior, the dramatic time difference between running with and without huge pages has seemed too large to be explained by having to build a bigger page table with 4K pages. 2.5 seconds of CPU is an eternity just to set up a page table for 5 gigabytes of virtual memory. I timed logins on 12.1.0.2 and 18c databases with 5 gig SGAs and they took around .15 seconds without huge pages. I verified that the problem exists on 11.2.0.4 as well as 11.2.0.3. I think our 11.2 systems are still on Red Hat Linux 6, so it may be a combination of database and OS version.

P.P.S. This is why blogging about Oracle is such a pain sometimes. It is hard to get things right without extensive testing. But then again, what I said was all true: switching to huge pages did resolve our issues. I guess, as always, your mileage may vary.

P.P.P.S. See the comments below about pre_page_sga=TRUE. The people who commented are correct. As far back as Oracle 9.2, on databases with pre_page_sga=TRUE and large SGAs, logins can take a couple of seconds. See this Oracle bug:

Bug 5072402 : SLOW CONNECT TIME WITH SQLPLUS TO INSTNACE WITH 22GIG SGA AND PRE_PAGE_SGA

According to Oracle, Bug 5072402 is not a bug but intended behavior. Oracle changed this behavior in Oracle 12 according to this document:

PRE_PAGE_SGA Behaviour Change in Oracle Release 12c (Doc ID 1987975.1)

My own testing bears out the idea that pre_page_sga=TRUE causes slow logins in 11.2 and does not cause them in 12.1.
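The timings below came from wrapping the Linux time command around a SQL*Plus login, something like this (the connect string is a placeholder):

time sqlplus -S system/mypassword@orcl <<EOF
exit
EOF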

My laptop 11.2.0.3

Slow with pre_page_sga=TRUE

real 0m1.039s

pre_page_sga boolean TRUE
sga_max_size big integer 3G
sga_target big integer 3G

Fast with pre_page_sga=FALSE

real 0m0.218s

pre_page_sga boolean FALSE
sga_max_size big integer 3G
sga_target big integer 3G

My laptop 12.1.0.2

real 0m0.084s

pre_page_sga boolean TRUE
sga_max_size big integer 3G
sga_target big integer 3G

No HugePages

PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES
      4K       Configured               4           786436
   2048K                0            1537                0

P.P.P.P.S. I have a production HP-UX Itanium server with a 46 gig SGA and pre_page_sga=TRUE. It takes about 1 second to log in. I think it is the same effect, but on Linux the 4K page size may exaggerate the effect of pre_page_sga=TRUE, whereas HP-UX may handle large SGAs better. Anyway, I learned something new. Thank you to those who participated in the comments.

I spun up a Delphix Oracle 11.2.0.3 HP-UX Itanium VDB with a 32 gig SGA and pre_page_sga=TRUE, and the login time was instantaneous. Odd. I guess "your mileage may vary" still applies.


High Processes Cause Semaphore Errors

I could not create a new Delphix virtual database because I kept getting semaphore limit errors. I ended up reducing the processes parameter value to resolve the issue.

I got these errors creating the new VDB:

ERROR : SQL> ORA-27154: post/wait create failed 
ERROR : ORA-27300: OS system dependent operation:semget failed with status: 28
ERROR : ORA-27301: OS failure message: No space left on device
ERROR : ORA-27302: failure occurred at: sskgpcreates

The processes parameter was set to 3000 on production, so Delphix tried to create the VDB with the same value. I reduced processes to 100 and the VDB came up cleanly.
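The "No space left on device" message is misleading: status 28 (ENOSPC) from semget means the kernel ran out of semaphores, not disk space. Oracle sizes its semaphore request at startup based on the processes parameter, so lowering processes shrinks the request. On Linux you can see the limits and current usage like this:

# the four semaphore limits: semmsl, semmns, semopm, semmni
cat /proc/sys/kernel/sem

# existing semaphore arrays (what is already used up)
ipcs -s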

I have been living with this problem for a year or two, but yesterday I really needed to resolve it and I finally figured it out. I got the idea of reducing processes from this Oracle support document:

Instance Startup Fails With Error ORA-27154,ORA-27300,ORA-27301,ORA-27302 (Doc ID 314179.1)

I had looked these errors up on Oracle's support site several times in the past and never saw the suggestion to reduce processes, so I thought I would mention it here, if only to remind myself.

Bobby


Python Practice Produced Pretty Pictures

I wrote a Python program that made some fun pictures, so I thought I would share them even though this is not really a database post.

I practice Python programming by doing Rosetta Code programming tasks that no one has implemented in Python yet. This is a fun way of keeping up my Python skills. My most recent contribution made pretty pictures, so I thought I would show them here. The code takes a cube and breaks the faces up into smaller and smaller pieces, changing the cube into a rounder shape.

Here is the input:

Input to the program, a cube

Here is the output after one subdivision:

After one subdivision. Chunky.

Here is the output after two subdivisions:

Two subdivisions. Pretty round.

Note that it is getting rounder. Lastly, after four subdivisions it is remarkably round considering that it started as a cube:

Four subdivisions. Quite round.

The main point of this post was to show the pretty pictures. But, to be more serious, if you are reading this blog and looking for a programming task to practice on, you can do what I do: find a Rosetta Code task for the language you are learning and get some good practice.

Bobby


Speed of Light

Looking at cloud databases has me thinking about the speed of light. Wikipedia says that the speed of light is about 186,000 miles per second. If my calculations are correct, that is 5.37 microseconds per mile. The United States is about 2,680 miles wide, so it would take light about 14.4 milliseconds to cross the US. If I ping one of my favorite web sites it takes tens of milliseconds, which kind of makes sense because those sites are in other cities and I am going through various routers.

I did some tests with my company's storage and found that reading from our storage when the data is cached in the storage server takes around 200 microseconds round trip. I'm sure that our database servers and storage are a lot less than a mile apart, so most of that time has nothing to do with the speed of light. I heard about a cloud vendor whose fast network connection took 100 microseconds plus the speed of light; I guess 100 microseconds is the cost of getting your data onto the fiber and light does the rest. If your cloud database was on the other side of the country, that could add at least 14 milliseconds each way to each SQL request. If the cloud database was in your own city, say 10 miles away, it would only tack on about 53.7 microseconds each way on top of the 100 microseconds overhead. Maybe 100 microseconds plus the speed of light is the best-case cost of moving data?
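Here is the arithmetic spelled out:

1 second / 186,000 miles     = about 5.37 microseconds per mile
2,680 miles x 5.37 us/mile   = about 14.4 milliseconds (one way across the US)
10 miles x 5.37 us/mile      = about 53.7 microseconds (one way across a city)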

Bobby


New utility Python scripts for DBAs

I pushed out three new Python scripts that might be helpful to Oracle DBAs. They are in my miscpython repository.

Bobby


Slides from March AZORA meeting

Here are the slides from our March Arizona Oracle User Group (AZORA) meeting:

Daniel Morgan Security Master Class

We really appreciate Daniel Morgan taking the time to share this information about the increasingly important topic of database security.

Also, AZORA is always looking for people to present at future meetings. We have one more meeting in May before the blazing hot Arizona summer, and then we start up again in September. Email me at bobby@bobbydurrettdba.com if you would like to speak at a future meeting.

Bobby


Oracle Database Security Master Class AZORA Meetup Thursday

Next Thursday, Daniel Morgan from TidalScale will be giving a free Oracle Database Security Master Class for our March Arizona Oracle User Group meeting. See the details at our Meetup link:

https://www.meetup.com/Arizona-Oracle-User-Group/events/258858868/

Please RSVP so that we know how many people will be there.

Given the ever-increasing number of security threats to our databases, this session on Oracle database security should be valuable to us all.

I hope to see you there!

Bobby

P.S. We are trying out a WebEx this time for remote attendees:

Join Webex meeting

Meeting number (access code): 743 918 204

Join by phone 
+1-855-797-9485 US Toll free 
+1-415-655-0002 US Toll 
Global call-in numbers  |  Toll-free calling restrictions
