Parsing blocks stats blocks parsing

I had a five-minute conversation with Oracle development Friday that rocked my world.  I found out that parsing blocks stats which blocks parsing.

We have a system with queries that are taking minutes to parse.  These queries include the main tables on our database, one of which is interval partitioned and has 20,000 sub-partitions.  We have seen a situation where Oracle’s delivered statistics gathering job hangs on a library cache lock waiting for a query to finish parsing.  But, much worse than that, we find most queries on our system hanging on library cache lock waits blocked by the statistics job.

We have an SR open on this situation because it seems to be a bug, but Friday someone on the phone from Oracle development explained that this parse blocks stats blocks parse situation is normal.  Later after I got off the phone I built a simple test case proving that what he said was true.  I took a query that took a long time to parse in production and ran it on our development database and it took 16 seconds to parse there.  I choose the smallest table that the query included and gathered stats on it.  The stats ran in a fraction of a second when run by itself, but if I started the long parsing query in one window and ran the stats in another window the stats hung on a library cache lock wait for 15 seconds.  Then I created a trivial query against the same small table I had gathered stats on.  The query ran instantly by itself.  But, if I ran the long parsing query first, kicked of the stats which hung on the lock, and then kicked off the short query against the table I was gathering stats on the short query hung on a library cache lock also.  This example convinced me that the parse blocks stats blocks parse chain was real.

This morning I built a standalone test case that others can run to prove this out on their databases: zip of testcase.  To run the testcase you need three windows where you can run three sqlplus scripts in rapid succession.  In one window first just run tables.sql to create the test tables.  Then run these three scripts one after the other in each window to create the three link chain: chain1.sql, chain2.sql, chain3.sql.  Chain1.sql has the explain plan of a query that takes a long time to parse.  Chain2.sql gathers stats on one table.  Chain3.sql runs a simple query against the table whose stats are being gathered.  Chain1 spends all of its time on the CPU doing the parse.  Chain2 and 3 spends all of their time on library cache lock waits.

First I created two tables:

create table t1 as select * from dba_tables;
create table t2 as select * from dba_tables;

Next I kicked off the explain plan that takes a long time to run.  It joined 100 tables together:

explain plan into plan_table for 
select 
count(*)
from
     t1,
     t2,
     t2 t3,
...
     t2 t100
where
  t1.owner=t2.owner and
...
  t1.owner=t100.owner
/

This explain plan ran for 26 seconds, almost all of which was CPU:

Elapsed: 00:00:26.90

...

CPU in seconds
--------------
         26.81

Right after I kicked off the explain plan I kicked off this statement which gathered stats on the first table in the from clause:

execute dbms_stats.gather_table_stats(NULL,'T1');

This ran for 25 seconds and almost all of the time was spent on a library cache lock wait:

Elapsed: 00:00:25.77

...

Library cache lock in seconds
-----------------------------
                        25.55

Right after I kicked off the gather table stats command I ran this simple query making sure that it was unique and required a hard parse:

select /* comment to force hard parse */ count(*) from T1;

This ran for 24 seconds and almost all of the time was spent on a library cache lock wait:

Elapsed: 00:00:24.48

...

Library cache lock in seconds
-----------------------------
                        24.48

Evidently when a session parses a query it needs to obtain a shared lock on every table that the query includes.  When you gather statistics on a table you need to obtain an exclusive lock on the table, even if you are gathering statistics on one partition or sub-partition of the table.  While the statistics gathering session waits to acquire an exclusive lock any new parses that include the same table will hang.

Prior to Friday I did not think that there was any non-bug situation where gathering optimizer statistics would lock up sessions.  I thought that the only negative to gathering statistics at the same time as other application processing was that statistics gathering would compete for system resources such as CPU and I/O and possibly slow down application code.  But, now I know that gathering statistics can hang all queries that use the given table if stats gathering gets hung up waiting for a query that takes a long time to parse.

– Bobby

P.S. After reviewing the SR I wanted to understand what this parse blocks stats blocks parse looked like in a state dump.  The Oracle support analyst explained how the locks looked in a state dump that we uploaded but I didn’t get a chance to look at it closely until today.  I found the most important information in lines with the string “LibraryObjectLock” at the front of the line after some spaces or tabs.  There were three types of lines – the holding share lock, the waiting exclusive lock, and the many waiting share locks:

LibraryObjectLock:  Address=... Handle=0x5196c8908 Mode=S ...
LibraryObjectLock:  Address=... Handle=0x5196c8908 RequestMode=X ...
LibraryObjectLock:  Address=... Handle=0x5196c8908 RequestMode=S ...

The “…” indicates places I edited out other details.  The handle 0x5196c8908 identifies the table being locked.  The “Mode=S” string indicates a successful share lock of that table.  The “RequestMode=X” was from the stats job trying to get exclusive access to the table.  The “RequestMode=S” was all the other sessions trying to get shared access to the table waiting for stats to first get exclusive access.  Anyway, I just wanted to translate what Oracle support told me into something that might be useful to others.  Plus I want to remember it myself!


Posted in Uncategorized | 2 Comments

Alerting on plans that change for the worse

I’ve uploaded a monitoring script that I have worked on: zip.

The script alerts you when the optimizer runs a SQL statement with a potentially new and inefficient plan so you can intervene.  This script improves upon my earlier script which only alerts you to SQL statements running with new plans.  The new script compares the average elapsed time of the current plan with the average of the most often executed plan.  If the new plan averages more than ten times the most often executed plan then the script alerts you to a possible new slow plan.  The elapsed time for the current plan comes from V$SQL and the time for the most often executed plan comes from DBA_HIST_SQLSTAT.

Here is an example output from the first test case:

SQL_ID        PLAN_HASH_VALUE
------------- ---------------
a1fw5xjcwkjqx      1357081020

There are two test cases included in the script.  The first test case has a query which uses an index and the plan changes when I drop the index.

If you look at the output for the first test case you see the good plan:

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 268773832

---------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |    58 (100)|
|*  1 |  INDEX RANGE SCAN| TESTI | 16222 |   221K|    58   (0)|
---------------------------------------------------------------

The good plan uses the index.  Here is the bad plan.  Note how the sql_id and plan_hash_value correspond to the output of the monitor script.

SQL_ID  a1fw5xjcwkjqx, child number 0
-------------------------------------
select /* 9 */ owner,table_name from test where owner='XYZ' and
table_name='XYZ'

Plan hash value: 1357081020

---------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       | 15708 (100)|
|*  1 |  TABLE ACCESS FULL| TEST |  8111 |   110K| 15708   (1)|
---------------------------------------------------------------

I have this running in a cron job on a development server and I plan to put the script into our production server next week and set it up to alert me with an email when the optimizer runs SQL statements with potentially  new and inefficient plans.

– Bobby

 


Posted in Uncategorized | Leave a comment

Check out the new Oracle help web site

I found a broken link to an Oracle help document in one of my posts and when I went to the Oracle 12c database documentation to find the new URL to put in my post I found that Oracle had totally revamped their online manuals.

Here is a link to the new high level Oracle help site: url

I’ve only looked at it for a few minutes and can’t say whether I like it or not.  You can still download the manuals in PDF format so that remains familiar.  It looks like the new site integrates documentation across most or all of Oracle’s products in a similar format and that’s pretty cool.

Anyway, I just saw this and thought I would pass it along.

– Bobby


Posted in Uncategorized | 3 Comments

Oracle DBA job in Tempe, Arizona

We still have a position open on our Oracle database team here in Tempe, Arizona.  Here is the link with an updated job description: url

We have a great team and would love to have a new member to join us.

-Bobby

Posted in Uncategorized | 2 Comments

Useful Carlos Sierra post about queries changing performance

Saw an interesting post related to work I am doing locking in plans and finding plans that have changed: Carlos Sierra post on queries with changing performance.

Our DBA team uses versions of my sqlstat.sql script to find queries whose elapsed time change and we use a script from Carlos Sierra’s SQLT collection to force the execution plan back to an efficient plan that the optimizer choose in the past.  Carlos Sierra’s new post includes scripts that resemble my sqlstat script but with more functionality including an interesting way of helping people figure out which queries are now running slower.  At this point we just look at the top queries that are running and run sqlstat.sql to see if they have run with less elapsed time in the past.  We eyeball it and usually it is obvious because the bad plan takes 100 times longer than the good one.  But the newly posted Carlos Sierra script seems to use some math to help figure out whether the elapsed time is really bad enough for us to consider it a problem.

I want to set up a monitoring script that pages us when a plan changes for the worse so we can decide whether to intervene and force on older plan to improve performance.  I have a simple script running on one of our databases but, as I expected, it is returning a lot of false positives.  So, I’m hoping to have some time this week to improve that script so that we get fewer false alarms and I’m going to take a close look at Carlos Sierra’s newly posted script for ideas.

– Bobby

 




Posted in Uncategorized | 4 Comments

Script to count and recompile invalid objects

This is pretty simple, but I thought I would share it since it is helpful to me.  I have been preparing for a large migration which involves table, index, type, function, package, and procedure changes.  When I run a big migration like this I check for invalid objects before and after the migration and attempt to recompile any that are invalid.  By checking before and after the migration I know which objects the migration invalidated.

Here’s the script:

select status,count(*)
from dba_objects
where owner='YOURSCHEMA'
group by status
order by status;

select 'alter '||object_type||' '||owner||'.'||object_name||
       ' compile;'
from dba_objects
where owner='YOURSCHEMA' and
status='INVALID' and
object_type <> 'PACKAGE BODY'
union
select 'alter package '||owner||'.'||object_name||' compile body;'
from dba_objects
where owner='YOURSCHEMA' and
status='INVALID' and
object_type = 'PACKAGE BODY';

Replace “YOURSCHEMA” with the schema that your objects are in.

Output is something like this:

STATUS    COUNT(*)
------- ----------
INVALID          7
VALID        53581

'ALTER'||OBJECT_TYPE||''||OWNER||'.'||OBJECT_NAME||'COMPILE;'
--------------------------------------------------------------
alter FUNCTION YOURSCHEMA.YOURFUNCTION compile;
alter package YOURSCHEMA.YOURPACKAGE compile body;

The counts give me a general idea of how many objects are invalid and the alters gives me sql that I can paste into a script and run to attempt to compile the objects and make them valid.

Hope this is helpful to someone else.  It’s helpful to me.

– Bobby



Posted in Uncategorized | Leave a comment

Recap of yesterday’s Arizona Oracle User Group (AZORA) meeting

Yesterday was the first meeting of the newly restarted Arizona Oracle User Group, AZORA.  It was a good first meeting to kick off what I hope will turn into an active club.

We met in the Tempe Public Library in a large downstairs room with a projector screen and plenty of seating with tables so it was easy to take notes.  We also had snacks.  I got a Paradise Bakery chocolate chip cookie and bottled water so that put me in a good mood for the afternoon meeting. They also had some giveaways and so I picked up an Oracle pen and notebook to use to take notes during the talk.  They also gave away three or four Rich Niemiec books and some gift cards as prizes, but, as usual, I didn’t win anything.

The first talk was about “Big Data”.  I’m skeptical about any buzzword, including big data, but I enjoyed hearing from a technical person who was actually doing this type of work.  I would rather hear an honest perspective from someone in the midst of the battle than get a rosy picture from someone who is trying to sell me something.  Interestingly, the talk was about open source big data software and not about any Oracle product, so it gave me as an Oracle database specialist some insight into something completely outside my experience.

The second talk was about another buzzword – “The Cloud”.  The second talk was as helpful as the first because the speaker had exposure to people from a variety of companies that were actually doing cloud work.  This talk was more directly related to my Oracle database work because you can have Oracle databases and applications based on Oracle databases deployed in the cloud.

It was a good first meeting and I hope to attend and help support future meetings.  Hopefully we can spread the news about the club and get even more people involved and attending so it will be even more useful to all of us.  I appreciate those who put in the effort to kick off this first meeting.

– Bobby





Posted in Uncategorized | 2 Comments

Reminder: first Arizona Oracle User Group meeting tomorrow

Fellow Phoenicians (citizens of the Phoenix, Arizona area):

This is a reminder that tomorrow is the first meeting of the newly reborn (risen from the ashes) Arizona Oracle User Group.  Here are the meeting details: url

I hope to meet some of my fellow Phoenix area DBAs tomorrow afternoon.

– Bobby

 


Posted in Uncategorized | Leave a comment

Script to compare plan performance

Here’s a zip of a script I modified today: zip

Here’s an example output:

 QUERY_NUM SQL_ID        PLAN_HASH_VALUE EXECUTIONS AVG_ELAPSED OPTIMIZER_COST AVG_FETCHES  AVG_SORTS AVG_DISK_READS AVG_BUFFER_GETS   AVG_ROWS    AVG_CPU AVG_IOWAIT AVG_DIRECT_WRITES AVG_PHYS_READS AVG_PHYS_WRITES
---------- ------------- --------------- ---------- ----------- -------------- ----------- ---------- -------------- --------------- ---------- ---------- ---------- ----------------- -------------- ---------------
         1 gxk0cj3qxug85      2051250004   39504258  31630.2852             15  4.71993394 .444248288     4.07294381      440.124393 41.3960784 3447.83137 28056.5262                 0     .006406626               0
         1 gxk0cj3qxug85       548353012   24360619  31854.5456             15  4.73696596 .574941507     4.16225047      443.290799 41.5668639 3501.62695 28205.9349                 0     .009019804               0
         2 53b6j9nvd2vwa       376144290    1069593  438746.758     19.6425025  33.7741683          0     58.9193684       3864.5447  332.18592 32952.0388 406548.271                 0     19.2981312               0
         2 53b6j9nvd2vwa       655563694    1008553  414586.506     15.0111675  33.7122908          0     58.6486828      3851.28671 331.575216 32283.0233 382834.453                 0     12.4507269               0
         2 53b6j9nvd2vwa      2504177057     274652  418157.478     19.5541562  32.8918486          0     61.1868838      3726.22908 323.358235 31005.0901 388545.269                 0     23.2050923               0
         3 4usguw7d6kxy4      2543632952    1070303   151648.49     12911.7175  2.30832577 .999997197     5.49557555      6221.80141 16.7674061 130072.596 10153.4778                 0     .388805787               0
         3 4usguw7d6kxy4      3221641997     996033  151860.479      11987.696  2.22684489 .999998996     7.10842914      6073.16306  15.902446  127194.45 13655.9405                 0     .316254582               0
         3 4usguw7d6kxy4      1764817920          2    277287.5          12860           1          1              1            6956          5     260000    10575.5                 0              1               0
         4 buhvbrfw1uyrd      2225737849    2871021  37985.4363     7.37064414  32.5216263          0      5.0695129      439.697767 319.641241 3463.90361 34645.8396                 0     2.01971389               0
         5 bvttgft3kj2hg      3569562598     293543  252814.965     20.6213018  95.7064212 .999996593     12.3908524      11035.1541 951.571562 137023.945 95634.8559                 0     1.94147365               0
         6 084v1t4whvzwd       883418687      70258  940211.781     4875.03099  1.00001423          0     30.6947394      22558.3683 .954980216 880713.798 16997.2457                 0     2.93185118               0
         7 972sx2uhfswn5       632842214  229406586  279.732773             14  1.00002261 .092121309     .003579313      25.6327361 .941600155 229.417607  23.334532                 0     .000786085               0

I’m using this script to help figure out which plans to lock in with SQL profiles.  For example, sql_id gxk0cj3qxug85 executes plan 2051250004 more often than plan 548353012 and they seem similar in performance so maybe it would be safe to just lock in the first plan.

I’ve also been reviewing the plans manually to see if there were major differences or if the plans were essentially the same.

I had a few minutes so I thought I would pass this along.

– Bobby

Posted in Uncategorized | Leave a comment

Tweaked bind variable script

I modified the bind variable extraction script that I normally use to make it more helpful to me.

Here was my earlier post with the old script: blog post

Here is my updated script:

set termout on 
set echo on
set linesize 32000
set pagesize 1000
set trimspool on

column NAME format a3
column VALUE_STRING format a17

spool bind2.log

select * from 
(select distinct
to_char(sb.LAST_CAPTURED,'YYYY-MM-DD HH24:MI:SS') 
  DATE_TIME,
sb.NAME,
sb.VALUE_STRING 
from 
DBA_HIST_SQLBIND sb
where 
sb.sql_id='gxk0cj3qxug85' and
sb.WAS_CAPTURED='YES')
order by 
DATE_TIME,
NAME;

spool off

Replace gxk0cj3qxug85 with the sql_id of your own query.

The output looks like this (I’ve scrambled the values to obscure production data):

DATE_TIME           NAM VALUE_STRING
------------------- --- -----------------
2014-08-29 11:22:13 :B1 ABC
2014-08-29 11:22:13 :B2 DDDDDD
2014-08-29 11:22:13 :B3 2323
2014-08-29 11:22:13 :B4 555
2014-08-29 11:22:13 :B5 13412341
2014-08-29 11:22:13 :B6 44444
2014-08-29 11:26:47 :B1 gtgadsf
2014-08-29 11:26:47 :B2 adfaf
2014-08-29 11:26:47 :B3 4444
2014-08-29 11:26:47 :B4 5435665
2014-08-29 11:26:47 :B5 4444
2014-08-29 11:26:47 :B6 787

This is better than the original script because it keeps related bind variable values together.

– Bobby


Posted in Uncategorized | 2 Comments