SQLT Emergency Plan Change

I’m on call this week so I didn’t plan to do any blog posting but I had an interesting performance issue this morning that I thought would be helpful to document.  This is somewhat redundant to earlier posts but maybe another real example would be helpful to others (and myself as I try to explain it).

So, I got paged at 5:16 AM today because the session count had gone up on a key database.  I connected using Toad and saw many active sessions.  A number were hung on row locks but others were running a normal query.  At first I thought the locks were the issue until I looked at the AWR report.

I executed DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT, waited a couple of minutes, and then ran it again.  This gave me a short time period while the problem was occurring.  These were the top events:

Top 5 Timed Foreground Events


Event Waits Time(s) Avg wait (ms) % DB time Wait Class
enq: TX – row lock contention 11 517 47038 24.97 Application
db file scattered read 74,786 513 7 24.75 User I/O
db file sequential read 61,708 448 7 21.62 User I/O
DB CPU 431 20.82
read by other session 47,482 416 9 20.08 User I/O

So, this pointed to the locks, but then I looked at the top sql:

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id
1,447.68 106 13.66 69.86 25.24 73.68 b9ad7h2rvtxwf

This is a pretty normal query but there isn’t anything on our system that runs for more than a fraction of a second normally so a top query running for 13 seconds means there was a problem.

So, then I ran my sqlstat.sql script for the one query b9ad7h2rvtxwf.  Here were the results (edited to fit on the blog):

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms
------------- --------------- ------------------------- ---------------- ------------------ -------------- -------------
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM             5058         147.520418     55.2906287    84.5061609
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM             3553         138.164556     52.4176752    81.5798179
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM             1744         131.999904     50.7224771    78.9727196
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM             1151         129.623057     42.0590791    85.9294535
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM              872         120.398335     36.7775229    83.2977339
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM              604         132.635639     33.8245033    97.5104901
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM              495         133.393253     33.1111111    99.2305293
b9ad7h2rvtxwf       405424828 22-OCT-13 PM              121         34966.0669     5395.86777    28047.5075
b9ad7h2rvtxwf      3569562598 22-OCT-13 PM              310         194.172239     34.7419355    159.561652
b9ad7h2rvtxwf       405424828 23-OCT-13 AM              367         13515.5776     4255.88556    9348.67928
b9ad7h2rvtxwf       405424828 23-OCT-13 AM              198         10156.8853     4131.56566    6100.66929
b9ad7h2rvtxwf       405424828 23-OCT-13 AM              121         13047.2503     4077.27273    9055.64753
b9ad7h2rvtxwf       405424828 23-OCT-13 AM              106         14410.4077     4095.75472    10457.2538
b9ad7h2rvtxwf       405424828 23-OCT-13 AM              132         13294.6225     3334.62121    10180.9116
b9ad7h2rvtxwf       405424828 23-OCT-13 AM              408         13075.3841     3327.35294    9828.64507
b9ad7h2rvtxwf       405424828 23-OCT-13 AM             1321         13314.1561     3119.74262    10091.0076
b9ad7h2rvtxwf       405424828 23-OCT-13 AM             3023         15692.5471     3126.99636     12361.605
b9ad7h2rvtxwf       405424828 23-OCT-13 AM             1924         16933.1762     3159.12682      13704.74
b9ad7h2rvtxwf       405424828 23-OCT-13 AM              106         13657.3272     3446.88679    10062.0426

So, between 10 and 11 pm last night the new plan 405424828 started being used and it resulted in the query that normally runs for 133 milliseconds (.1 seconds) running instead for 10091 ms = 10 seconds.  Not good when the query is run over 1000 times per hour.

So, I copied the sqlt scripts from another database server and got into the utl directory and ran the following commands:

-- build script to load profile

@coe_xfr_sql_profile.sql b9ad7h2rvtxwf 3569562598

-- run generated script


Then the performance issues all went away because b9ad7h2rvtxwf went back to running in .1 seconds.  Here is a previous post related to using sqlt this way.

The thing that strikes me about this scenario is that it really isn’t very hard to use this method to resolve these kind of issues.  All you need to do is use the awr report and sqlstat.sql script to identify the sql_id and the good plan_hash_value and then you can force it to run this way.

– Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.
This entry was posted in Uncategorized. Bookmark the permalink.

3 Responses to SQLT Emergency Plan Change

  1. umyal2001 says:

    Bobby – Thanks for sharing. I am supporting large PeopleSoft client and was looking for better tool to deal with those pesky Cobol queries from payroll. Sounds like this seems to be the perfect candidate? Have you done any calc tuning with SQLT yet? Thanks again.

    • Bobby says:

      I haven’t used SQLT and its sql profile scripts to tune payroll batch processes but I think you are right that this is a good candidate. The payroll processes should have static sql statements with bind variables. So, if you find that the plan changes for the worse on some SQL that is part of a calc job you could use the same process to force the previous good plan to run the next time you run the job. In the past I’ve used hints to do the same thing but that requires a change to the code. I’ve been tuning PeopleSoft since 1995, FYI.

  2. Pingback: Using hints with coe_xfr_sql_profile.sql | Bobby Durrett's DBA Blog

Leave a Reply