Another SQL Profile to the rescue!

We have had problems with set of databases over the past few weeks. Our team does not support these databases, but my director asked me to help. These are 11.2.0.1 Windows 64 bit Oracle databases running on Windows 2008. The incident reports said that the systems stop working and that the main symptom was that the oracle.exe process uses all the CPU. They were bouncing the database server when they saw this behavior and it took about 30 minutes after the bounce for the CPU to go back down to normal. A Windows server colleague told me that at some point in the past a new version of virus software had apparently caused high CPU from the oracle.exe process.

At first I looked for some known bugs related to high CPU and virus checkers without much success. Then I got the idea of just checking for query performance. After all, a poorly performing query can eat up a lot of CPU. These Windows boxes only have 2 cores so it would not take many concurrently running high CPU queries to max it out. So, I got an AWR report covering the last hour of a recent incident. This was the top SQL:

Top SQL

The top query, sql id 27d8x8p6139y6, stood out as very inefficient and all CPU. It seemed clear to me from this listing that the 2 core box had a heavy load and a lot of waiting for CPU queuing. %IO was zero but %CPU was only 31%. Most likely the rest was CPU queue time.

I also looked at my sqlstat report to see which plans 27d8x8p6139y6 had used over time.

PLAN_HASH_VALUE END_INTERVAL_TIME     EXECUTIONS Elapsed ms
--------------- --------------------- ---------- -----------
     3067874494 07-MAR-16 09.00.50 PM        287  948.102286
     3067874494 07-MAR-16 10.00.03 PM        292  1021.68191
     3067874494 07-MAR-16 11.00.18 PM        244  1214.96161
     3067874494 08-MAR-16 12.00.32 AM        276  1306.16222
     3067874494 08-MAR-16 01.00.45 AM        183  1491.31307
      467860697 08-MAR-16 01.00.45 AM        125      .31948
      467860697 08-MAR-16 02.00.59 AM        285  .234073684
      467860697 08-MAR-16 03.00.12 AM        279  .214354839
      467860697 08-MAR-16 04.00.25 AM        246   .17147561
      467860697 08-MAR-16 05.00.39 AM         18        .192
     2868766721 13-MAR-16 06.00.55 PM         89    159259.9
     3067874494 13-MAR-16 06.00.55 PM          8  854.384125
     2868766721 13-MAR-16 07.00.50 PM         70  1331837.56

Plan 2868766721 seemed terrible but plan 467860697 seemed great.

Our group doesn’t support these databases so I am not going to dig into how the application gathers statistics, what indexes it uses, or how the vendor designed the application. But, it seems possible that forcing the good plan with a SQL Profile could resolve this issue without having any access to the application or understanding of its design.

But, before plunging headlong into the use of a SQL Profile I looked at the plan and the SQL text.  I have edited these to hide any proprietary details:

SELECT T.*
    FROM TAB_MYTABLE1 T,
         TAB_MYTABLELNG A,
         TAB_MYTABLE1 PIR_T,
         TAB_MYTABLELNG PIR_A
   WHERE     A.MYTABLELNG_ID = T.MYTABLELNG_ID
         AND A.ASSIGNED_TO = :B1
         AND A.ACTIVE_FL = 1
         AND T.COMPLETE_FL = 0
         AND T.SHORTED_FL = 0
         AND PIR_T.MYTABLE1_ID = T.PIR_MYTABLE1_ID
         AND ((PIR_A.FLOATING_PIR_FL = 1 
               AND PIR_T.COMPLETE_FL = 1)
              OR PIR_T.QTY_PICKED IS NOT NULL)
         AND PIR_A.MYTABLELNG_ID = PIR_T.MYTABLELNG_ID
         AND PIR_A.ASSIGNED_TO IS NULL
ORDER BY T.MYTABLE1_ID

The key thing I noticed is that there was only one bind variable. The innermost part of the good plan uses an index on the column that the query equates with the bind variable. The rest of the plan is a nice nested loops plan with range and unique index scans. I see plans in this format in OLTP queries where you are looking up small numbers of rows using an index and join to related tables.

-----------------------------------------------------------------
Id | Operation                        | Name                     
-----------------------------------------------------------------
 0 | SELECT STATEMENT                 |                          
 1 |  SORT ORDER BY                   |                          
 2 |   NESTED LOOPS                   |                          
 3 |    NESTED LOOPS                  |                          
 4 |     NESTED LOOPS                 |                          
 5 |      NESTED LOOPS                |                          
 6 |       TABLE ACCESS BY INDEX ROWID| TAB_MYTABLELNG           
 7 |        INDEX RANGE SCAN          | AK_MYTABLELNG_BY_USER    
 8 |       TABLE ACCESS BY INDEX ROWID| TAB_MYTABLE1             
 9 |        INDEX RANGE SCAN          | AK_MYTABLE1_BY_MYTABLELNG
10 |      TABLE ACCESS BY INDEX ROWID | TAB_MYTABLE1             
11 |       INDEX UNIQUE SCAN          | PK_MYTABLE1              
12 |     INDEX UNIQUE SCAN            | PK_MYTABLELNG            
13 |    TABLE ACCESS BY INDEX ROWID   | TAB_MYTABLELNG           
-----------------------------------------------------------------

The bad plan had a gross Cartesian merge join:

Plan hash value: 2868766721

----------------------------------------------------------------
Id | Operation                       | Name                     
----------------------------------------------------------------
 0 | SELECT STATEMENT                |                          
 1 |  NESTED LOOPS                   |                          
 2 |   NESTED LOOPS                  |                          
 3 |    MERGE JOIN CARTESIAN         |                          
 4 |     TABLE ACCESS BY INDEX ROWID | TAB_MYTABLE1             
 5 |      INDEX FULL SCAN            | PK_MYTABLE1              
 6 |     BUFFER SORT                 |                          
 7 |      TABLE ACCESS BY INDEX ROWID| TAB_MYTABLELNG           
 8 |       INDEX RANGE SCAN          | AK_MYTABLELNG_BY_USER    
 9 |    TABLE ACCESS BY INDEX ROWID  | TAB_MYTABLE1             
10 |     INDEX RANGE SCAN            | AK_MYTABLE1_BY_MYTABLELNG
11 |   TABLE ACCESS BY INDEX ROWID   | TAB_MYTABLELNG           
12 |    INDEX RANGE SCAN             | AK_MYTABLELNG_BY_USER    
----------------------------------------------------------------

Reviewing the SQL made me believe that there was a good chance that a SQL Profile forcing the good plan would resolve the issue. Sure, there could be some weird combination of data and bind variable values that make the bad plan the better one. But, given that this was a simple transactional application it seems most likely that the straightforward nested loops with index on the only bind variable plan would be best.

We used the SQL Profile to force these plans on four servers and so far the SQL Profile has resolved the issues. I’m not saying that forcing a plan using a SQL Profile is the only or even best way to resolve query performance issues. But, this was a good example of where a SQL Profile makes sense. If modifying the application, statistics, parameters, and schema is not possible then a SQL Profile can come to your rescue in a heartbeat.

Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I've been working as an Oracle database administrator and PeopleSoft administrator since 1994. I'm very interested in Oracle performance tuning.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply