Dynamic sampling hint better than multi-column histogram

One of our senior developers found a way to greatly improve the performance of a query by adding a dynamic sampling hint for a table that had multiple correlated conditions in its where clause.  This led me to try to understand why the dynamic sampling hint helped and it appears that a dynamic sampling hint can deal with correlated predicates in the where clause better than multi-column histograms.

A quick Google search about the dynamic sampling hint found this article by Tom Kyte.  The section titled “When the Optimizer Guesses” seemed to match the symptoms I saw in the query our developer was tuning.  So, I started messing with the test case from the article and found that multi-column histograms would not make the test case run the efficient plan with the index scan but dynamic sampling, even at level 1, would.

I used the following commands to gather stats:

select DBMS_STATS.CREATE_EXTENDED_STATS (NULL,'T','(FLAG1,FLAG2)') from dual;
execute DBMS_STATS.GATHER_TABLE_STATS (NULL,'T',estimate_percent=>NULL,method_opt=>'FOR ALL COLUMNS SIZE 254');

Here is the plan without the dynamic sampling hint:

Execution Plan
----------------------------------------------------------
Plan hash value: 2153619298

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 37371 |  3941K|   580   (1)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T    | 37371 |  3941K|   580   (1)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N' AND "FLAG2"='N')

Here is the plan with this hint: dynamic_sampling(t 1)

Execution Plan
----------------------------------------------------------
Plan hash value: 1020776977

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    54 |  5832 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |    54 |  5832 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |    54 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')

Note
-----
   - dynamic sampling used for this statement (level=2)

If you look at a 10053 trace you see that dynamic sampling is running this query:

SELECT
...bunch of hints edited out...
NVL (SUM (C1), 0), NVL (SUM (C2), 0), NVL (SUM (C3), 0)
  FROM (SELECT /*+ ...more hints... */
              1 AS C1,
               CASE
                  WHEN "T"."FLAG1" = 'N' AND "T"."FLAG2" = 'N' 
                  THEN 1
                  ELSE 0
               END
                  AS C2,
               CASE
                  WHEN "T"."FLAG2" = 'N' AND "T"."FLAG1" = 'N' 
                  THEN 1
                  ELSE 0
               END
                  AS C3
          FROM "MYUSERID"."T" SAMPLE BLOCK (0.519350, 1) 
               SEED (1) "T") SAMPLESUB

I think this is the same query that is in the article but in the 10053 trace it is easier to see the constants “T”.”FLAG2″ = ‘N’ AND “T”.”FLAG1″ = ‘N’.  So, it shows that dynamic sampling is running a query with the where clause conditions.  In the 10053 trace for our production query the dynamic sampling query came back with all of the conditions on the table that we hinted.

Note that if you leave stats off of the table you get the same good plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 1020776977

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   162 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |   162 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')

Note
-----
   - dynamic sampling used for this statement (level=2)

So, with no stats on the table it must be doing the same dynamic sampling query with all the where clause conditions.

Interestingly, in the case of the real query the table is very small.  I’m thinking we need to just delete the stats from the table and lock them empty.  That way every query with the table will dynamically sample the small table and handle correlated predicates better than multi-column stats.

Here is my test script and its log.

– Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.