Cardinality errors across joins

We have been discussing query tuning at work, and I’ve come back to an example of the Oracle optimizer choosing the wrong plan because of an error in its estimate of the number of rows a query will return.  I want to show that it is easy and common to have queries that the optimizer cannot correctly optimize because it can’t get the number of rows, or cardinality, right.

I’ve recast the fourth script from my cardinality talk – old scripts here.  I’ve made it more realistic by changing the table names to something more business related.  I’ve also tried to show that multi-column histograms don’t help, and neither does cardinality feedback.  This is all on 11.2.0.3 on 32-bit Windows.  Here is the updated script and its log.
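By multi-column histograms I mean column group (extended) statistics within a single table.  A minimal sketch of what that looks like on DIVISION, assuming the objects are in the current schema (the exact calls in my script may differ):

-- create a column group on DIVISION so the optimizer can see
-- the correlation between DIVNAME and DIVNUM within that table
select DBMS_STATS.CREATE_EXTENDED_STATS(user,'DIVISION','(DIVNAME,DIVNUM)')
from dual;

-- gather histograms, including one on the new column group
execute DBMS_STATS.GATHER_TABLE_STATS(user,'DIVISION',method_opt=>'for all columns size 254');

Column group statistics only capture correlation between columns of the same table, so even a histogram on this group can’t tell the optimizer how DIVISION.DIVNAME relates to the skew in SALES.DIVNUM.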

Two tables are joined.  The first is a small division table with text names and numeric division numbers:

create table DIVISION (DIVNAME VARCHAR2(2000),DIVNUM NUMBER);

insert into DIVISION values ('Los Angeles',1);
insert into DIVISION values ('Mayberry',2);

Next is the sales detail table.  Note that there was only one sale in Mayberry but a million in Los Angeles:

create table SALES (DIVNUM NUMBER);

/* SALES table has 1,000,000 1's for LA and one 2 for Mayberry */

declare
  i number;
begin
  i:=1;
  loop
    insert into SALES values (1);
    i:=i+1;
    if (mod(i,10000)=0) then
      commit;
    end if;
    exit when (i > 1000000);
  end loop;
  commit;
  insert into SALES values (2);
  commit;
end;
/

create index SALESINDEX on SALES(DIVNUM);
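
Statistics were gathered on both tables after loading the data; something along these lines (the exact parameters in my script may differ):

execute DBMS_STATS.GATHER_TABLE_STATS(user,'DIVISION');
execute DBMS_STATS.GATHER_TABLE_STATS(user,'SALES');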

The query just gets the one Mayberry row by joining the division and sales tables.  There is an index on SALES.DIVNUM, but the optimizer doesn’t use it.

SQL> select B.DIVNUM
  2  from DIVISION A,SALES B
  3  where
  4  a.DIVNUM=B.DIVNUM and
  5  A.DIVNAME='Mayberry';

    DIVNUM
----------
         2

Elapsed: 00:00:00.07

--------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |       |       |   455 (100)|
|*  1 |  HASH JOIN         |          |   500K|  8300K|   455   (2)|
|*  2 |   TABLE ACCESS FULL| DIVISION |     1 |    14 |     3   (0)|
|   3 |   TABLE ACCESS FULL| SALES    |  1000K|  2929K|   448   (2)|
--------------------------------------------------------------------
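
The 500K estimate matches the optimizer’s standard join cardinality arithmetic (a back-of-the-envelope check, assuming no histograms on the join columns):

filtered DIVISION rows = 2 rows * 1/2 (one DIVNAME out of 2 distinct values) = 1
join selectivity       = 1 / max(NDV(A.DIVNUM), NDV(B.DIVNUM)) = 1/2
estimated join rows    = 1 * 1,000,001 * 1/2, or roughly 500K

The formula assumes the join column values are evenly distributed across the rows, so the optimizer has no way to know that the one DIVISION row surviving the DIVNAME filter is the one that matches only a single SALES row.  Expecting 500K rows back, it reasonably chooses full scans and a hash join over the index.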

We have queries like this all the time where two tables are joined on some sort of numeric key but the where clause has a meaningful text-based condition on the smaller table.  In the example it is a small division table joined to a sales table.  It could also be a date table that relates quarters or fiscal periods to particular days.

Some sort of cross-table histogram would be needed to fix this.  It would look something like this:

DIVISION.DIVNAME DIVISION.DIVNUM SALES.DIVNUM NUM_ROWS
     Los Angeles                1           1  1000000
        Mayberry                2           2        1
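
The counts in that imaginary histogram are easy to produce today with an ordinary aggregate query, for instance:

select A.DIVNAME, A.DIVNUM, B.DIVNUM, count(*) NUM_ROWS
from DIVISION A,SALES B
where A.DIVNUM=B.DIVNUM
group by A.DIVNAME, A.DIVNUM, B.DIVNUM;

The problem isn’t computing the numbers once; it’s maintaining them for every combination of joined columns.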

But think about how complex this could get, even if this type of histogram were implemented.  For the optimizer to figure out the row count across joins in the most general case, it would need countless cross-table histograms capturing the relationships among the many tables and columns that make up a large database.

The type of join demonstrated in the test script is common in our production database applications, so I believe this test shows that we should expect the optimizer to frequently have inaccurate cardinality estimates, which can lead to inefficient query execution plans.
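
When that happens, the usual workaround is to hint the query.  A sketch of forcing the plan we wanted for the test query, using the standard LEADING, USE_NL, and INDEX hints:

select /*+ leading(A) use_nl(B) index(B SALESINDEX) */ B.DIVNUM
from DIVISION A,SALES B
where
A.DIVNUM=B.DIVNUM and
A.DIVNAME='Mayberry';

With nested loops driven by the single Mayberry row, only the one matching SALES row has to be read through SALESINDEX instead of scanning a million rows.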

– Bobby

P.S.  I ran across a case yesterday where a dynamic sampling hint helped with a query that had multiple correlated predicates against one table.  So I thought I’d try that hint on both tables in the example for this post, but it still didn’t get the good plan with the index.  The updated example script and its log are here.  I ran this test on Exadata on 11.2.0.2 because that was where I saw the dynamic sampling hint work.
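
For reference, the table-level form of the hint I tried looks something like this (the sampling level in my actual script may differ):

select /*+ dynamic_sampling(A 10) dynamic_sampling(B 10) */ B.DIVNUM
from DIVISION A,SALES B
where
A.DIVNUM=B.DIVNUM and
A.DIVNAME='Mayberry';

Dynamic sampling refines the single-table selectivity estimates, but the join cardinality still comes from the standard formula, which may be why it didn’t get the index plan here.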


6 Responses to Cardinality errors across joins

  1. Pingback: Denormalize tables to improve cardinality estimate | Bobby Durrett's DBA Blog

    • Bobby says:

Thank you for this link. From a quick read of the blog post it appears that the author was wrestling with the same issues and had some creative approaches to dealing with them. But I think there is a deeper issue that is unresolvable in the general case. I haven’t proven it in any mathematical sense, but my intuition is that there are limits to what any SQL optimizer can do in limited parse time and that my simple example is a small clue to the fundamental limitations of SQL optimization.

  2. rainer stenzel says:

More profound analysis of the difficulties of join cardinality estimation, in particular for skewed row sets, and of the impact of different kinds of histograms:
    http://www.centrexcc.com/Joins,%20Skew%20and%20Histograms.pdf
    http://www.adellera.it/investigations/join_over_histograms/JoinCardinalityEstimationWithHistogramsExplained.pdf

    • Bobby says:

Thank you for the links. It looks like these will take some time to read, which I don’t have right now, but I’m looking forward to getting back to them soon.

  3. Pingback: Adaptive Optimization Limitation Example | Bobby Durrett's DBA Blog
