Default degree may cause performance problems

I encourage people that I work with to put a small number like 8 as the parallel degree when they want to create tables or indexes to use parallel query.  For example:

SQL> create table test parallel 8 as select * from dba_tables;

Table created.

SQL> 
SQL> select degree from user_tables where table_name='TEST';

DEGREE
----------
         8

But frequently I find tables that were created with the default degree by leaving out a number on the parallel clause:

SQL> create table test parallel as select * from dba_tables;

Table created.

SQL> 
SQL> select degree from user_tables where table_name='TEST';

DEGREE
----------
   DEFAULT

The problem is that on a large RAC system with a lot of CPUs per node the default degree can be a large number.  A table with a large degree can cause a single query to eat up all of the available parallel query processes.  That’s fine if only one query needs to run at a time but if you plan to run multiple queries in parallel you need to divide up the parallel query processes among them.  I.e. if you have 100 parallel query processes and need to run 10 queries at a time then you need to be sure each query only gets 10 of them.  I guess degree=5 is 10 processes but the point is that you don’t want to start running a bunch of queries with a degree of 50 each when you have 100 parallel processes to divide up.

With the default settings default degree is 2 X number of cpus X number of RAC nodes.  I tested this on an Exadata V2 with 2 nodes and 16 cpus per node.  The result was as expected, degree=64:

Final cost for query block SEL$1 (#0) - All Rows Plan:
  Best join order: 1
  Cost: 2.1370  Degree: 64  Card: 4203.0000  Bytes: 54639
  Resc: 123.0910  Resc_io: 123.0000  Resc_cpu: 2311650
  Resp: 2.1370  Resp_io: 2.1354  Resc_cpu: 40133

Just to verify that a query with parallel 8 would really use degree 8 I ran the same test with the same table but parallel 8:

Final cost for query block SEL$1 (#0) - All Rows Plan:
  Best join order: 1
  Cost: 6.5419  Degree: 8  Card: 4715.0000  Bytes: 61295
  Resc: 47.1020  Resc_io: 47.0000  Resc_cpu: 2593250
  Resp: 6.5419  Resp_io: 6.5278  Resc_cpu: 360174

Also note the lower cost(2) in the plan with default degree:

--------------------------------------------------+-----------------------------------+-------------------------+
| Id  | Operation                       | Name    | Rows  | Bytes | Cost  | Time      |  TQ  |IN-OUT|PQ Distrib |
--------------------------------------------------+-----------------------------------+-------------------------+
| 0   | SELECT STATEMENT                |         |       |       |     2 |           |      |      |           |
| 1   |  SORT AGGREGATE                 |         |     1 |    13 |       |           |      |      |           |
| 2   |   PX COORDINATOR                |         |       |       |       |           |      |      |           |
| 3   |    PX SEND QC (RANDOM)          | :TQ10000|     1 |    13 |       |           |:Q1000| P->S |QC (RANDOM)|
| 4   |     SORT AGGREGATE              |         |     1 |    13 |       |           |:Q1000| PCWP |           |
| 5   |      PX BLOCK ITERATOR          |         |  4203 |   53K |     2 |  00:00:01 |:Q1000| PCWC |           |
| 6   |       TABLE ACCESS STORAGE FULL | TEST    |  4203 |   53K |     2 |  00:00:01 |:Q1000| PCWP |           |
--------------------------------------------------+-----------------------------------+-------------------------+

Compared to degree 8 (cost=7):

--------------------------------------------------+-----------------------------------+-------------------------+
| Id  | Operation                       | Name    | Rows  | Bytes | Cost  | Time      |  TQ  |IN-OUT|PQ Distrib |
--------------------------------------------------+-----------------------------------+-------------------------+
| 0   | SELECT STATEMENT                |         |       |       |     7 |           |      |      |           |
| 1   |  SORT AGGREGATE                 |         |     1 |    13 |       |           |      |      |           |
| 2   |   PX COORDINATOR                |         |       |       |       |           |      |      |           |
| 3   |    PX SEND QC (RANDOM)          | :TQ10000|     1 |    13 |       |           |:Q1000| P->S |QC (RANDOM)|
| 4   |     SORT AGGREGATE              |         |     1 |    13 |       |           |:Q1000| PCWP |           |
| 5   |      PX BLOCK ITERATOR          |         |  4715 |   60K |     7 |  00:00:01 |:Q1000| PCWC |           |
| 6   |       TABLE ACCESS STORAGE FULL | TEST    |  4715 |   60K |     7 |  00:00:01 |:Q1000| PCWP |           |
--------------------------------------------------+-----------------------------------+-------------------------+

So, this shows that in this case with 2 node RAC and 16 cpus per node that the optimizer uses degree 64 for default degree and is more likely to choose a full scan over an index scan because the cost of the degree 64 full scan is less than that of a degree 8 full scan.

The key point is to understand that putting the keyword PARALLEL by itself on a table or index creation statement instead of PARALLEL 8 (or 4 or 16) can result in unexpectedly high degree.  This high degree can cause performance to degrade by allowing individual queries to eat up the parallel query processes leaving other queries to run inefficiently without the expected parallelism.  The high degree also reduces the cost of a full scan potentially causing them to be favored over index scans where the index scan would be more efficient.

– Bobby

p.s. Here is a zip of the scripts and logs that I used to create the 10053 traces: zip

 

 

 

 

 

 

 

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I've been working as an Oracle database administrator and PeopleSoft administrator since 1994. I'm very interested in Oracle performance tuning.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply