Pushing predicates into MySQL subqueries

I am just getting started looking at MySQL performance tuning but I thought I would share something that I may have figured out. I’m working with a query that does a count and a group by in a subquery and it runs too long. It seems that the query reads every row from a table instead of passing a predicate into the subselect and accessing only a couple of rows. MySQL 5.7 seems to behave this way normally. I have built a simple test case to show that MySQL 5.7 works differently than Oracle 12.1 in this situation.

I build a table called TEST with about 1,000,000 rows (1,1) and one row (2,2) and put an index on the first column.

create table test(a int,b int);

insert into test values(1,1);

insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;

-- 1048576 rows

select count(*) from test;

create index testi on test (a);

insert into test values (2,2);

analyze table test;

Then I have a subselect with a SUM and a GROUP by and a where clause condition outside of the subselect that would cause the subselect to look at only one row – the one with values (2,2).

select sum_b from
(select a,sum(b) sum_b
from test
group by a) inner_query
where 
a=2;

This takes almost 2 seconds running in an Amazon RDS instance:

+-------+
| sum_b |
+-------+
|     2 |
+-------+
1 row in set (1.53 sec)

The corresponding Oracle query running in an on-premises Linux VM runs faster:

SQL> select sum_b from
  2  (select a,sum(b) sum_b
  3  from test
  4  group by a) inner_query
  5  where
  6  a=2;

     SUM_B
----------
         2

Elapsed: 00:00:00.08

I realize that the hardware is not the same but I think they are reasonably close. Best I can tell the Oracle version is faster because Oracle pushes the a=2 predicate into the subselect but MySQL doesn’t. The Oracle execution plan shows the a=2 predicate in the inner index scan:

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |     6 |     4   (0)| 00:00:01 |
|   1 |  SORT GROUP BY NOSORT        |       |     1 |     6 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST  |     1 |     6 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | TESTI |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
...
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"=2)

I am still learning MySQL plans but it seems that MySQL does a full index scan:

As I said, I am new to MySQL so I may have missed some way around this. One simple solution would be to write your query without a subselect so that the predicate is in the group by query. This runs about as fast as the Oracle example above:

select sum(b) sum_b
from test
where 
a=2
group by a
--------------

+-------+
| sum_b |
+-------+
|     2 |
+-------+
1 row in set (0.10 sec)

This does a simple index lookup.

I have uploaded a zip of my Oracle and MySQL test here: zip

Maybe if you have used Oracle in the past and are now using MySQL 5.7 you should keep in mind that MySQL 5.7 does not push predicates into subqueries in the same situations that Oracle does and build your queries around this difference.

Bobby

 

 

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.