A couple of weeks ago I upgraded a major production database from 126.96.36.199 to 188.8.131.52. Our developers kept hitting one ugly bug after another on the minimally patched 184.108.40.206 so I wanted to get them on the most patched up version of Oracle 11. This is on HP-UX Itanium 11.31. I made two key configuration changes that I want to discuss here. I changed the database to use direct I/O and I set a parameter so that the database would not use direct path reads for most table and partition scans. The main point of this blog post is that one change required the other. I moved to direct I/O to improve checkpoint performance but that slowed down queries that repeatedly scanned segments using direct path reads. I set an underscore parameter that prevented direct path reads except on very large table scans and that sped the problem queries back up to normal.
In our testing a coworker ran a massive number of updating jobs at the same time. A truncate took about an hour during this excessive load. I ran a simple test script to truncate a small table during this load and it took over a minute. The truncate spent almost all of its time on these two waits:
- local write wait – 61%
- enq: RO – fast object reuse – 38%
This was on a Delphix clone of production so the NFS file systems were already forcing the database to use direct I/O. But, I had to max out the database writer processes to get the needed checkpoint performance. I based this decision on my experience with checkpoint performance on another database which I documented in this post: url. I set these parameters:
We use these settings on a large data warehouse staging database on 220.127.116.11 and HP-UX 11.31 to tune checkpoint performance there. So, we have tested this configuration in production for several years.
After changing these parameters the truncates ran fast under the same heavy updating load.
We remounted the database file systems with mincache=direct, convosync=direct options on our non-Delphix physical databases when we moved them to direct I/O.
For a long time I have seen issues with slowness of full partition scans on our Delphix clones of the production database that this post is about. Here is a post about this issue: url. I remember hearing about an undocumented underscore parameter that you could use to prevent direct path reads and always meant to look into it but kept holding back. I didn’t want to set the underscore parameter on my Delphix copies of production and not set it on my physical production database. That would make our test database different from production and that could lead to invalid testing. But, an upgrade was a great time to put the parameter in both on Delphix and in production. This was the parameter that I put in:
I read several good blog posts about this parameter and other ways to deal with direct path reads in Oracle 11. Evidently some behavior changed in Oracle 11 that caused full scans to bypass the buffer cache in more situations, using direct path reads. Some post talked about DBA’s just bumping up the value for _small_table_threshold when they upgraded to Oracle 11 so I tried it and testing proved it out. Here is an earlier post about my work with this parameter: url.
So, the upgrade has given me a chance to not only move our database to a more patched up, and hopefully stable, release but it also was a chance to make two key configuration changes. Without direct I/O our production database was using the Unix file system cache to cache the blocks that were being repeatedly scanned using direct path reads. But, to switch to direct I/O I had to end the direct path reads so that the database buffer cache would be used to cache the scanned blocks. Direct I/O and the 36 database writer processes gave us great checkpoint performance. Disabling direct path read kept the queries that had depended on the Unix filesystem cache running just as fast by allowing them to use the database buffer cache.