Faster commit time with fewer sessions

I’m trying to understand how to prevent the database server that supports one of our web sites from becoming overwhelmed when CPU utilization gets too high.  Once CPU gets above about 80%, performance takes a nose dive: CPU quickly flatlines at 100%, queuing begins, and the system becomes unusable.  I mentioned in an earlier post that Tom Kyte, a noted Oracle performance expert, had recommended that our web servers run with fewer database sessions.  Right now we have hundreds of connections to the database from the web servers.  Based on Tom’s recommendation, the best number of connections from the web servers to the database would be about twice the number of CPUs on the database server.
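As a quick way to see how your own system compares, you can check the CPU count that Oracle sees against the number of connected sessions.  A minimal sketch (the USER filter just excludes background processes):

-- Number of CPUs Oracle sees
select value from v$parameter where name = 'cpu_count';

-- Current foreground connections
select count(*) from v$session where type = 'USER';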

I believe that one reason we need to move to fewer sessions is to speed up commit time.  When the CPU gets busy the log writer process (LGWR) gets starved for CPU, and sessions doing commits back up waiting for their log file syncs to complete.  So, I put together a test on my laptop which compares doing the same total number of inserts and commits spread across two different numbers of sessions.  In the first case I used twice the number of CPUs, which corresponds to Tom Kyte’s recommendation.  In the second case I used 16 times the number of CPUs, which represents our current configuration with many more sessions than CPUs on our real web application.  My laptop has 2 CPUs, so the first test used 4 sessions and the second used 32.
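I won’t reproduce the full scripts here (they are in the zip linked at the end of this post), but each session ran a loop along these lines; the table name TEST is a stand-in for my real test table:

-- Each test session: insert one row and commit, N times.
-- N was 80,000 for the 4-session test and 10,000 for the 32-session test.
begin
  for i in 1 .. 80000 loop
    insert into test (a) values (i);
    commit;  -- each commit waits on a log file sync
  end loop;
end;
/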

The result was that 4 sessions doing 80,000 inserts and commits each finished in 143 seconds, but 32 sessions doing 10,000 inserts and commits each took 627 seconds.  That is the same 320,000 total inserts either way, so throughput dropped from about 2,200 commits per second to about 510, more than four times slower.  The main factor in the increase was the increased log file sync time.  We know that our web servers do a lot of inserts and commits and that log file sync waits go crazy during our outages, so this simple test seems to be a valid comparison.

Here is a profile of the time spent by one of our four sessions:

TIMESOURCE                        SECONDS PERCENTAGE
------------------------------ ---------- ----------
TOTAL_TIME                            143        100
log file sync                          72         50
SQL*Net message from client            44         31
CPU                                    21         15
UNACCOUNTED_TIME                        4          3

Here is the profile of one of the 32 sessions:

TIMESOURCE                        SECONDS PERCENTAGE
------------------------------ ---------- ----------
TOTAL_TIME                            613        100
log file sync                         420         69
SQL*Net message from client           167         27
UNACCOUNTED_TIME                       22          4
CPU                                     3          0
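In case you want to build a profile like this yourself, the numbers come from the session wait and CPU statistics.  Roughly something like the queries below, with :sid being the test session’s SID; this is a sketch, not my exact profile script:

-- Wait time per event for one session (TIME_WAITED is in centiseconds)
select event, time_waited/100 seconds
from v$session_event
where sid = :sid
order by time_waited desc;

-- CPU time for the same session (also in centiseconds)
select s.value/100 cpu_seconds
from v$sesstat s, v$statname n
where s.statistic# = n.statistic#
and n.name = 'CPU used by this session'
and s.sid = :sid;

UNACCOUNTED_TIME is then just the elapsed time minus the waits and the CPU.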

In both cases the row count at the end is 320,000.  The AWR reports make it clearer what is going on: the log file sync time in the second case is mostly CPU queue time and is much longer than in the first.  Here are the top five timed events with four sessions:

Event                        Waits     Time(s) Avg wait (ms) % DB time Wait Class
--------------------------- --------- ------- ------------- --------- -------------
log file sync                 320,008     289             1     73.13 Commit
DB CPU                                    109                   27.56
SQL*Net message to client     640,097       2             0      0.43 Network
log file switch completion         12       1            70      0.21 Configuration
Disk file operations I/O           25       0            17      0.11 User I/O

Here they are with 32 sessions:

Event                        Waits     Time(s) Avg wait (ms) % DB time Wait Class
--------------------------- --------- ------- ------------- --------- -------------
log file sync                 320,094  13,535            42     93.66 Commit
DB CPU                                    123                    0.85
Disk file operations I/O          134      10            75      0.07 User I/O
enq: TX - contention              252      10            38      0.07 Other
enq: FB - contention              113       5            42      0.03 Other

Note how the average log file sync wait has gone up to 42 milliseconds from 1 millisecond.  The Oracle support document “Troubleshooting: ‘log file sync’ Waits [ID 1376916.1]” explains that the background wait “log file parallel write” shows how long the I/O is taking for each log file sync.  In the second case log file parallel write averaged 11 milliseconds, so the difference between log file sync and log file parallel write, 42 - 11 = 31 milliseconds, is mostly CPU queue time.
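If you want to compare the two events on your own system, the instance-wide averages since startup are in v$system_event; an AWR report gives the same comparison for a single interval:

select event, total_waits,
       round(time_waited_micro/1000/nullif(total_waits,0), 1) avg_ms
from v$system_event
where event in ('log file sync', 'log file parallel write');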

Here is how my CPU looked with four sessions:

[Image: CPU use in four session example]

Here is how my CPU looked with 32 sessions:

[Image: CPU use with 32 sessions]

Here is a zip with all my test files and their output: zip

So, this experiment is evidence that if you have a bunch of inserts and commits, the total time taken will be less if the number of sessions is about twice the number of CPUs rather than many times the number of CPUs.

– Bobby


3 Responses to Faster commit time with fewer sessions

  1. Mladen says:

    Bobby, that is what connection pooling is for. Also, if there is no possibility of connection pooling, there is always the shared server configuration. It’s a very underutilized feature which can help in certain situations. Having a busy web server with a web-to-database connection ratio of 1:1 is just not realistic.

  2. Bobby says:

    Mladen,

    I agree with you. In my situation we have an important, heavily used web site with a farm of web servers using WebSphere connection pooling. But the connection pool averages over 400 database connections. When the CPU spikes on our database server for any reason, the web servers spin up hundreds more connections, the CPU goes through the roof, and the log file sync time goes very high. The web server transactions are short, so we have a lot of commits. So, the example I posted about, which is just on my laptop, simulates what happens when the CPU gets busy and gives me some hope that we can make the web site more resilient if we can reduce the number of connections. I’ve been looking at shared servers because it is something I can do on the database side (a minimal configuration sketch appears after these comments), but so far my tests have not had good results. But I’ve never used shared servers before, and there may be some simple thing I’m doing wrong, or it could just be some limitation of using my laptop to test it.

    The challenge is that the application seems to hold on to the connections for a long time, so the connection pool isn’t really helping. It isn’t much better than just connecting to the database whenever you want and then disconnecting when you are done.

    – Bobby

  3. Pingback: Faster commit time with shared servers | Bobby Durrett's DBA Blog
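For anyone who wants to try the shared server configuration Mladen suggested, the basic pieces look something like this; the values are illustrative, not recommendations:

-- Start a pool of shared servers fronted by dispatchers
alter system set shared_servers = 10;
alter system set dispatchers = '(PROTOCOL=TCP)(DISPATCHERS=2)';

-- Clients then request a shared server in the connect data:
-- (CONNECT_DATA=(SERVICE_NAME=...)(SERVER=SHARED))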
