This graph represents commit time compared to CPU utilization and redo log write time. I’ve included only the hourly intervals with more than 1,000,000 commits. At these peaks the number of commits ranges 1 to 1.6 million commits per hour so each point on the graph represents roughly the same commit rate. I’m puzzled by why the commit time bounces around peaking above 5 milliseconds when I can’t see any peaks in I/O or CPU that correspond to the commit time peaks.
I derived CPU% from DBA_HIST_OSSTAT. I got the other values by getting wait events from DBA_HIST_SYSTEM_EVENT. Commit time is log file sync wait time. Redo write time is log file parallel write wait time. I converted the wait times to milliseconds so they fit nicely on the chart with CPU%.
I thought I would pass this along as a puzzle that I haven’t figured out.
Here is a zip of the script I used to get the data, its raw output, and the spreadsheet I used to make the chart: zip
– Bobby
P.S. This is on HP-UX 11.31, Itanium, Oracle 11.2.0.3
P.P.S Did some more work on this today. Looks like the high commit time periods have short spikes of long redo log writes even though the average over the hour is still low. I’m looking at DBA_HIST_SYSTEM_EVENT to get a histogram of the log file parallel write waits and there are a number in the 1024 bucket when the log file sync time is high on average.
END_INTERVAL_TIME LFPW_MILLI LFPW_COUNT AVG_COMMIT_MS AVG_WRITE_MS ------------------- ---------- ---------- ------------- ------------ 21-AUG-14 11.00 AM 1 268136 9.14914833 2.45438987 21-AUG-14 11.00 AM 2 453913 9.14914833 2.45438987 21-AUG-14 11.00 AM 4 168370 9.14914833 2.45438987 21-AUG-14 11.00 AM 8 24436 9.14914833 2.45438987 21-AUG-14 11.00 AM 16 5675 9.14914833 2.45438987 21-AUG-14 11.00 AM 32 6122 9.14914833 2.45438987 21-AUG-14 11.00 AM 64 3369 9.14914833 2.45438987 21-AUG-14 11.00 AM 128 2198 9.14914833 2.45438987 21-AUG-14 11.00 AM 256 1009 9.14914833 2.45438987 21-AUG-14 11.00 AM 512 236 9.14914833 2.45438987 21-AUG-14 11.00 AM 1024 19 9.14914833 2.45438987 21-AUG-14 11.00 AM 2048 0 9.14914833 2.45438987 21-AUG-14 02.00 PM 1 522165 2.97787777 1.64840599 21-AUG-14 02.00 PM 2 462917 2.97787777 1.64840599 21-AUG-14 02.00 PM 4 142612 2.97787777 1.64840599 21-AUG-14 02.00 PM 8 17014 2.97787777 1.64840599 21-AUG-14 02.00 PM 16 4656 2.97787777 1.64840599 21-AUG-14 02.00 PM 32 5241 2.97787777 1.64840599 21-AUG-14 02.00 PM 64 1882 2.97787777 1.64840599 21-AUG-14 02.00 PM 128 820 2.97787777 1.64840599 21-AUG-14 02.00 PM 256 149 2.97787777 1.64840599 21-AUG-14 02.00 PM 512 10 2.97787777 1.64840599 21-AUG-14 02.00 PM 1024 2 2.97787777 1.64840599 21-AUG-14 02.00 PM 2048 0 2.97787777 1.64840599
There were 19 waits over half a second in the first hour and only 2 in the second hour. Maybe all the log file sync waits pile up waiting for those long writes. Here is a graph that compares the number of waits over half a second – the 1024 ms bucket – to the average log file sync and log file parallel write times for the hour:
You can see that the average redo write time goes up a little but the commit time goes up more. Maybe commit time is more affected by a few long spikes than by a lot of slightly longer write times.
Found a cool blog post that seems to explain exactly what we are seeing: blog post(DOES NOT EXIST)