I attended John Beresniewicz’s Active Session History (ASH) talk at Collaborate 13 on Monday. One simple point from it was that he commonly sees queries of the ASH views that use the TIME_WAITED column incorrectly and result in incorrect results. The ASH views are V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY.
I’ve used DBA_HIST_ACTIVE_SESS_HISTORY frequently to diagnose Oracle performance problems and I don’t believe that I use TIME_WAITED. All I do is count up the number of rows in DBA_HIST_ACTIVE_SESS_HISTORY and count each row as 10 seconds of time. This would be either time on the CPU or wait time. Here is a query I recently used on a real performance problem:
select case SESSION_STATE when 'WAITING' then event else SESSION_STATE end TIME_CATEGORY, (count(*)*10) seconds from DBA_HIST_ACTIVE_SESS_HISTORY a, V$INSTANCE i, dba_users u where a.user_id = u.user_id and a.instance_number = i.instance_number and a.user_id = u.user_id and sample_time between to_date('2013-04-02 00:00','YYYY-MM-DD HH24:MI') and to_date('2013-04-02 22:00','YYYY-MM-DD HH24:MI') and a.sql_id = 'c12m4zxj3abm6' group by SESSION_STATE,EVENT order by seconds desc;
Disregard the joins to dba_users and v$instance. These are left over from previous uses of this query. Here is the output:
TIME_CATEGORY SECONDS ------------------------------ ---------- free buffer waits 49110 db file parallel read 11310 write complete waits 1810 db file sequential read 1600 ON CPU 720 read by other session 220 PX qref latch 50 db file scattered read 20 direct path read temp 20 latch: redo allocation 10 latch: cache buffers lru chain 10
An AWR report from 00:00 to 21:00 the same day showed this sql_id with one execution of length 50,540.80 seconds that didn’t finish. So, this roughly corresponds to the numbers in the ASH profile query above. About 49,000 of the 60,000 seconds of run time are accounted for by free buffer waits. We believe this is caused by too frequent checkpointing but haven’t verified it yet with a fix in production.
Anyway, this type of query was useful to me and it didn’t use the TIME_WAITED column so it validates to some extent the notion that you don’t need to use TIME_WAITED on the ASH views to have a useful query.
Did he give a reason why you shouldn’t use the time_waited column?
I think it is because ASH samples every second and so summing up the TIME_WAITED column will give you total wait times that are inaccurate. I think the only point of TIME_WAITED would be to figure out how long the waits are – i.e. is it a bunch of short waits or a few long ones? But, it is more accurate just to add up all the rows that fit into each wait category and then you get an approximate view of the total time for each wait (and CPU). It is approximate because it samples and you assume the observed wait took the entire time between samples. Here is some verbiage from the slides in John’s own words: “ASH is neither a random sample nor a complete sample of TIME_WAITED by session events”. So he says not to just sum or average TIME_WAITED.
That makes sense.