Don’t use TIME_WAITED in ASH views

I attended John Beresniewicz’s Active Session History (ASH)  talk at Collaborate 13 on Monday.  One simple point from it was that he commonly sees queries of the ASH views that use the TIME_WAITED column incorrectly and result in incorrect results.  The ASH views are V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY.

I’ve used DBA_HIST_ACTIVE_SESS_HISTORY frequently to diagnose Oracle performance problems and I don’t believe that I use TIME_WAITED.  All I do is count up the number of rows in DBA_HIST_ACTIVE_SESS_HISTORY and count each row as 10 seconds of time.  This would be either time on the CPU or wait time.  Here is a query I recently used on a real performance problem:

select 
case SESSION_STATE
when 'WAITING' then event
else SESSION_STATE
end TIME_CATEGORY,
(count(*)*10) seconds
from DBA_HIST_ACTIVE_SESS_HISTORY a,
V$INSTANCE i,
dba_users u
where 
a.user_id = u.user_id and
a.instance_number = i.instance_number and
a.user_id = u.user_id and
sample_time 
between 
to_date('2013-04-02 00:00','YYYY-MM-DD HH24:MI')
and 
to_date('2013-04-02 22:00','YYYY-MM-DD HH24:MI')
and
a.sql_id = 'c12m4zxj3abm6'
group by SESSION_STATE,EVENT
order by seconds desc;

Disregard the joins to dba_users and v$instance.  These are left over from previous uses of this query.  Here is the output:

TIME_CATEGORY                     SECONDS
------------------------------ ----------
free buffer waits                   49110
db file parallel read               11310
write complete waits                 1810
db file sequential read              1600
ON CPU                                720
read by other session                 220
PX qref latch                          50
db file scattered read                 20
direct path read temp                  20
latch: redo allocation                 10
latch: cache buffers lru chain         10

An AWR report from 00:00 to 21:00 the same day  showed this sql_id with one execution of length 50,540.80 seconds that didn’t finish.  So, this roughly corresponds to the numbers in the ASH profile query above.  About 49,000 of the 60,000 seconds of run time are accounted for by free buffer waits.  We believe this is caused by too frequent checkpointing but haven’t verified it yet with a fix in production.

Anyway, this type of query was useful to me and it didn’t use the TIME_WAITED column so it validates to some extent the notion that you don’t need to use TIME_WAITED on the ASH views to have a useful query.

- Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I've been working as an Oracle database administrator and PeopleSoft administrator since 1994. I'm very interested in Oracle performance tuning.
This entry was posted in Uncategorized. Bookmark the permalink.

3 Responses to Don’t use TIME_WAITED in ASH views

  1. Justin says:

    Hi,

    Did he give a reason why you shouldn’t use the time_waited column?

    regards
    /Justin

  2. Bobby says:

    Justin,

    I think it is because ASH samples every second and so summing up the TIME_WAITED column will give you total wait times that are inaccurate. I think the only point of TIME_WAITED would be to figure out how long the waits are – i.e. is it a bunch of short waits or a few long ones? But, it is more accurate just to add up all the rows that fit into each wait category and then you get an approximate view of the total time for each wait (and CPU). It is approximate because it samples and you assume the observed wait took the entire time between samples. Here is some verbiage from the slides in John’s own words: “ASH is neither a random sample nor a complete sample of TIME_WAITED by session events”. So he says not to just sum or average TIME_WAITED.

    - Bobby

  3. Justin says:

    Hi,

    That makes sense.

    Thanks
    /J

Leave a Reply