How to figure out disk network latency using cached blocks

I was working on some disk performance problems and came across a simple way to test the latency built into the network that connects my database servers to their disk storage.  All I have to do is read some disk blocks from a table several times so they are cached in the disk storage server, then flush the database buffer cache and read from the table once more.  Since the blocks are no longer cached in the database buffer cache but are still cached on the disk array, the time it takes to read them approaches the time it takes to copy the blocks over the network.  Of course there is some CPU time on both the source and target to copy the blocks in memory, but the physical disk read is eliminated, so you see close to the minimum time a transfer can take.

So, here is my simple test script.  It assumes the user, password, and tnsnames.ora entry are passed as parameters and that the user is a DBA user like SYSTEM.

connect &1/&2@&3

-- create test that will show how fast reads
-- can be when cached on the disk subsystem

-- document version

select * from v$version;

-- create small table

drop table test;

create table test as select * from dba_tables
where rownum < 1000;

-- query three times to get cached in the disk system

select sum(blocks) from test;
select sum(blocks) from test;
select sum(blocks) from test;

-- flush from database cache

alter system flush buffer_cache;

-- reconnect so session waits are cleared

disconnect

connect &1/&2@&3

select sum(blocks) from test;

-- show average scattered read
-- should be best time you can
-- get since blocks are cached
-- in the disk subsystem

VARIABLE monitored_sid number;

begin

SELECT sid into :monitored_sid from v$session
where audsid=USERENV('SESSIONID');

end;
/

select
total_waits,
time_waited_micro/total_waits
FROM V$SESSION_EVENT a
WHERE a.SID= :monitored_sid
and event='db file scattered read';
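To run the script, save it to a file and pass the username, password, and tnsnames.ora entry on the SQL*Plus command line so they land in &1, &2, and &3.  The filename and connect string below are placeholders of my own, not part of the script:

```shell
# Start SQL*Plus without logging in first; the script's own
# "connect &1/&2@&3" line does the login.  cached_read_test.sql
# and MYDB are hypothetical names -- substitute your own.
sqlplus /nolog @cached_read_test.sql system mypassword MYDB
```

This needs a live Oracle client and database, so treat it as a usage sketch rather than something you can run as-is.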

Here is what the output looks like on a slow system with network problems (over NFS in my case):

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
4                       1410.25

Here is what it looks like on a nice fiber network:

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
4                          40.5

40 microseconds for a disk read is sweet compared to 1410!

I’ve uploaded the script and three runs I made with it here.

– Bobby

P.S.  I realized this morning that the fiber example with 40 microseconds for a read is really just pulling from the operating system's filesystem cache, because the database I was testing on doesn't use direct IO.  Here is a test on a system with a SAN that uses direct IO:

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
5                         478.4

I ran this several times and this was the best result; the others were mostly in the 600-700 microsecond range.  So, I guess when you run the test script you are really testing everything behind the scenes that could benefit from caching, except for the database's buffer cache.  Without direct IO you may never reach your SAN network because the blocks will be cached at the OS filesystem cache level.
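One way to tell which case you are in is to look at the FILESYSTEMIO_OPTIONS initialization parameter; a quick check, assuming you are connected as a DBA user in SQL*Plus:

```sql
-- SETALL or DIRECTIO means the database requests direct IO and
-- bypasses the OS filesystem cache, so the test reaches the SAN;
-- NONE or ASYNCH means reads may be served from the OS cache instead.
show parameter filesystemio_options
```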