How to figure out disk network latency using cached blocks

I was working on some disk performance problems and came across a simple way to test the latency or delay built into the networks that connect my database servers to their disk storage.  All I have to do is read some disk blocks from a table several times to be sure they are cached in the disk storage server and then flush the database buffer cache and read from the table once more.  Since the blocks are not cached in the database buffer cache but are cached on the disk array the time it takes to read the blocks approaches the time it takes to copy the blocks over the network.  Of course there is some CPU on both the source and target to copy the blocks in memory but the physical disk read is eliminated and you see close to the minimum time it is possible for a transfer to take.

So, here is my simple test script.  It assumes the user, password, and tnsnames.ora name are passed as parameters and that the user is a DBA user like SYSTEM.

connect &1/&2@&3

-- create test that will show how fast reads
-- can be when cached on the disk subsystem

-- document version

select * from v$version;

-- create small table

drop table test;
create table test as select * from dba_tables where rownum < 1000;

-- query three times to get cached in the disk system

select sum(blocks) from test;
select sum(blocks) from test;
select sum(blocks) from test;

-- flush from database cache

alter system flush buffer_cache;

-- reconnect so session waits are cleared

disconnect
connect &1/&2@&3

select sum(blocks) from test;

-- show average scattered read
-- should be best time you can
-- get since blocks are cached
-- in the disk subsystem

VARIABLE monitored_sid number;

begin

SELECT sid into :monitored_sid from v$session 
where audsid=USERENV('SESSIONID');

end;
/

select  
total_waits,
time_waited_micro/total_waits
FROM V$SESSION_EVENT a
WHERE a.SID= :monitored_sid
and event='db file scattered read';

Here is what the output looks like on a slow system with network problems (over NFS in my case):

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
          4                       1410.25

Here is what it looks like on a nice fiber network:

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
          4                          40.5

40 microseconds for a disk read is sweet compared to 1410!

I’ve uploaded the script and three runs I made with it here.

– Bobby

P.S.  I realized this morning that the first example with 40 microseconds for a read is really just pulling from the operating system’s filesystem cache because the database I was testing on doesn’t use direct IO.  Here is a test on a system with the SAN that uses direct IO:

TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS
----------- -----------------------------
          5                         478.4

I ran this several times and this was the best result.  The others were in the 600-700 microsecond range mostly.  So, I guess when you run the test script you are really testing everything behind the scenes that could benefit from caching except for the database’s buffer cache.  Without direct IO you may never reach your SAN network because you will have cached the blocks at the OS filesystem cache level.

 

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.