Submitted two talks to the East Cost Oracle Users Conference

Submitted two talks to the East Cost Oracle Users Conference:

http://www.eastcoastoracle.org/ssl/presentations.asp – NO LONGER EXISTS

First one is my Exadata talk that I gave at work.  Second is new Intro to SQL Tuning talk I’m preparing for my coworkers.  I should know some time in July if one of these is accepted.

Here are the abstracts:

Exadata Distinctives

This presentation focuses on the new features of Exadata for improving Oracle query performance.  It explains Exadata smart scans. We include diagrams that compare and contrast normal Oracle database servers and the flow of data from disk through memory buffers and networking to the end user with these same things on Exadata.  In particular we show that smart scans bypass the high speed memory of the buffer cache on the database server but use the lower speed flash memory on the cell servers as a form of buffer cache which has tradeoffs in terms of the sorts of queries that work best in each case.

This talk includes observations from real world experience with a large custom written datawarehouse and an implementation of OBIA including tuning this packaged PeopleSoft datawarehouse with minimal application change.  The query tuning focuses on encouraging SQL queries to use full scans to take advantage of the automatic tuning features of the smart scans.  It also shows how to bypass the Exadata features where normal Oracle query processing is more efficient.

The presentation concludes with notable concerns for any new adopter of Exadata including issues with tables that have more than 255 columns, performance problems of sort segments in RAC databases with many nodes, and stability issues that come with a new platform.

Introduction to Oracle SQL Tuning

At the core of SQL tuning are three fundamental concepts – join order, join method, and access method. This presentation will explain these three concepts and show that tuning SQL boils down to making the proper choice in each category. It is hard to imagine an area of SQL tuning that doesn’t touch on one or more of these three areas and in any case the presenter knows from experience that tuning the join order, join methods, and access methods consistently leads to fantastic results.

We will demonstrate how to tune an Oracle SQL query by examining and manipulating the join order, join methods, and access methods used in the query’s execution plan. First you must query the tables to determine the true number of rows returned from each table
and from each likely join. Then you identify which join order, join method, or access method chosen by the optimizer is inefficient. Then you use one of the suggested
methods to change the optimizer’s choice to be the efficient one. These methods include
gathering optimizer statistics, hints, setting init parameters, and adjusting the parallel
degree. Lastly, you must test your changed plan to make sure the new combination of join order, join methods and access methods actually improve the performance.

– Bobby

Posted in Uncategorized | 2 Comments

New SQL Tuning paper draft

I’m working on a paper titled “Introduction to SQL Tuning”.  I’d like to have this be the start of a training program at work for DBAs and developers.  I’ve completed a draft of a paper on this topic.  If anyone out there wants to read it and give me feedback here is the current version:

https://www.bobbydurrettdba.com/uploads/introtosqltuningdraft05212012.pdf

You can reach me at bobby@bobbydurrettdba.com

– Bobby

Posted in Uncategorized | Leave a comment

Finding query with high temp space usage using ASH views

We are having challenges with queries that spend most of their time on these waits:

enq: TS – contention

This wait indicates that a query on one instance of a RAC system is causing its instance to add temp space to its sort segment and remove space from another instance’s sort segment.  For some reason this process is extremely slow and totally bogs down any query waiting on this type of wait.  I worked an SR with Oracle on this issue and Oracle development claims this is not a bug.  I’ve duplicated the problem on both 10.2 and 11.2 RAC.

Anyway, so now we periodically see queries spending most of their time on this wait and this occurs when some other query is eating up all the temp space or has done so recently.  So how do I go back in time and figure out what query was using temp space?  If I wanted to see the session ids of the current users of temp space on a RAC system I would just join gv$tempseg_usage to gv$session like this:

select se.inst_id,se.sid,sum(tu.blocks)*8192
from
gv$tempseg_usage tu,
gv$session se
where
tu.inst_id=se.inst_id and
tu.session_addr=se.saddr and
tu.session_num=se.serial#
group by se.inst_id,se.sid
order by se.inst_id,se.sid;

This assumes an 8K blocksize.  But to go back in time you can use the TEMP_SPACE_ALLOCATED column of V$ACTIVE_SESSION_HISTORY or DBA_HIST_ACTIVE_SESS_HISTORY to identify a query that consumed a lot of temp space.

Here is an example:

select sql_id,max(TEMP_SPACE_ALLOCATED)/(1024*1024*1024) gig 
from DBA_HIST_ACTIVE_SESS_HISTORY 
where 
sample_time > sysdate-2 and 
TEMP_SPACE_ALLOCATED > (50*1024*1024*1024) 
group by sql_id order by sql_id;

This gives the sql_id and maximum allocated temp space of any queries that ran in the past two days and exceeded 50 gigabytes of temp space.

This is a great feature of these ASH views.  Now we can go back in time and find the queries that were using all the temp space and tune them.

– Bobby

Posted in Uncategorized | 22 Comments

sar -d output during disk issue

I thought it might be helpful to show what sar -d returns during a disk issue.  Sorry, I only have a screenshot I took during the problem.  Click on the smaller image to see the full size jpeg.

This was a “sar -d 5 2” –  two five second intervals.  Here is what this looks like now that the disk subsystem is functioning again:

 

When I first checked on the problem database the sessions were waiting on log file sync.  I did an insert into a table and the insert ran quickly, but the commit took like 27 seconds.  I believe this is because the block I inserted into wasn’t immediately written to disk but the commit forces a write to the redo log.  So, anything doing a commit was taking forever.  One thing I’m confused about is why the long time is avwait while the avserv is short.  Maybe this is because the current request is taking forever and isn’t included in avserv whereas the wait time for the next request is included in avwait, but that is just a guess.  Maybe if I did a longer interval for sar -d then the avserv would have been higher, but I’ve only got the screenshot I took at the time.

– Bobby

Posted in Uncategorized | Leave a comment

Session profile using ASH

Worked on the scripts in this zip today: https://www.bobbydurrettdba.com/uploads/sessionprofiledbahist.zip

I wanted to see a session profile from the DBA_HIST_ACTIVE_SESS_HISTORY view.  My idea was to include the time the session was inactive as well as the time it was waiting on an event or using the CPU.

On both these scripts you have to manually update the date range.

ashsessionlist.sql – this just gives you a list of the sessions for a given date range.  Sorts them from most to least active.  Also shows the total timeframe so you can tell how much inactive time there was compared with active.

ashsessionprofile.sql – takes a given instance, SID, and sequence number and produces a nice session profile.  The key to any session profile is to include some comparison of the waits and cpu with the overall elapsed time.

– Bobby

Posted in Uncategorized | Leave a comment

Exadata presentation

Slides for a presentation on Exadata that I did at work:

https://www.bobbydurrettdba.com/uploads/ExadataDistinctives.ppt

Hopefully I balanced out the good and bad.  Exadata has some good features but we have hit a large number of bugs.  Given adequate testing you could benefit from the new features.

– Bobby

Posted in Uncategorized | Leave a comment

Autotask clients fall into OTHER_GROUPS on custom plans

The autotask clients run in their own resource consumer groups.  So, stats, space, and SQL tuning tasks have some consumer groups that you have to include in your own plans if your plan will be active while the window for these tasks is open.

Here are the Oracle supplied consumer groups for the autotask clients:

SQL> select client_name,consumer_group
   2 from DBA_AUTOTASK_CLIENT;

CLIENT_NAME                      CONSUMER_GROUP
-------------------------------- ------------------------------
auto optimizer stats collection  ORA$AUTOTASK_STATS_GROUP
auto space advisor               ORA$AUTOTASK_SPACE_GROUP
sql tuning advisor               ORA$AUTOTASK_SQL_GROUP

In a case I was working on our custom resource plan was active but didn’t include any of the consumer groups listed above.  So, when stats ran it fell through to the other_groups part of the plan.  Sadly, other_groups was set to 0% which should never be the case and this caused the stats job to hang when the CPU use was high.

So, don’t make other_groups 0% or include these special autotask consumer groups in your plan if your plan will be active during the maintenance windows.

– Bobby

Posted in Uncategorized | Leave a comment

Resource Manager wait events

You many wonder why you should care about Oracle Resource Manager wait events.  You should care because every Oracle 11g database uses Resource Manager by default during the maintenance windows.  So, even if you have not intentionally configured RM you may see waits that start with resmgr:  which are RM waits.  It is helpful to understand what the two main RM waits are and what they mean.

The default maintenance windows are M-F 22:00 for four hours and 06:00 for 20 hours on the weekend.  Several times we have had performance complaints and had people say that one of the main waits during the slow time was “resmgr:cpu quantum“.

The resmgr:cpu quantum wait event means that your database server’s CPU is high enough that resource manager has kicked in and is dividing up the CPU cycles among the various sessions that need it.  The plan that runs by default during the maintenance window is called DEFAULT_MAINTENANCE_PLAN.  This plan allocates most of the CPU resources to tasks other than the maintenance tasks.  This is really a helpful feature because it prevents the maintenance tasks like gathering optimizer statistics from eating up a lot of CPU resources that are needed by user processes.  But, if you have a busy system and see a bunch of resmgr:cpu quantum waits and don’t know what they are it can be disconcerting.  The solution is to look at what is using so much CPU and tune that.  Resource Manager actually does a great job of dialing back the CPU of the maintenance jobs so there is no need to disable it.

I’ve seen one situation where resmgr:cpu quantum indicated a real problem.  The active plan had a 0% CPU allocation.  In this case the session was waiting on a resmgr:cpu quantum wait but never got a CPU slice.  This is essentially a hang and needs to be addressed by giving the user associated with the session a > 0% CPU allocation.

The second RM wait that I have seen is “resmgr:become active“.  You will only see this wait if you have configured RM yourself to have limits on the number of active sessions.  When the number of active sessions that are in a given consumer group reaches the maximum then any other sessions that want to become active will wait on resmgr:become active.  I really don’t like active session limits and encourage you to carefully consider whether these limits make sense in your situation.  The sessions waiting on resmgr:become active are essentially hung and eventually will timout.  If you have a hang like situation and see a bunch of resmgr:become active waits then there is cause for concern and you may need to increase or eliminate your active session limits.

So, to summarize:

resmgr:cpu quantum – sessions are sharing CPU, not necessarily a problem

resmgr:become active – sessions are hung , can be a big problem

– Bobby

Posted in Uncategorized | 2 Comments

Oracle blog aggregators

Found a couple of Oracle blog aggregators.  These sites combine the many Oracle DBA blogs into single sites.

http://www.orafaq.com/

http://orana.info/ – NO LONGER EXISTS

It looks like Oracle has an aggregator for their own employees:

https://blogs.oracle.com/

Pretty cool.

– Bobby

Posted in Uncategorized | Leave a comment

DBA_HIST_ACTIVE_SESS_HISTORY

I’ve been using the view DBA_HIST_ACTIVE_SESS_HISTORY to investigate problems lately.  I’ve found this useful for both performance problems and hangs.  I’ve done some more complicated things, but I find that just dumping out every column of the view for a very narrow window of time can help explain what was happening in a way that other tools do not. 

DBA_HIST_ACTIVE_SESS_HISTORY records the list of active sessions every 10 seconds.  It is like looking at your active sessions in Toad except going back in time.  We have been keeping six weeks of history on the AWR so I can go back to any ten second interval over the past six weeks and see what was running.

In the case of a hang or locking you can see the blocking sessions and what they are spending time on.  Here is a recent example I used to debug a hang caused by sessions on two different instances – 1 and 12.  I picked a 10 second interval that I knew was during the hang:

select 
*
from DBA_HIST_ACTIVE_SESS_HISTORY
where 
sample_time 
between 
to_date('18-APR-2012 10:40:00','DD-MON-YYYY HH24:MI:SS')
and 
to_date('18-APR-2012 10:40:10','DD-MON-YYYY HH24:MI:SS')
and instance_number in (1,12)
order by sample_time,instance_number,SESSION_ID;

One key point if you have a hang or other issue is to run DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT before bouncing the database if you have to bounce.  Otherwise you will lose all of the active session history since the last snapshot when the database bounces.

– Bobby

Posted in Uncategorized | 2 Comments