Simple Performance Tuning Example

I helped resolve an Oracle database performance problem yesterday and I thought I would capture what I did in this blog post. It is a “simple” problem in that the solution was easy once I understood the issue and because this sort of problem happens often. But the process to get to the solution is not completely simple. I think documenting the process may have more value than writing down the details of this specific problem.

Today is Tuesday. Thursday last week our development team was doing some testing and performance of the system was terrible. I was busy working on another project that is my primary assignment right now, so I did not look at it. It was also Thursday before the July 4th holiday weekend and we are all mentally and physically checking out a bit early. Some key people were already off on vacation. So, we shut it all down until Monday (yesterday). Also, late Thursday there was some thought that there was a system issue maybe with our virtual machines’ host or networking. So, we let it go. Monday our dev team said they thought the slowness was caused by bad data including duplicates that were throwing errors on a target system (non-Oracle) in the cloud. So, they cleaned up the data and started a new run. But it was slow as molasses and the application front end was also super slow.

In retrospect, looking at the paragraph above, I bet a lot of DBAs with performance tuning experience would jump to the solution. I can think off the top of my head of a few types of things that might cause these symptoms. But I did not immediately think of it so I went through a process to get there. No leap of intuition was required except maybe at the very end.

Since the problem query was currently running the first step was to look at the session monitor in Toad. I may have looked at HP Unix’s Glance tool, which is like Top, to see the CPU usage and top processes at an OS level. What I saw was a parallel query running with 8 parallel child processes and the query had been running about 30 seconds.

Step two was to take an AWR report. First, I ran a command like this to get a snapshot after the current activity:

exec dbms_workload_repository.create_snapshot;

Then I ran an AWR report for the last two snapshots.

The “SQL ordered by Elapsed Time” part of the report looked like this:

SQL ordered by Elapsed Time

The top query azrtxpx1g6mxy was the one I saw in Toad. The second query is a subselect of the top one. Averaging 51.23 seconds and having 46.77% of its time on I/O I eventually went where I usually go with AWR reports. I went to the “Segments by Logical Reads” section of the AWR report. Here is part of that section:

Segments by Logical Reads

I hid the table names, but the point is that the top segment was doing 99.67% of the logical I/Os. So, the issue had to do with I/O. But looking at the “Top 5 Timed Foreground Events” section I/O seems very efficient.

Foreground Events

The top wait was the I/O event “direct path read” which can be deceptively low on systems that use asynchronous I/O. But this system uses HP Unix file system I/O which does not support asynchronous I/O so the direct path read number is correct. One of my list of possible things that could be wrong before I started was some sort of I/O contention but this report pointed away from I/O hardware.

So, the AWR report and my look at Toad session monitor points to a SQL statement doing parallel query full scans of a table and that SQL statement running 10s of seconds in length. So, I looked at the execution plan next. I like to use my getplans.sql SQL*Plus script to extract a query’s plan from the AWR using its SQL_ID. I ran this is SQL*Plus:

@getplans azrtxpx1g6mxy

Sure enough, the same top table was running a parallel full table scan in the plan.

Somewhere in this process I did some counts. I think the simplicity of using select count(*) queries may get in the way of people using it but count(*) queries are one of the biggest helps to me tuning a SQL statement. I did two types of counts for this case. I did a select count(*) on the top table and found out that it had 160 million rows. The count took 45 seconds even through it was doing a fast full scan of an index and not the full table. Then I queried grouping by the column that joins the table to its parent table. This is a typical header detail join where the child table joins the parent on the parent’s unique key. Then I got the idea that the query might run faster if there were an index on the joining column.

I ran my optimizer statistics scripts to check the index columns and to look at the column statistics to see how selective the joining column is. I found that there was no index on the column that the header table used to join to the problem detail table. So, I recommended that the dev team add an index on that column and including the other columns in the where clause for good measure. They might have to try two or three different column choices and orders to get the best index.

But the dev team was skeptical. This same query runs fine in production. It is not new so what changed? I tried running my sqlstat.sql script to get a history of the execution of SQL_ID azrtxpx1g6mxy but there was no history on our test system going back before July 2nd. I ran it on production and there was no history at all. I found out that this SQL had not run for two months on production and since we only keep 6 weeks history in our AWR any execution history of azrtxpx1g6mxy had fallen off the list.

Finally, I had a moment of insight. If I could have seen a faster plan on production, no insight would have been required. It would have been an almost mechanical process of comparing the good and bad plans. But I finally got the idea of running my same optimizer statistics scripts against production and looking at the indexes. Sure enough, production had an index on the joining column. Duh! I probably should have checked for differences between the tables and indexes in test and prod first, but I had no reason to suspect a difference. So, a coworker put the indexes back in test and everything was fine.

So, it is a simple thing. Indexes were dropped on a test system. Happens all the time. Not profound. But maybe the process of driving from problem statement through the different kinds of evidence and tools to the solution is more profound. It gets results. What does the AWR report point to? The top segments pointed to a table. The counts and column statistics suggested the need for an index. Finally, a compare of prod and test showed that an index was missing.

Bobby

Posted in Uncategorized | 4 Comments

Fixed issue with DBA_HIST_SQLSTAT script

I got rid of the old sqlstat.sql, sqlstat2.sql and sqlstat3.sql scripts in my GitHub repository and fixed an issue that was in sqlstat3.sql. Now the current version is just called sqlstat.sql. This one script should work well in all the situations where the other three were useful. (One script to rule them all…)

The old version of sqlstat3.sql was showing rows with plans that were not being run so I added a condition to only show plans that actually had any elapsed time. Slowly getting more useful.

Old output:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA
------------- --------------- ------------------------- ----------------
g0wqzdhxy62mc      2903317857 14-APR-20 01.00.04.320 PM                0
g0wqzdhxy62mc      4109593498 14-APR-20 01.00.04.320 PM              440
g0wqzdhxy62mc      1569680226 14-APR-20 01.00.04.320 PM                0
g0wqzdhxy62mc      2845678604 14-APR-20 01.00.04.320 PM                0

New output:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA
------------- --------------- ------------------------- ----------------
g0wqzdhxy62mc      4109593498 14-APR-20 01.00.04.320 PM              440

I really was not using the other two scripts anymore and was only using sqlstat3.sql. They are in the GitHub history if anyone needs to revisit them. Going forward I will use the new sqlstat.sql.

Bobby

Posted in Uncategorized | 2 Comments

How to email from Python

I got an email asking how to send an alert email from a Python script. Here is an example based on a script I have setup:

You need to replace “PUT YOUR SMTP SERVER HERE” with the host name of your smtp server.

Bobby

Posted in Uncategorized | Leave a comment

January 2020 Patches Not In Recommended Patch Advisor

Every quarter I download the latest patch set from Oracle’s support site by using the Recommended Patch Advisor. But the January 2020 patch sets did not show up. The RPA still recommends the October 2019 patch sets. The Recommended Patch Advisor is one of the options under the Patches and Updates tab in Oracle’s support site. I tried searching Oracle’s support site and Googling but did not find any answers. I am sure that the information is out there somewhere but the way I was searching for it did not find it. So, I thought I would post a quick note about it. The short answer is that you should use this Oracle support document to find the latest patch sets and not the Recommended Patch Advisor:

Master Note for Database Proactive Patch Program (Doc ID 756671.1)

To find this out I had to open an SR with Oracle Support. They quickly identified this document which explained the situation:

My Oracle Support: Patch and Health Recommendations No Longer Supported (Doc ID 2644769.1)

Note 2644769.1 seems to say that the Recommended Patch Advisor will continue to refer to the October 2019 patches until Oracle removes the RPA from their support site in May.

The funny thing is that I knew we had an internal document that a coworker and I had both contributed to that describes how to download the latest patch set and apply it to a database to create a tar or zip that we could use to deploy a new database. I decided to modify that document to refer to note 756671.1 and remove references to the RPA. To my surprise my coworker had already included note 756671.1 months ago in our instructions and not the RPA. I think that Oracle support must have recommended the note some time ago, but I was still using the advisor. I guess you learn something new every day.

Anyway, note to self: don’t use the Recommended Patch Advisor to find the quarterly patch updates. Use note 756671.1.

Bobby

Posted in Uncategorized | Leave a comment

Python Script to Scan Listener Log For Incoming Connections

Wrote a quick and dirty script to scan the listener log for incoming connections and produce a count of how many connections came from which host. Here is the URL:

https://github.com/bobbydurrett/miscpython/blob/master/gethostcounts.py

All it does is hack out the ip address from the second HOST= entry of the lines in the listener log that have HOST= in them. Then it uses the Python function socket.gethostbyaddr to get the hostname for those that can be looked up this way.

Counts the number of lines for a given ip address and sorts them from most to least connections. Might be useful to someone.

Bobby

Posted in Uncategorized | 4 Comments

Bounce Listener and Database After Adding oracle User to Group

I keep forgetting that I need to bounce both the listener and the database after changing the oracle user’s group membership, so I am writing a quick note to myself here. This is the second time I have been tripped up by something like this and I spent way too much time figuring it out again.

We have some PL/SQL code that writes to files on a database server. The most recent issue was using DBMS_XSLPROCESSOR.CLOB2FILE to write a clob to a file in a certain directory. This got the following error:

ORA-22288: file or LOB operation FILEOPEN failed
Permission denied

I tried writing to the same directory using UTL_FILE.FOPEN with write permission to create a new file. I got this error:

ORA-29283: invalid file operation: cannot open file [29435]

I got a Linux admin to add the oracle Linux user to the correct group and I bounced the database and the error did not go away. Then I bounced the listener and the database, and the errors were resolved.

NOTE TO SELF: if you add the oracle user to a new group to allow PL/SQL to write to a directory using group permissions bounce the database and the listener!

Bobby

P.S. This is on Oracle 19c on Redhat Linux 7 but should apply to a wide range of Oracle versions and Unix like platforms.

P.P.S. Bounce the listener first.

Posted in Uncategorized | Leave a comment

Plan Change Monitor goes off on SQL Analyze query

Last night, the plan change monitor script we have setup on one of our production databases alerted us about query with a SQL Analyze hint. It looked weird.

The query has sql_id 48tfz5d37xk28 and starts like this:

/* SQL Analyze(3141,1) */ select ...

The weird thing is the output of my sqlstat.sql script for this sql_id value:

It looks like one of the database’s built in processes was running this query with various plans. Plan_hash_value 3992115155 tripped the alert. I guess it was 10 times as long as plan 3597180003. The plan change monitor script could be improved in many ways to eliminate false positives like this, but I just added a condition to exclude all queries run by user SYS:

258c258,259
< m.sql_id not in ('dkqs29nsj23jq','58r2wgh5xx1bg','3m2su959r6tp8','3z60x9386n1v8','bx81s3m9zxutb','czwbdwxx5aqsk','001wbydnwaa7n','6quugay8yt73u')
---
> m.sql_id not in 
('dkqs29nsj23jq','58r2wgh5xx1bg','3m2su959r6tp8','3z60x9386n1v8','bx81s3m9zxutb','czwbdwxx5aqsk','001wbydnwaa7n','6quugay8yt73u') 
and s.username <> 'SYS'

This is the first time I remember seeing something like this cause an alert, but the workaround was easy.

Bobby

Posted in Uncategorized | Leave a comment

AWR records top 30 SQLs by default

I forget that Oracle’s AWR only records the top 30 SQL statements in each snapshot by default. I am not sure how long this link will last but here is a 19c manual page describing the default: 19c manual – see the topnsql setting. A lot of my query tuning assumes that the problem query is in the AWR but for very efficient queries on active systems they may mysteriously disappear or be absent from the AWR. It sometimes takes me a while to remember that the snapshots only include a fixed number of SQLs.

I use my sqlstat.sql query to look at a history of a particular sql_id’s executions. Often it shows the query running faster on one plan_hash_value than another. Then I look at why the sql_id changed plans. But what about when the good plan does not show up at all? Several times I have looked at sqlstat.sql output and thought that a query had not run in the past with an efficient query even though it had. It had run so efficiently that it was not on the report, so it looked like the query was a new, slow, SQL statement.

Often I will fix a query’s plan with a SQL Profile and rerun sqlstat.sql on a busy system after manually running dbms_workload_repository.create_snapshot to capture the most recent activity and the problem query with the new plan will not show up. Usually I remember that it is not in the top 30 queries and that is why it is missing but sometimes I forget. Here is a partial sqlstat.sql output showing a long running SQL disappearing after I fixed its plan on Wednesday:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Ave ms
------------- --------------- --------------------- ---------------- ------------------
acn0557p77na2      3049654342 12-FEB-20 05.00.01 AM                1          16733.256
acn0557p77na2      3049654342 12-FEB-20 06.00.03 AM                2           49694.32
acn0557p77na2      3049654342 12-FEB-20 07.00.53 AM                6          47694.527
acn0557p77na2      3049654342 12-FEB-20 08.00.54 AM               11         50732.0651
acn0557p77na2      3049654342 12-FEB-20 09.00.33 AM               15         53416.5183
acn0557p77na2      3049654342 12-FEB-20 10.00.43 AM               21         86904.4385
acn0557p77na2      3049654342 12-FEB-20 11.00.02 AM               27          84249.859
acn0557p77na2      3049654342 12-FEB-20 12.00.20 PM               27         125287.757
acn0557p77na2      3049654342 12-FEB-20 01.00.36 PM               69         156138.176

Sometimes I query the V$ tables to verify it is currently running a good plan. Here is example output from vsqlarea.sql showing the good plan running today.

LAST_ACTIVE         SQL_ID        PLAN_HASH_VALUE Avg Elapsed ms
------------------- ------------- --------------- --------------
2020-02-14 16:11:40 acn0557p77na2       867392646             14

This is just a quick note to me as much as anyone else. A query that is missing from an AWR report or my sqlstat.sql report may not have run at all, or it may have run so well that it is not a top 30 query.

Bobby

Posted in Uncategorized | 2 Comments

DBMS_UTILITY.FORMAT_CALL_STACK Change in 12.2 and later

Quick note. During my 11.2.0.4 to 19c upgrade that I have been writing about we found a difference in behavior of DBMS_UTILITY.FORMAT_CALL_STACK. I tested it on several versions, and it switched in 12.2. Now it puts the procedure name within the package in the stack.

Old output:

----- PL/SQL CALL STACK -----
  OBJECT      LINE  OBJECT
  HANDLE    NUMBER  NAME
0X15BFA6930         9  PACKAGE BODY MYUSER.MYPKG
0X10C988058         1  ANONYMOUS BLOCK

New output:

----- PL/SQL CALL STACK -----
  OBJECT      LINE  OBJECT
  HANDLE    NUMBER  NAME
0XA796DF28         9  PACKAGE BODY MYUSER.MYPKG.MYPROC
0X7ADFEEB8         1  ANONYMOUS BLOCK

Test code:

select * from v$version;

CREATE OR REPLACE PACKAGE MYPKG
AS

PROCEDURE MYPROC;

END MYPKG;
/
SHOW ERRORS

CREATE OR REPLACE PACKAGE BODY MYPKG
AS

PROCEDURE MYPROC
IS

BEGIN

DBMS_OUTPUT.PUT_LINE(UPPER(dbms_utility.format_call_stack));

END MYPROC;

END MYPKG;
/
SHOW ERRORS;

execute mypkg.myproc;
show errors;

Might be useful to someone else. We had some code that depended on the package name being the last thing on its line, but the new version includes the name of the procedure after the package name.

Bobby

Posted in Uncategorized | 2 Comments

Python 3.8.1 Linux Install Without Root

I wanted to install the latest Python on Linux in a low powered user’s home directory and not have the install interfere with the Python that comes with Linux. I wanted my own local copy of Python that I could install Python packages to without interfering with anything else. I was willing to use root to install the needed Linux packages, but everything else was done with my low powered user.

I did a yum update on my Oracle Enterprise Linux 7.7 install before I started to get everything on the latest version. As root I used yum to install the Linux packages I needed to install Python 3.8.1 from source:

I must have already installed these on this VM. I did the rest of these steps as my low powered user “bobby”. The next thing I did was download the latest Python source, the 3.8.1 version. I had to use the -k option to get around certificate issues:

[bobby@pythonvm ~]$ curl -k -O https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22.8M  100 22.8M    0     0  7830k      0  0:00:02  0:00:02 --:--:-- 7828k

For some reason I had to do this twice. The first download failed. Next I gunzipped and untarred this archive under my home directory:

tar zxfv Python-3.8.1.tgz

Python-3.8.1/Objects/sliceobject.c
Python-3.8.1/Objects/listobject.c
Python-3.8.1/Objects/typeslots.inc
Python-3.8.1/Objects/weakrefobject.c
Python-3.8.1/Objects/unicodeobject.c
Python-3.8.1/Objects/complexobject.c
Python-3.8.1/Objects/picklebufobject.c
Python-3.8.1/Objects/odictobject.c
Python-3.8.1/Objects/genobject.c
[bobby@pythonvm ~]$

I created a directory called “python” under my home directory to use as the top-level directory for my Python install. This directory tree will hold the binaries and any Python packages that I install.

[bobby@pythonvm ~]$ mkdir python

After changing directory to where I untarred the source files I configured the Python make and install to use the directory I just created as a “prefix”:


[bobby@pythonvm ~]$ cd Python-3.8.1

[bobby@pythonvm Python-3.8.1]$ pwd
/home/bobby/Python-3.8.1

./configure --prefix=/home/bobby/python

config.status: creating Misc/python-embed.pc
config.status: creating Misc/python-config.sh
config.status: creating Modules/ld_so_aix
config.status: creating pyconfig.h
creating Modules/Setup.local
creating Makefile


If you want a release build with all stable optimizations active (PGO, etc),
please run ./configure --enable-optimizations

Then I did the make and make altinstall from the same directory:

So, now Python 3.8.1 is installed in /home/bobby/python but I want to put the bin directory in the path so I can run python or pip and by default have them be this version. The install created python and pip as python3.8 and pip3.8 but I created links to them so that I could access them without 3.8 at the end of their names:

[bobby@pythonvm Python-3.8.1]$ cd /home/bobby/python/bin
[bobby@pythonvm bin]$ ls -al
total 16704
drwxr-xr-x. 2 bobby bobby     4096 Feb 10 15:36 .
drwxrwxr-x. 6 bobby bobby       52 Feb 10 15:36 ..
-rwxrwxr-x. 1 bobby bobby      109 Feb 10 15:36 2to3-3.8
-rwxrwxr-x. 1 bobby bobby      249 Feb 10 15:36 easy_install-3.8
-rwxrwxr-x. 1 bobby bobby      107 Feb 10 15:36 idle3.8
-rwxrwxr-x. 1 bobby bobby      231 Feb 10 15:36 pip3.8
-rwxrwxr-x. 1 bobby bobby       92 Feb 10 15:36 pydoc3.8
-rwxr-xr-x. 1 bobby bobby 17075312 Feb 10 15:35 python3.8
-rwxr-xr-x. 1 bobby bobby     3095 Feb 10 15:36 python3.8-config

[bobby@pythonvm bin]$ ln -s python3.8 python
[bobby@pythonvm bin]$ ln -s pip3.8 pip

Then I added/home/bobby/python/bin to the front of the path to keep these new python and pip links ahead of the python links or binaries in the rest of the path.

[bobby@pythonvm bin]$ cd
[bobby@pythonvm ~]$ vi .bashrc

export ORACLE_HOME=/home/oracle/app/oracle/product/12.1.0/client_1
export PATH=$PATH:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=/home/oracle/app/oracle/product/12.1.0/client_1/lib
export PATH=/home/bobby/python/bin:$PATH

~
~
~
~
~
~
".bashrc" 17L, 451C written     

I logged out and in as bobby and tried python and pip to see that they were the correct version:

[bobby@pythonvm ~]$ python
Python 3.8.1 (default, Feb 10 2020, 15:33:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39.0.3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
[bobby@pythonvm ~]$ pip list
Package    Version
---------- -------
pip        19.2.3
setuptools 41.2.0
WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[bobby@pythonvm ~]$   

Finally, I updated pip just to show that we can update something in this new install. Then I installed numpy to see if I could install a new package.

This is a quick and dirty post, but I want to remember these steps for myself and it might be helpful for someone who uses Redhat/Centos/OEL.

Bobby

Posted in Uncategorized | 6 Comments