General troubleshooting lessons from recent Delphix issue

Delphix support helped me resolve an issue yesterday, and the experience gave me the idea of writing this post about several general troubleshooting tips that I have learned down through the years. Never mind that I ignored these lessons during this particular problem. This is more of a “do as I say” than a “do as I do” story. Actually, sometimes I remember these lessons. I didn’t do so well this week. But the several mistakes that I made resolving this recent Delphix issue motivate me to write this post and, if nothing else, remind myself of the lessons I’ve learned in the past about how to resolve a computer problem.

Don’t panic!

I’m reminded of the friendly advice on the cover of the Hitchhiker’s Guide to the Galaxy: “Don’t panic!”. So, yesterday it was 4:30 pm. I had rebooted the Delphix virtual machine and then in a panic had the Unix team reboot the HP Unix target server. But I still could not bring up any of the Delphix VDBs. We had people coming over to our house for dinner that night and I was starting to worry that I would be working on this issue all night. I ended up getting out of the office by 5:30 pm and had a great dinner with friends. What was I so stressed about? Even the times that I have been up all night didn’t kill me. Usually the all-night issues lead to me learning things anyway.

Trust support

The primary mistake that I made was to get my mind fixed on a solution to the problem instead of working with Delphix support and trusting them to guide us to the solution. We had a number of system issues due to a recent network issue and I got my mind set on the idea that my Delphix issue was due to some network hangup. I feel sorry for our network team because it seems like the first thought people have any time there is some issue is that it is a “network issue”. I should know better. How many times have I been working on issues when everyone says it is a “database issue” and I’m annoyed because I know that the issue is somewhere else and they are not believing me when I point to things outside the database? Anyway, I opened a case with Delphix on Monday when I couldn’t get a VDB to come down. It just hung for 5 minutes until it gave me an error. I assumed that it was a network hangup and got fixated on rebooting the Delphix VM. Ack! Ultimately, I ended up working with two helpful and capable people in Delphix support and they resolved the issue, which was not what I thought at all. There are times to disagree with support and push for your own solution, but I did this too early in this case and I was dead wrong.

Keep it simple

I’ve heard people refer to Occam’s razor, which I translate in computer terms to mean “look for simple problems first”. Instead of fixing my mind on some vague network issue where the hardware is not working properly, how about assuming that all the hardware and software is working normally and then thinking about what problems might cause my symptoms? I can’t remember how many times this has bitten me. There is almost always some simple explanation. In this case I had made a change to a Unix shell script that runs when someone logs in as the oracle user. This caused Delphix to no longer be able to do anything with the VDBs on that server. Oops! It was a simple blunder, no big deal. But I’m kicking myself for not first thinking about a simple problem like a script change instead of focusing on something more exotic.

What changed?

I found myself saying the same dumb thing that I’ve heard people say to me all the time: nothing changed. In this case I said something like “this has worked fine for 3 years now and nothing has changed”. The long-suffering and patient Delphix support folks never called me on this, but I was dead wrong. Something had to have changed for something that was working to stop working. I should have spent time looking at the various parts of our Delphix setup to see if anything had changed before I contacted support. All I had to do was see the timestamp on our login script and I would see that something had recently changed.
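One quick way to test a “nothing has changed” claim is to list the files that were modified recently. Here is a minimal sketch of that check in Python. The function name and the example paths are hypothetical and are not part of our actual Delphix setup; substitute whatever login scripts and configuration files matter on your server.

```python
import os
import time


def recently_modified(paths, days=7, now=None):
    """Return (path, mtime) pairs for files changed within the last `days` days.

    Paths that do not exist are skipped. Results are sorted newest first.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    changed = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            continue  # missing or unreadable file: nothing to report
        if mtime >= cutoff:
            changed.append((path, mtime))
    return sorted(changed, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical login scripts for the oracle user; adjust for your system.
    candidates = [os.path.expanduser("~/.profile"), os.path.expanduser("~/.kshrc")]
    for path, mtime in recently_modified(candidates, days=7):
        print(time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)), path)
```

Running something like this before opening a support case would have pointed straight at the login script.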

Understand how it all works

I think my Delphix skills are a little rusty. We just started a new expansion project to add new database sources to Delphix. It has been a couple of years since I’ve done any heavy configuration and troubleshooting. But I used to have a better feel for how all the pieces fit together. I should have thought about what must have gone on behind the scenes when I asked Delphix to stop a VDB and it hung for 5 minutes. What steps was it doing? Where in the process could the breakdown be occurring? Delphix support did follow this type of reasoning to find the issue. They manually tried some of the steps that the Delphix software would do automatically until they found the problem. If I had stopped to think about the pieces of the process I could have done the same. This has been a powerful approach to solving problems all through my career. I think about resolving PeopleSoft issues. It just helps to understand how things work. For example, if you understand how the PeopleSoft login process works you can debug login issues by checking each step of the process for possible issues. The same is true for Oracle logins from clients. In general, the more you understand all the pieces of a computer system, down to the transistors on the chips, the better chance you have of visualizing where the problem might be.

Well, I can’t think of any other pearls of wisdom from this experience but I thought I would write these down while they were on my mind. Plus, I go on call Monday morning so I need to keep these in mind as I resolve any upcoming issues. Thanks to Delphix support for their good work on this issue.

Posted in Uncategorized | Leave a comment

Cloning Oracle Home on fully patched 11.31 HP-UX hangs

I based this blog post on information that I learned from this Oracle Support document:

Runinstaller And Emctl Do Not Work After Upgrading HP-UX 11.31 To 11.31 Update3 (Sep 2008) (Doc ID 780102.1)

My situation was slightly different from what Oracle’s note describes so I thought it would be helpful to document what I found.

In my case I am cloning an Oracle 10.2 home on to a fully patched HP-UX 11.31 server. I have used this same clone process on Oracle 11.1, 11.2, and 12.1 Oracle Homes with no issues. The symptom is that the 10.2 clone process just hangs with no helpful messages.

I searched Oracle’s support site and the web for issues with 10.2 cloning and could not find anything that matched my symptoms. I then decided to give up on cloning and try to install the base binaries and then patch to match the home that I was trying to clone. The install also hung. I know that the install works since we have used it on many other similar systems, but those were on HP-UX 11.23 and not the fully patched HP-UX 11.31. So, I searched for installer issues on 11.31 and 10.2 and found the Oracle document listed above.

Evidently there is some bug with the JDK that Oracle included in 10.2 so that it does not work with HP-UX 11.31 with the current patches. A later version of Java resolves the issue.

Now that I understood the issue I decided to go back to the clone and try to apply the recommendations from the Oracle note, even though it doesn’t mention cloning.

The Oracle note suggests adding the -jreLoc /opt/java1.4 option to the runInstaller command line. The only catch is that my HP system did not have /opt/java1.4. The oldest Java we have installed is in /opt/java1.5. So, I tried the clone with the -jreLoc /opt/java1.5 option and it got almost to the end of the clone before it hung doing some emctl step. Then I realized that I needed to follow the steps in the Oracle note to rename the Oracle Home’s jdk directory and set up a link to the Java 1.5 directory. So, I did these steps in the Oracle Home to point to the correct jdk directory:


mv jdk jdk.orig

ln -s /opt/java1.5 jdk
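The two commands above can also be scripted with a few safety checks so that a second run does not clobber the saved jdk directory. This is only a sketch in Python. The function name is hypothetical and the checks are my own additions, not something the Oracle note describes.

```python
import os


def relink_jdk(oracle_home, java_dir):
    """Rename ORACLE_HOME/jdk to jdk.orig and symlink jdk to java_dir."""
    jdk = os.path.join(oracle_home, "jdk")
    backup = os.path.join(oracle_home, "jdk.orig")
    if os.path.islink(jdk):
        raise RuntimeError("jdk is already a symbolic link; nothing to do")
    if os.path.exists(backup):
        raise RuntimeError("jdk.orig already exists; refusing to overwrite it")
    if not os.path.isdir(java_dir):
        raise RuntimeError("Java directory not found: " + java_dir)
    os.rename(jdk, backup)     # same as: mv jdk jdk.orig
    os.symlink(java_dir, jdk)  # same as: ln -s /opt/java1.5 jdk
    return jdk
```

You would call it with the path of the cloned Oracle Home and /opt/java1.5, for example relink_jdk(os.environ["ORACLE_HOME"], "/opt/java1.5").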

Then I ran the clone with this command line:



It wasn’t that hard to apply the note to the clone situation but I thought it was worth blogging it. If someone googles runInstaller clone hang 10.2 11.31 and needs the solution they will find it.

Of course, I may be the only person in the world cloning a 10.2 Oracle Home on an HP-UX Itanium 11.31 system, but it’s here if someone needs it.



Another python graph – one wait event

Here is another graph that I created in Python with Pyplot:


The code for this graph is in my GitHub repository. It is split across separate files for the plotting code, the query, and the database access code.

I blanked out the database name in the example graph to hide it.

This is a graphical version of my onewaitevent.sql script. It queries the AWR looking at a particular wait event per hour. You look at the number of wait events in an hour to see how busy the system was and then the average elapsed time for that hour. Also, you set the smallest number of waits to include so you can drop hours where nothing is going on.

In the example graph you can find times where the average time for a db file sequential read is high and the system is busy. You use the top graph to see how busy the system is and the bottom to see where the average time spikes.
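The calculation behind a graph like this can be sketched in a few lines of Python. This is not the logic from onewaitevent.sql itself. It assumes that the input rows hold cumulative counters per AWR snapshot (a total wait count and a total time waited in microseconds, as in DBA_HIST_SYSTEM_EVENT), so each hour's numbers come from subtracting the previous snapshot. The function name and the sample data are made up.

```python
def hourly_wait_stats(snapshots, min_waits=0):
    """Turn cumulative counters into per-interval wait counts and average times.

    snapshots: list of (label, total_waits, time_waited_micro) tuples ordered
    by snapshot time, with counters that accumulate since instance startup.
    Returns a list of (label, waits, avg_ms) for intervals with more than
    min_waits waits.
    """
    results = []
    for previous, current in zip(snapshots, snapshots[1:]):
        waits = current[1] - previous[1]
        micros = current[2] - previous[2]
        if waits < 0 or micros < 0:
            continue  # counters reset, for example after an instance restart
        if waits > 0 and waits > min_waits:
            results.append((current[0], waits, micros / waits / 1000.0))
    return results


# Made-up sample data: one snapshot per hour.
sample = [
    ("08:00", 1000000, 5000000000),
    ("09:00", 1400000, 7000000000),    # 400,000 waits in the hour
    ("10:00", 1400050, 7000500000),    # only 50 waits: a quiet hour
    ("11:00", 2400050, 19000500000),   # 1,000,000 waits in the hour
]

for label, waits, avg_ms in hourly_wait_stats(sample, min_waits=1000):
    print(label, waits, round(avg_ms, 2))
```

The wait counts would feed the top graph and the average times the bottom one, with the quiet 10:00 hour dropped by the minimum waits setting.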

Still just an experiment but I thought I would pass it along. It isn’t that hard to create the graph in Python and I seem to have a lot of flexibility since I’m writing code instead of using an existing program like Excel.




Trying Python and Pyplot for Database Performance Graphs

In the past I have used Excel to graph things related to Oracle database performance. I am trying out Python and the Pyplot library as an alternative to Excel.  I took a graph that I had done in Excel and rewrote it in Python. The graph shows the CPU usage within the database by category.  For example, I labeled the database CPU used by a group of web servers “WEBFARM1” on the graph.

Here is an example graph:


You can find most of this code in the Python section of my GitHub repository. Here is the code that I used to create the example graph above using some made up data: zip

To make this graph in Excel I was running a sqlplus script and cutting and pasting the output into a text file that I imported into Excel. Very manual. No doubt there are ways that I could have automated what I was doing in Excel. But I have studied Python as part of the edX classes I took so I thought I would give it a try.

Python let me write a program to run the graph from an icon on my desktop. I used the cx_Oracle package to pull the data from the database and Pyplot for the graph.
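For readers who have not used Pyplot, here is a rough sketch of the general shape of such a program. It is not the code from my repository. The row layout (hour, category, CPU seconds), the function names, and the “BATCH” category are assumptions made up for illustration, and I chose a stacked area chart just to keep the example short. The pivot step needs only the standard library; the plotting function imports matplotlib when it is called.

```python
from collections import defaultdict


def pivot_cpu_rows(rows):
    """Pivot (hour, category, cpu_seconds) rows into one series per category.

    Returns (hours, series) where hours is a sorted list and series maps each
    category to a list of CPU seconds aligned with hours (0 where no row).
    """
    hours = sorted({hour for hour, _, _ in rows})
    position = {hour: i for i, hour in enumerate(hours)}
    series = defaultdict(lambda: [0.0] * len(hours))
    for hour, category, cpu_seconds in rows:
        series[category][position[hour]] += cpu_seconds
    return hours, dict(series)


def plot_cpu_by_category(rows, filename="cpu_by_category.png"):
    """Draw a stacked area chart of CPU seconds per category and save it."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    hours, series = pivot_cpu_rows(rows)
    labels = sorted(series)
    x = list(range(len(hours)))
    plt.stackplot(x, *[series[name] for name in labels], labels=labels)
    plt.xticks(x, hours)
    plt.ylabel("CPU seconds")
    plt.title("Database CPU by category")
    plt.legend(loc="upper left")
    plt.savefig(filename)
    plt.close()


# Made-up data; in practice the rows would come from a cx_Oracle cursor.
rows = [
    ("09:00", "WEBFARM1", 120.0), ("09:00", "BATCH", 30.0),
    ("10:00", "WEBFARM1", 200.0), ("11:00", "BATCH", 75.0),
]
hours, series = pivot_cpu_rows(rows)
print(hours)
print(series)
```

Calling plot_cpu_by_category(rows) writes the chart to a PNG file, which is the kind of step that can run from a desktop icon without any cutting and pasting.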

I’m running the Windows 32 bit version of Canopy Express for my Python development environment. This environment comes with Pyplot so I just had to install cx_Oracle to have all the packages I needed to make the graph.

I think both Excel and Python/Pyplot still have value. Excel still seems easier for quick and dirty graphing. But I used Python to automate a report that I run every day with fewer manual steps.  Probably could have done the same thing in Excel but I have recently studied Python so I was able to apply what I learned in my classes without a lot more effort.





Github Repository

I am experimenting with Github. I have created a repository for my Oracle database related scripts. Here is my Github URL:

You can clone this repository locally if you have git installed using this command:

git clone

I’ve had challenges before when I write a blog post about a script and then revise the script later.  It seems weird to update the post with links to the new version.  So, I’m thinking of using github to expose the scripts that I want to share with the Oracle community and then I can update them over time and the version history will be visible.

Let me know if you have any questions or suggestions.  This is just my first attempt at using Github for this purpose.



Update hinted for wrong index

I worked with our support team to improve the performance of a PeopleSoft Financials update statement yesterday. The update statement had an index hint already in it but the index was not the best one of the available indexes.

Here is the original update statement:

 BANK_CD = :2,

I listed out the columns for the indexes on the table using the “querytuning” part of my standard sql tuning scripts.

Here are the columns for the hinted index:


The where clause includes only the first two columns.

But another similar index, PSEPYMNT_VCHR_XREF, exists with these columns:


The where clause has all three of these columns. So, why was the original query hinted this way? Does the E index not work better than the C index? I ran this query to see how selective the condition PYMNT_SELCT_STATUS = ‘N’ is.

>select PYMNT_SELCT_STATUS,count(*)
 4 AND B.REMIT_VENDOR = '12345678'

- ----------
C 5
N 979
P 177343
X 5485

I included the conditions on the first two columns that both indexes share, but removed the other conditions from the original update. A count on the number of rows that meet the conditions of only these two columns shows how many rows the original index will have to use to check the remaining where clause conditions.

I grouped by PYMNT_SELCT_STATUS to see how many rows met the condition PYMNT_SELCT_STATUS = ‘N’ and how many did not. Grouping on PYMNT_SELCT_STATUS shows how many rows the new index will use to check the remaining conditions in the where clause. I ran this query to see if the second index would use fewer rows than the first.

This query showed that only 979 of the over 180,000 rows met the condition. This made me think that the E index which includes PYMNT_SELCT_STATUS has a good chance of speeding up the original update. I ran a count with a hint forcing the C index and then again forcing the E index:
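The arithmetic behind that judgment is simple enough to write down. This small Python sketch uses only the four counts from the query output above and works out what fraction of the rows the extra column keeps.

```python
# Row counts by PYMNT_SELCT_STATUS from the query output above.
counts = {"C": 5, "N": 979, "P": 177343, "X": 5485}

total = sum(counts.values())  # rows matching the two shared leading columns
matching = counts["N"]         # rows that also have PYMNT_SELCT_STATUS = 'N'
fraction = matching / total

print("rows matching the first two columns:", total)
print("rows that also have status N:", matching)
print("fraction kept by the third column: {:.2%}".format(fraction))
```

With the third column in the index, well under one percent of the rows are left to check against the remaining conditions.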

>set timing on
>select /*+ INDEX(B PSCPYMNT_VCHR_XREF) */ count(*)
 4 AND B.REMIT_VENDOR = '12345678'
 6 AND B.PYMNT_ID = ' '


Elapsed: 00:13:52.53
>select /*+ INDEX(B PSEPYMNT_VCHR_XREF) */ count(*)
 4 AND B.REMIT_VENDOR = '12345678'
 6 AND B.PYMNT_ID = ' '


Elapsed: 00:00:01.28

The original hint caused the select count(*) query to run in 13 minutes while the new hint caused it to run in 1 second. Clearly the new E index causes the query to run faster!
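To put a number on the difference, here is a short Python sketch that converts the two SQL*Plus elapsed times above to seconds and divides them. The helper function name is made up for this example.

```python
def elapsed_to_seconds(elapsed):
    """Convert a SQL*Plus 'HH:MM:SS.ss' elapsed string to seconds."""
    hours, minutes, seconds = elapsed.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


c_index = elapsed_to_seconds("00:13:52.53")  # hint on PSCPYMNT_VCHR_XREF
e_index = elapsed_to_seconds("00:00:01.28")  # hint on PSEPYMNT_VCHR_XREF
print("C index: {:.2f} s".format(c_index))
print("E index: {:.2f} s".format(e_index))
print("speedup: about {:.0f}x".format(c_index / e_index))
```

This is a single timing of each count, so treat the ratio as a rough indication rather than a benchmark.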

The developer that I was working with found the problem update statement in some PeopleCode and was able to edit the hint forcing it to use the better index. We migrated the modified code to production and the user was able to run the update statement without the web site timing out. Prior to the change the user was not able to complete the update because the SQL statement took so long it exceeded our application server timeout.




Tested 1000 Select Statements on New Exadata X5

I finished testing 1000 select statements on our new Exadata X5 to see if they would run faster or slower than on our older Exadata V2. Our current production V2 has 12 nodes and the new X5 has only 2. The memory and parallel server parameters on the X5 are 6 times as large as on the old one, since we have one sixth as many hosts and more than 6 times the memory and CPU per host. I think that memory parameters can sometimes change execution plans, and of course with the newer Exadata software who knows what other differences we might see. I wanted to see if any plan changes or other issues caused some queries to run much slower on our newer Exadata system than the old one. I picked 1000 select statements at random from our current production and tested them comparing plans and execution time. In the end I did not find any bad plan changes and on average the tested select statements ran about 4 times faster on the X5 than on the older V2.

I used my testselect package that I have mentioned in several other posts. Here are some other examples of using this package for performance tuning:

In the other posts I was using the package to test the effect of some change on query plans and performance.  So, I was comparing two different situations on the same host. But, in this case I was comparing two different hosts with essentially the same data and settings. But they had different versions of Exadata hardware and larger parameters and fewer nodes on the newer host.  Here are the results of my first run with all 1000 statements.  I got the execution plan for all 1000 select statements but only executed the ones with different plans.  Here were the results:

>execute TEST_SELECT.display_results('X5','V2');
Select statements that ran 3 times faster with X5 than with V2.
        --------- -------------------- -------------------- --------------------- ---------------------
                3            287237826            287237826                     3                    34
                4           1245040971           1245040971                     1                    11
                9             36705296           2770058206                     4                    22

... edited out most of the lines for brevity ...

              997           2423577330           2423577330                     0                     9
              998           2217180459           3921538090                     1                    13
             1000           3842377551           1690646521                     2                    12
Number of selects=329
Select statements that ran 3 times faster with V2 than with X5.
        --------- -------------------- -------------------- --------------------- ---------------------
               95           3919277442           3919277442                     0                     2
              210           3508255766           3508255766                     0                     2
              282           3946849555           3085057493                     0                     6
              347           3278587008            789099618                    19                   170
              375            581067860            460184496                     0                     3
              429            534521834            534521834                     1                     6
              569           3953904703           3484839332                     0                     2
              681            946688683           3451337204                     1                     6
              697            908111030           2971368043                     0                     1
              699           3756954097           1915145267                     0                     1
              706           1121196591           1121196591                     0                     2
              708            581067860            460184496                     0                     4
              797            908111030           2841065272                     0                     5
              950            786005624           2571241212                    45                   460
              966           3151548044           3151548044                     1                     5
Number of selects=15
Summary of test results
        -------------------- ------------------------ ---------------- --------------------------
                          X5 5545.9999999999999999999              486                         11
                          V2                    21138              486                         43

Of the tested statements 329 ran 3 or more times faster on the X5.  But 15 selects ran 3 or more times faster on the old V2.  So, I needed to test the 15 selects again on both servers.
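Here is a rough Python sketch of the kind of comparison that produces these lists. It is not the code inside the testselect package. It assumes that the last two columns of the first list above are the X5 and V2 elapsed times in whole seconds, and that the summary columns are total elapsed seconds, number of selects executed, and average elapsed seconds, which the arithmetic is consistent with. The function name and statement 9001 are made up.

```python
def faster_by_factor(elapsed_a, elapsed_b, factor=3):
    """Return the statement numbers where run A was `factor` times faster than B.

    elapsed_a and elapsed_b map statement number to elapsed seconds.
    """
    return sorted(
        number
        for number in elapsed_a
        if number in elapsed_b
        and elapsed_b[number] > 0
        and elapsed_a[number] * factor <= elapsed_b[number]
    )


# Statements 3, 4 and 9 are from the output above; 9001 is a made-up row
# that does not meet the 3 times threshold.
x5 = {3: 3, 4: 1, 9: 4, 9001: 10}
v2 = {3: 34, 4: 11, 9: 22, 9001: 12}
print(faster_by_factor(x5, v2))

# Summary figures from the output above.
x5_total, v2_total, executed = 5546, 21138, 486
print("X5 average seconds:", round(x5_total / executed, 1))
print("V2 average seconds:", round(v2_total / executed, 1))
print("V2 total / X5 total:", round(v2_total / x5_total, 2))
```

The ratio of the two totals is where the “about 4 times faster” figure comes from.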

I’m not sure if it was smart or not, but I decided to run all the selects 5 times in a row to maximize caching.  The X5 is new and not in use so there wouldn’t be any activity to stimulate caching.  My test script for the X5 looked like this:

truncate table test_results;

execute TEST_SELECT.execute_all('X5');
execute TEST_SELECT.execute_all('X5');
execute TEST_SELECT.execute_all('X5');
execute TEST_SELECT.execute_all('X5');
execute TEST_SELECT.execute_all('X5');
execute TEST_SELECT.reexecute_errored('X5');
execute TEST_SELECT.reexecute_errored('X5');
execute TEST_SELECT.reexecute_errored('X5');
execute TEST_SELECT.reexecute_errored('X5');
execute TEST_SELECT.reexecute_errored('X5');

After we made sure that the system had cached everything, all 15 selects ran, on average, 4 times faster on the X5 than the V2:

-- ---------- ------------- ----------------- ----------------- ------------ ------------------ ------------------------ --------------- ------------- ------------------ -------------- ----------------------------------------------------------------
X5         95 54a8k0yhbgyfq                          3919277442            1                  0                       12            2583            14                  0              1
V2         95 54a8k0yhbgyfq                          3919277442            1                  1                       15            2583            14                  1              1
V2        210 b132ygmp743h4                          3508255766            0                  2                       19            1592            14                  0              1
X5        210 b132ygmp743h4                          3508255766            0                  2                        8            1430            14                  0              1
V2        282 aw5f12xsa8c2h                          3946849555            0                  0                       14            3468            14                  0              2
X5        282 aw5f12xsa8c2h                          3946849555            0                  0                        8            3322            14                  2              2
V2        347 8ncbyjttnq0sk                          3278587008            1                  3                      462         1203794            14                  0          61838
X5        347 8ncbyjttnq0sk                          3278587008            1                  2                      206         1126539            14                  4          51849
X5        375 4yq5jkmz2khv5                           581067860            0                  0                        9           14530            14                  0              2
V2        375 4yq5jkmz2khv5                           581067860            0                  0                       19           14686            14                  1              2
V2        429 49pyzgr4swm4p                           534521834            0                  2                       11            1814            14                  0              0
X5        429 49pyzgr4swm4p                           534521834            0                  0                        5            1638            14                  1              0
X5        569 3afmdkmzx6fw8                           630418386          694                  0                       74           70173            14                  3              0
V2        569 3afmdkmzx6fw8                          3527323087          694                  1                       73           68349            14                  0           3588
V2        681 dyufm9tukaqbz                           668513927            0                  0                       10            6298            14                  0              2
X5        681 dyufm9tukaqbz                          3317934314            0                  0                        8            6096            14                  0              2
V2        697 1fqc3xkzw8bhk                           908111030            0                  0                        3            1406            14                  0              1
X5        697 1fqc3xkzw8bhk                           908111030            0                  0                        2            1406            14                  0              1
V2        699 03qk2cjgr4q2k                          1915145267           31                  0                      476           95922            14                  1              0
X5        699 03qk2cjgr4q2k                          1915145267           31                  0                      272           96299            14                  0              0
V2        706 28fnjtdhjqwrg                          1121196591            0                  0                       21            1355            14                  0              4
X5        706 28fnjtdhjqwrg                          1121196591            0                  0                       13            1355            14                  0              4
V2        708 2yrkwqs46nju0                           581067860            0                  0                       14           14684            14                  0              0
X5        708 2yrkwqs46nju0                           581067860            0                  0                        9           14528            14                  0              0
V2        797 dc5481yn8pm85                           908111030            0                  0                        3            1407            14                  0              2
X5        797 dc5481yn8pm85                           908111030            0                  0                        2            1407            14                  0              2
V2        950 by6n1m74j82rt                           786005624            6                  7                     2087          249736            14                  1         245443
X5        950 by6n1m74j82rt                          2571241212            6                  0                      186           90897            14                  0              3
X5        966 5c2n74gfrxwxx                          3151548044           12                  0                       24          116360            14                  9          84949
V2        966 5c2n74gfrxwxx                          3151548044           12                  0                       52          119701            14                  1          88002

The summary of the results:

Select statements that ran 3 times faster with X5 than with V2.
	--------- -------------------- -------------------- --------------------- ---------------------
	       95           3919277442           3919277442                     0                     1
	      429            534521834            534521834                     0                     2
	      569            630418386           3527323087                     0                     1
	      950           2571241212            786005624                     0                     7
Number of selects=4
Select statements that ran 3 times faster with V2 than with X5.
	--------- -------------------- -------------------- --------------------- ---------------------
Number of selects=0
Summary of test results
	-------------------- ------------------------ ---------------- --------------------------
	                  X5                        4               15                          0
	                  V2                       16               15                          1

I guess it is no surprise that the X5 is faster than the V2, which is five years older. But I thought it was a good example of how to use my testselect package to see how a set of queries will run in two different situations.



OakTable video of myself and others

You can find the full length video of my Delphix talk that I did at OakTable World on Tuesday here: url

Also, the OakTable folks have updated the OakTable World agenda page with video of all the talks. This has lots of good material and for free. Scroll down to the bottom of the page to find the links to the videos.



Final day – OpenWorld and Delphix Sync

This morning was my last day of Oracle OpenWorld sessions and this afternoon and evening finished off my day with Delphix Sync.

The first talk was my only NoSQL talk. It was interesting because the claim was that NoSQL was good for large numbers of simple transactions. This seems to be a theme across a couple of sessions. The funny thing is that the NoSQL code reminded me of my pre-SQL mainframe Datacom DB database programming days. You specified the table and the index and fetched rows etc. You are the optimizer! Of course, you can do the same with simple one table queries in SQL. But Oracle’s NoSQL may have some concurrency modes that Oracle’s main RDBMS doesn’t have, for what that’s worth. The fun thing was that they had examples using Python and I’ve taken Python on edX so I could read the code. Also, they talked about the REST API and I had done a few REST commands with JSON working through a demo of the Oracle database cloud a few weeks back. So, there were synergies with things I already know.

Next I went to a packed session by someone from Tumblr describing their approach to sharding and scaling. The two packed sessions I went to this week were both MySQL sessions and both by internet companies: Tumblr and Ticketmaster. They were in kind of small rooms and it was a little warm and stuffy. But I found both very interesting. Supporting large web apps is a pretty cool proposition. Something that in another life would be fun to work on.

Next I went to a PeopleSoft session. I’ve done PeopleSoft for 20 years and I’m bored with it but I figure I should keep up with the latest. It was actually more of a functional presentation on modules that I have never used so most of the information was of no use to me. But, the new Fluid User Interface that I had never seen before interested me so I stayed long enough to get a feel for it. It seems that Oracle built it for tablets and maybe smart phones.

Next it was off to the hip (or should I say hipster :)) Hotel Zetta for Delphix Sync. It was a very cool event with a fun venue and lots of good snacks. No dinner for me tonight. I got a chance to do a ten-minute lightning talk that I built from three slides from Tuesday’s presentation. I got positive feedback but I felt kind of intimidated by all the Delphix techies. There were a lot of Delphix leaders and developers present as well as a number of people from larger customers. I heard a great talk on Delphix performance and other customers and Delphix employees spoke as well. I learned a lot and it makes me think I need to delve back into our Delphix environment and give it a thorough check out.

So, my OpenWorld/Delphix Sync week is over and I am beat. Like always these conferences leave me with information overload. I’m back to the prioritization thing that always dogs my steps. There is just too much to learn and do. Where do I put my time? We shall see.



Wednesday OpenWorld

Well, it was a long day but it ended in a fun way.

Today I was back to the normal OpenWorld sessions starting with the general session. It was eye-opening because the speakers described a new CPU chip that they were using in their latest servers.  It had some custom elements to support database processing.  It was strange because I have recently been studying the latest Intel x86 documentation and it was interesting to compare Intel’s chips with the latest Sun/Oracle chip.  I had read about the specialized SIMD instructions in the x86 family that Intel uses to speed up graphics. So, I was not surprised that Oracle is including more complex additions to their new chip with specialized instructions.  Still, I question whether people are really going to buy anything but Intel x86 at this point due to the price/performance.

Next I went to a session describing the way a company used a tool called Chef to manage their WebLogic deployments. The session topic interested me because we also use WebLogic at US Foods. But it was a little hard to follow. Maybe it would have helped if I had been exposed to Chef before hearing the talk. Still, it was good to know that there are tools out there to help automate deployment of new systems.

Next I caught a PeopleSoft in the Cloud talk. It seems that you will install the latest version of PeopleTools in a very different way than you did in the past.  I got the feeling that this was just a part way step toward fully setting up PeopleSoft to run in the cloud.

Then I went to a really cool talk about how Ticketmaster sells 20,000 tickets in one minute. It was about their MySQL architecture.  They have a large farm of MySQL servers supporting the queries behind their ticketing web site.  But, they use Oracle for the updates related to ticket purchases.

Then I went to a talk on Oracle ZFS. I get the feeling that I need to learn more about ZFS. It seems that ZFS is an ancestor of Delphix and I know that there is a free OpenZFS that I might play with. I think that Tim Gorman, who works for Delphix, mentioned something about OpenZFS at his Ted talk at OakTable World Tuesday so there may be some relationship.

Lastly I went to a talk about how you can use Oracle’s Enterprise Manager to support both on site and cloud databases. It sounds good but I think it still needs to mature over time to support the cloud systems more fully.

Then at 5:30 pm I went to a fun bloggers party sponsored by Pythian and the Oracle Technology Network (OTN).  I’m not a big party person but I had a good time.  It was easy to strike up a conversation with people since we had a lot in common.

Anyway, enough for today.  One more day Thursday and then my brain will overflow. :)

