One of my favorite children’s books when my children were younger was one about the Berenstain Bears where Father Bear tries to teach his son how to ride a bike and keeps making all these dumb mistakes. “Let that be a lesson to you,” he would say after every goof.
Well, this post is in the same vein. I was working on an Exadata system trying to get a state dump to find out the source of the high library cache lock waits we have been seeing. We have a standard statedump/hanganalyze script for Exadata that looks something like this:
oradebug setmypid
oradebug unlimit
oradebug -g all dump hanganalyze 3
oradebug -g all dump systemstate 258
host sleep 90
oradebug -g all dump hanganalyze 3
oradebug -g all dump systemstate 258
oradebug tracefile_name
host sleep 90
oradebug -g all dump hanganalyze 3
oradebug -g all dump systemstate 258
oradebug tracefile_name
exit
We got something like this from Oracle support. However, the oradebug tracefile_name command doesn’t really show you where the state dumps are coming out. It reports the trace file of the process you attached to with setmypid, but it turns out that when you use the -g all option to do a state dump on every node of an Exadata system, the state is dumped by the diag background process instead. So, we probably should take out the tracefile_name commands. The alert log does have the name of the diag trace file that the state dump goes to, though, so I had no problem finding it.
But then I got the bright idea of removing the diag trace file with the Linux rm command. It already had 400 megabytes’ worth of old dumps in it. Now, in my defense, normally rm’ing a trace file is no big deal. But the diag trace file is always open. So, after I removed the trace file and did another state dump, I couldn’t find the output anywhere. Apparently the diag process keeps the file open, and all I did with the rm command was remove its name from the directory!
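The effect is easy to reproduce without touching a database. The following is a minimal sketch in plain shell, not anything Oracle-specific: the script itself plays the role of the diag process by holding a file open on descriptor 3, and the file name demo_diag.trc is made up for the illustration.

```shell
#!/bin/sh
# Sketch: a writer that holds a file open keeps writing after rm,
# but the output is no longer reachable by name.
tmpdir=$(mktemp -d)
trace="$tmpdir/demo_diag.trc"   # hypothetical stand-in for the diag trace file

exec 3>>"$trace"                # "diag" opens its trace file and keeps it open
echo "first dump" >&3

rm "$trace"                     # removes the directory entry only

echo "second dump" >&3          # still succeeds; the data goes to the unlinked file

if [ -e "$trace" ]; then visible=yes; else visible=no; fi
echo "trace file visible in directory after rm: $visible"

exec 3>&-                       # closing the descriptor finally frees the space
rmdir "$tmpdir"
```

The second write does not fail, which is why nothing looks wrong until you go hunting for the output.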
Fortunately Oracle support has a document for people such as myself who rm background trace files: “How to recreate background trace file(s) that may have been accidentally deleted [ID 394891.1]”. All you have to do is get the Unix process id of the diag process and plug it into this script:
oradebug setospid 4804
oradebug close_trace
oradebug flush
exit
In this example the Unix process id of diag was 4804. You can run this command to double-check that 4804 really is the diag process:
ps -ef | grep 4804
You should see a process in the format ora_diag_INSTANCENAME.
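One wrinkle with grepping for the number is that it also matches the grep command itself and any other line that happens to contain 4804. A slightly stricter check, sketched below, asks ps for that one process id and tests the command name against the ora_diag_INSTANCENAME pattern. The helper name is_diag_cmd is made up for this example, and 4804 is just the process id from above.

```shell
#!/bin/sh
# Sketch: confirm that a given process id really belongs to the diag process.

# is_diag_cmd (hypothetical helper): succeeds if the command line
# starts with ora_diag_ followed by at least one more character.
is_diag_cmd() {
    case "$1" in
        ora_diag_?*) return 0 ;;
        *)           return 1 ;;
    esac
}

pid=4804                        # process id taken from the example above

# -p selects one process, -o args= prints only its command line (no header).
cmd=$(ps -p "$pid" -o args= 2>/dev/null) || cmd=""

if is_diag_cmd "$cmd"; then
    echo "$pid is a diag process: $cmd"
else
    echo "$pid is not a diag process, do not point oradebug at it"
fi
```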
So, in the future when getting a state dump on Exadata I’m just going to run this command to clear out the old dumps:
cat /dev/null > INSTANCENAME_diag_4804.trc
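Here is a small sketch of why truncating is safer than removing. As before, the script stands in for the diag process and demo_diag.trc is a made-up name. It assumes the writer opened the file in append mode; I have not verified how diag opens its trace file, so treat that part as an assumption.

```shell
#!/bin/sh
# Sketch: emptying an open file in place keeps its name in the directory,
# so later output from the writer is still findable.
tmpdir=$(mktemp -d)
trace="$tmpdir/demo_diag.trc"   # hypothetical stand-in for the diag trace file

exec 3>>"$trace"                # writer opens the file in append mode (assumption)
echo "old dump data" >&3

cat /dev/null > "$trace"        # empty the file without removing it

if [ -s "$trace" ]; then emptied=no; else emptied=yes; fi
echo "file emptied: $emptied"

echo "new dump data" >&3        # the writer carries on through the same descriptor
contents=$(cat "$trace")
echo "file now contains: $contents"

exec 3>&-
rm -r "$tmpdir"
```

If the writer had not opened the file in append mode, it would keep writing at its old offset and the file would show its old size again, so it is worth checking the file size after the next dump.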
Once I’ve cleared the diag trace file then I’ll run our standard statedump/hanganalyze script.
Anyway, I learned something today! Hope this is helpful to others.