Library cache lock scripts for RAC

I’ve been having issues for a long time now with an Exadata RAC database that has user reports experiencing library cache lock waits.  The challenge is to figure out what is holding the library cache locks that the queries are waiting on.

My starting point on library cache locks has always been this Oracle support document:

How to Find which Session is Holding a Particular Library Cache Lock (Doc ID 122793.1)

But it doesn’t tell you how to find the session across nodes of a RAC database.

I also found this helpful blog post that briefly addresses finding the session across RAC nodes: Library cache lock and library cache pin waits

I’ve spent many hours over more than a year now dealing with these waits without a lot of success so I finally tried to build a script that I could run regularly to try to capture information about the sessions holding the library cache locks.

First, I knew from Oracle’s document that the x$kgllk table could be used to find the blocking session on a single node so I included queries against this table in my script and set it up so I would ssh into every node of the cluster and run a queries like this against each node:

-- sessions on this instance that are waiting on
-- library cache lock waits
-- unioned with
-- sessions on this instance that are holding locks that other
-- sessions on this instance are waiting on.

insert into myuser1.library_cache_lock_waits
(
SOURCETABLE,
BLOCKER,
SAMPLE_TIME,
INST_ID,
SID,
USERNAME,
STATUS,
OSUSER,
MACHINE,
PROGRAM,
LOGON_TIME,
LAST_CALL_ET,
KGLLKHDL,
KGLLKREQ,
USER_NAME,
KGLNAOBJ,
sql_id,
SQL_FULLTEXT
)
(select
'X\$KGLLK',
'N',
sysdate,
(select INSTANCE_NUMBER from v\$instance),
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
x.KGLLKHDL,
x.KGLLKREQ,
x.USER_NAME,
x.KGLNAOBJ,
s.sql_id,
q.SQL_FULLTEXT
from 
v\$session s, x\$kgllk x, v\$sql q
where
x.kgllkses=s.saddr and
s.sql_id=q.sql_id(+) and
s.event='library cache lock' and
x.KGLLKREQ > 0 and
q.child_number(+)=0)
union all
(select
'X\$KGLLK',
'Y',
sysdate,
(select INSTANCE_NUMBER from v\$instance),
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
x.KGLLKHDL,
x.KGLLKREQ,
x.USER_NAME,
x.KGLNAOBJ,
s.sql_id,
q.SQL_FULLTEXT
from 
v\$session s, x\$kgllk x, v\$sql q,
x\$kgllk x2
where
x.kgllkses=s.saddr and
s.sql_id=q.sql_id(+) and
x.KGLLKREQ = 0 and
x2.KGLLKREQ > 0 and
x2.KGLLKHDL = x.KGLLKHDL and
q.child_number(+)=0);

commit;

The dollar signs are escaped with a backslash because these queries are part of a Unix shell script.  I picked a few columns that I thought would be helpful from v$session and joined to v$sql to get the text of the blocking and blocked SQL.  Note that I ran these queries as SYSDBA.  Here is an example of my test case where the blocker and blocked sessions are both on one node:

SOURCETABLE                    B Sample Time            INST_ID        SID USERNAME                       STATUS   OSUSER                         MACHINE                                                          PROGRAM                                          Logon Time          LAST_CALL_ET KGLLKHDL           KGLLKREQ USER_NAME                      KGLNAOBJ                                                     SQL_ID        RESOURCE_NAME1                 RESOURCE_NAME2                 SQL_FULLTEXT
------------------------------ - ------------------- ---------- ---------- ------------------------------ -------- ------------------------------ ---------------------------------------------------------------- ------------------------------------------------ ------------------- ------------ ---------------- ---------- ------------------------------ ------------------------------------------------------------ ------------- ------------------------------ ------------------------------ --------------------------------------
X$KGLLK                        N 2014-02-17 17:19:01          1       1183 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:18:57            5 00000005F9E7D148          2 MYUSER1                        TEST                                                         g4b4j3a8mms0z                                                               select sum(b) from test
X$KGLLK                        Y 2014-02-17 17:19:03          1        995 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:18:52           10 00000005F9E7D148          0 MYUSER1                        TEST                                                         gv7dyp7zvspqg                                                               alter table test modify (a char(100))

Next, I noticed that on gv$session when a session was waiting on library cache lock waits sometimes FINAL_BLOCKING_INSTANCE and FINAL_BLOCKING_SESSION were populated and that might lead me to the session holding the lock.  Also, this query and the ones following can run in a less privileged account – you don’t need SYSDBA.

drop table lcl_blockers;

create table lcl_blockers as
select distinct
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id
from
gv\$session s, 
gv\$session s2
where
s2.FINAL_BLOCKING_INSTANCE=s.INST_ID and
s2.FINAL_BLOCKING_SESSION=s.SID and
s2.event='library cache lock';

insert into myuser1.library_cache_lock_waits
(
SOURCETABLE,
BLOCKER,
SAMPLE_TIME,
INST_ID,
SID,
USERNAME,
STATUS,
OSUSER,
MACHINE,
PROGRAM,
LOGON_TIME,
LAST_CALL_ET,
sql_id,
SQL_FULLTEXT
)
select
'GV\$SESSION',
'Y',
sysdate,
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
q.SQL_FULLTEXT
from 
lcl_blockers s, gv\$sql q
where
s.sql_id=q.sql_id(+) and
s.INST_ID=q.INST_ID(+) and
q.child_number(+)=0
order by s.INST_ID,s.sid;

commit;

When this works – sporadically in my tests – it shows the same sort of information the previous queries do for same node locking.  Here is an example of these gv$session queries catching the blocker:

SOURCETABLE                    B Sample Time            INST_ID        SID USERNAME                       STATUS   OSUSER                         MACHINE                                                          PROGRAM                                          Logon Time          LAST_CALL_ET KGLLKHDL           KGLLKREQ USER_NAME                      KGLNAOBJ                                                     SQL_ID        RESOURCE_NAME1                 RESOURCE_NAME2                 SQL_FULLTEXT
------------------------------ - ------------------- ---------- ---------- ------------------------------ -------- ------------------------------ ---------------------------------------------------------------- ------------------------------------------------ ------------------- ------------ ---------------- ---------- ------------------------------ ------------------------------------------------------------ ------------- ------------------------------ ------------------------------ --------------------------------------
GV$SESSION                     Y 2014-02-17 17:19:05          1        995 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:18:52           12                                                                                                                         gv7dyp7zvspqg                                                               alter table test modify (a char(100))

Lastly, I got a cross node query working that uses the view gv$ges_blocking_enqueue.  The key to making this query was that the pid column in gv$ges_blocking_enqueue is the same as the spid column in gv$process.

-- join gv$ges_blocking_enqueue, gv$session, gv$process to show 
-- cross node library cache lock blockers.  Blocked session will 
-- have event=library cache lock.

drop table ges_blocked_blocker;

create table ges_blocked_blocker as
(select distinct
'N' blocker,
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
s.process,
p.spid,
e.RESOURCE_NAME1,
e.RESOURCE_NAME2
from
gv\$session s, 
gv\$process p,
gv\$ges_blocking_enqueue e
where
s.event='library cache lock' and 
s.inst_id=p.inst_id and
s.paddr=p.addr and
p.inst_id=e.inst_id and
p.spid=e.pid and
e.blocked > 0)
union
(select distinct
'Y',
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
s.process,
p.spid,
e.RESOURCE_NAME1,
e.RESOURCE_NAME2
from
gv\$session s, 
gv\$process p,
ges_blocked b,
gv\$ges_blocking_enqueue e
where
s.inst_id=p.inst_id and
s.paddr=p.addr and
p.inst_id=e.inst_id and
p.spid=e.pid and
b.RESOURCE_NAME1=e.RESOURCE_NAME1 and
b.RESOURCE_NAME2=e.RESOURCE_NAME2 and
e.blocker > 0);

insert into myuser1.library_cache_lock_waits
(
SOURCETABLE,
BLOCKER,
SAMPLE_TIME,
INST_ID,
SID,
USERNAME,
STATUS,
OSUSER,
MACHINE,
PROGRAM,
LOGON_TIME,
LAST_CALL_ET,
sql_id,
SQL_FULLTEXT,
RESOURCE_NAME1,
RESOURCE_NAME2
)
select
'GV\$GES_BLOCKING_ENQUEUE',
s.blocker,
sysdate,
s.INST_ID,
s.SID,
s.USERNAME,
s.STATUS,
s.OSUSER,
s.MACHINE,
s.PROGRAM,
s.LOGON_TIME,
s.LAST_CALL_ET,
s.sql_id,
q.SQL_FULLTEXT,
s.RESOURCE_NAME1,
s.RESOURCE_NAME2
from 
ges_blocked_blocker s, gv\$sql q
where
s.sql_id=q.sql_id(+) and
s.INST_ID=q.INST_ID(+) and
q.child_number(+)=0
order by s.INST_ID,s.sid;

commit;

Here is some example output from my gv$ges_blocking_enqueue script.  I edited my username, machine name, etc. to obscure these.

SOURCETABLE                    B Sample Time            INST_ID        SID USERNAME                       STATUS   OSUSER                         MACHINE                                                          PROGRAM                                          Logon Time          LAST_CALL_ET KGLLKHDL           KGLLKREQ USER_NAME                      KGLNAOBJ                                                     SQL_ID        RESOURCE_NAME1                 RESOURCE_NAME2                 SQL_FULLTEXT
------------------------------ - ------------------- ---------- ---------- ------------------------------ -------- ------------------------------ ---------------------------------------------------------------- ------------------------------------------------ ------------------- ------------ ---------------- ---------- ------------------------------ ------------------------------------------------------------ ------------- ------------------------------ ------------------------------ --------------------------------------
GV$GES_BLOCKING_ENQUEUE        N 2014-02-17 17:19:55          2        301 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:19:46            7                                                                                                                         g4b4j3a8mms0z [0x426d0373][0x224f1299],[LB][ 1114440563,575607449,LB        select sum(b) from test
GV$GES_BLOCKING_ENQUEUE        Y 2014-02-17 17:19:55          1        497 MYUSER1                        ACTIVE   myuser2                        MYMACHINE                                                        sqlplus.exe                                      2014-02-17 17:19:41           13                                                                                                                         gv7dyp7zvspqg [0x426d0373][0x224f1299],[LB][ 1114440563,575607449,LB        alter table test modify (a char(100))

The alter table command on node 1 is holding the lock while the select statement on node 2 is waiting on the library cache lock.

So, I've got this going on a script that runs every 15 minutes in production.  It worked great in my test case but time will tell if it yields any useful information for our real problems.

- Bobby

p.s. I've uploaded a zip of my scripts: zip

Here is a description of the included files:

Testcase to create a library cache lock:

create.sql - creates a table with one character first column CHAR(1)
alter.sql - alters table expanding CHAR column
query.sql - queries table - waits on library cache lock wait if run while alter.sql is running

all.sh - top level script - you will need to edit to have the host names for your RAC cluster and to have your own userid and password

lcl.sh - x$ table script that is run on each node.  Only key thing is that our profile required a 1 to be entered to choose the first database from a list.  You may not need that line.

resultstable.sql - create table to save results

dumpresults.sql - dump out all results

dumpresultsnosql.sql - dump out all results except sql text so easier to read.

Here is the definition of the results table:

create table myuser1.library_cache_lock_waits
(
 SOURCETABLE    VARCHAR2(30),
 BLOCKER        VARCHAR2(1),
 SAMPLE_TIME    DATE,
 INST_ID        NUMBER,
 SID            NUMBER,
 USERNAME       VARCHAR2(30),
 STATUS         VARCHAR2(8),
 OSUSER         VARCHAR2(30),
 MACHINE        VARCHAR2(64),
 PROGRAM        VARCHAR2(48),
 LOGON_TIME     DATE,
 LAST_CALL_ET   NUMBER,
 KGLLKHDL       RAW(8),
 KGLLKREQ       NUMBER,
 USER_NAME      VARCHAR2(30),
 KGLNAOBJ       VARCHAR2(60),
 SQL_ID         VARCHAR2(13),
 RESOURCE_NAME1 VARCHAR2(30),
 RESOURCE_NAME2 VARCHAR2(30),
 SQL_FULLTEXT   CLOB
);

P.P.S. This was all tested only on Exadata running 11.2.0.2.

Oracle documentation on Library Cache:

The library cache is a shared pool memory structure that stores executable SQL and PL/SQL code. This cache contains the shared SQL and PL/SQL areas and control structures such as locks and library cache handles. In a shared server architecture, the library cache also contains private SQL areas.

Oracle 12c Concepts manual diagram with library cache

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I've been working as an Oracle database administrator and PeopleSoft administrator since 1994. I'm very interested in Oracle performance tuning.
This entry was posted in Uncategorized. Bookmark the permalink.

2 Responses to Library cache lock scripts for RAC

  1. darylerwin says:

    LIbrary cache locks – in my case – was a special case – “rapid” invalid login attempts cause a library cache locks that you simply cant find. We found that our OEM Premier Support monitoring done by oracle was using the wrong password and slamming the db with these locks. Only after stopping that/correcting the password were we able to rid ourselves of this..

    • Bobby says:

      Thanks for your comment. In our case I don’t really know the root cause, but it may have to do with our VPD policies or the way statistics are gathered on our tables which have thousands of subpartitions. The jury is still out, but I spent a lot of time trying to build some monitoring scripts so hopefully it will be helpful to others to have what I’ve done documented.

Leave a Reply