2013-01-25

row cache objects

Contention on 'row cache objects' Latch After Upgrade from 10.2 to 11.2.0.3 [ID 1485410.1]

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.

Symptoms

  • AWR Report shows contention for 'row cache objects' as the top wait:
    Top 5 Timed Foreground Events
    
    Event                    Waits      Time(s) Avg wait (ms)  DB time Wait Class
    latch: row cache objects 58,291,873 130,700             2    63.18 Concurrency
    DB CPU                       60,563   29.27  
    db file sequential read   2,994,299  14,461             5     6.99 User I/O
    library cache lock            3,179   1,662           523     0.80 Concurrency
    gc cr grant 2-way         1,233,503     981             1     0.47 Cluster
    
  • After the database upgrade to 11.2.0.3, there is dramatic increase in the waits for the 'row cache objects' latch. Looking at the 'Latch Miss Sources' section in the AWR Report, the top location calling 'row cache objects' is 'kqrpre: find obj':
     Latch Name        Where          NoWait Misses Sleeps     Waiter Sleeps
    ...
     row cache objects kqrpre: find obj                  0 10,332,326     8,906,124
     row cache objects kqreqd: reget                     0  7,888,165     8,503,625
     row cache objects kqreqd                            0  7,677,074     8,489,832
     row cache objects kqrso                             0      4,127         3,622
     row cache objects kqrpre: init complete             0        143            88
    

    The 'row cache objects' latch is called most frequently from 'kqrpre: find obj'. This module tries to find details of an object being parsed in the row cache. During parse, the row cache is searched. The process searches through a linked list protected by the 'row cache objects' latch to find the object in the cache that it needs. When other processes are also parsing and looking through the row cache this may cause contention; especially if the parse activity is excessive or unnecessary.
  • From the system state trace, a common stack can bee seen for the related processes:
    ()+492<-kslges()+2152<-kqreqd()+256<-kkoiqb()+9436<-kkooqb()+684<-kkoqbc()+2256<-apakkoqb()+140
    <-apaqbdDescendents()+460<-apadrv()+2020<-opitca()+2112
     <-kksFullTypeCheck()+72<-rpiswu2()

    Most of the time in the stack appears to be spent in the function KKOIQB. This function makes row cache calls in order to find index names so that indexes for a table can be sorted by alphabetical name order. In the case of a cost tie between different plans the optimizer needs a consistent method of choosing one index over another. The index names is sorted alphabetically to achieve this; in other words, it picks the first index alphabetically in the event of a choice between two plans using indexes with equal costs.

Changes

 The database was upgraded from 10.2.0.4 to 11.2.0.3 .

Cause

The cause is due to architectural changes between 10.2 and 11.2 in how Oracle does optimization. There are lot more query transformation in 11.2.0.3 than in 10.2.0.4 for queries with views or sub-queries. This results in more passes through optimization, which means more row cache latch calls.
(Furthermore, in this particular case the parameter optimizer_mode = first_rows_100 was being used increasing the number of optimization passes still further, since In 11.2, with first_rows optimization, the optimizer goes through more permutations at hard parse time to get the most optimal execution plan possible. In addition, this particular environment is doing 350 hard parses per second because the application did not use bind variables.)

Solution

 There are actually several solutions/workarounds.  The list is in order of preference from Support's perspective.
1) Rewrite the application to use bind variables to lower the hard parse rate from 350/second
2) Enable the cursor_sharing = force
3) Do not use first_rows_x optimization and use all_rows instead
4) Set optimizer_feature_enabled = 10.2.0.4
5) Set event 10089 level 1 to disable index sorting.
Here are some example metrics on the batch job in question before and after some of the changes:
 
10g
55 threads No parameters set, optimizer_mode=FIRST_ROWS_100, <---Throughput is 11K. 'row cache objects' latch contention observed.

11g
60 threads No parameters set, optimizer_mode=FIRST_ROWS_100, <---Throughput is 8400. 'row cache objects' latch contention observed.
60 threads optimizer_mode=ALL_ROWS.                          <---Throughput is 13.5K. 10% less 'row cache objects' latch contention than the two examples above.
60 threads with event 10089 set.                             <---Throughput is 18304. Not any major contention.
65 threads with event 10089 set.                             <---Throughput is 21000. Not any major contention. 


Niciun comentariu:

Trimiteți un comentariu