WebIV:View NOTE:62161.1

Article-ID:         <Note:62161.1>
Circulation:        PUBLISHED (EXTERNAL)
Folder:             server.Rdbms.Performance.Database
Topic:              ** Tuning Oracle7 and Oracle8
Platform:           GENERIC  Generic issue
Subject:            Systemwide Tuning using UTLESTAT Reports in Oracle7/8
Author:             RPOWELL.UK
Creation-Date:      27-JUL-1998 14:26:12
Modified-Date:      09-FEB-1999 13:51:10
Document-Type:      BULLETIN
Impact:             MEDIUM
Skill-Level:        ACCOMPLISHED
Component:          RDBMS  07.01 - 08.00  Server 07.01 - 08.00
Content-Type:       TEXT/X-HTML
Attachments:        NONE

UTLBSTAT / UTLESTAT (commonly known as bstat/estat reports)

There are several articles about bstat/estat reports already in circulation. Here we will give an overview of what a bstat/estat is and then focus on the important parts. It is important that you have read <Note:61998.1> before continuing.

BSTAT / ESTAT

The bstat/estat scripts are designed to provide a summary of Oracle activity over a reasonable period of time (>10 minutes). UTLBSTAT creates a snapshot of some of the internal Oracle dynamic performance views (V$ views). UTLESTAT creates a second snapshot of these views and reports on the differences between the two snapshots to a file called 'report.txt'.

Some key things to remember about bstat/estat:


UTLESTAT Output

The REPORT.TXT output file is quite large. Rather than aimlessly describing all the values and what they mean we first give an overview of the sections of the output and then describe where to start. Each section is then covered in isolation with comments as to when you can skip over it.
Note: The format of the REPORT.TXT file varies slightly between releases so
      sections may not be in exactly the same order as shown here. Also
      not all sections are reported in all Oracle versions.

<A> SVRMGR> Rem Select Library cache statistics. ... LIBRARY GETS GETHITRATI PINS ... The library cache is where Oracle stores object definitions, SQL statements etc.. Each namespace (or library) contains different types of object. The figures here give a quick summary of the usage of each namespace. The most useful indicator here is the RELOADS column.
<B> Statistic Total Per Transact ... There are hundreds of statistics gathered by Oracle. Only a few of these are needed for general tuning. In particular we get information about CPU usage, buffer cache hit ratio, table scanning and sorts from this section.
<C> Average Write Queue Length The average write queue length is a good indication of how busy the buffer cache is.
<D> SVRMGR> Rem System wide wait events for non-background processes ... Event Name Count Total Time Avg Time This is arguably the most important section in the report as it shows how long Oracle is waiting for resources. This will be the starting point for looking at tuning Oracle.
<E> SVRMGR> Rem System wide wait events for background processes ... Event Name Count Total Time Avg Time In earlier Oracle7 releases this section and section <D> are combined. In Oracle 7.3 and Oracle8 this section shows time spent waiting by background processes.
<F> SVRMGR> Rem Latch statistics. ... LATCH_NAME GETS MISSES HIT_RATIO SLEEPS ... Latches are points of serialisation within Oracle so are critical to good performance. This section is important if there are waits for latches occurring.
<G> LATCH_NAME NOWAIT_GETS NOWAIT_MISSES NOWAIT_HIT_RATIO This section shows no-wait latch misses - the cost of a "miss" is difficult to assess - it is likely to be less significant than a "sleep" in the previous section.
<H> SVRMGR> Rem Buffer busy wait statistics. CLASS COUNT TIME This section indicates any waiting for block access and is only really important if there is evidence of delays being introduced due to block contention.
<I> SVRMGR> Rem Waits_for_trans_tbl high implies you should add rollback ... UNDO_SEGMENT TRANS_TBL_GETS TRANS_TBL_WAITS ... This sections indicates rollback segment activity.
<J> SVRMGR> Rem The init.ora parameters currently in effect: NAME VALUE A list of non-default init.ora parameter settings.
<K> NAME GET_REQS GET_MISS SCAN_REQ SCAN_MIS MOD_REQS ... This indications how well the dictionary cache is performing. This is self managed by Oracle so this section can generally be ignored. This section was important in Oracle6.
<L> SVRMGR> Rem Sum IO operations over tablespaces. TABLE_SPACE READS BLKS_READ READ_TIME WRITES BLKS_WRT WRITE_TIME MEGABYTES Both this and the next section indicate where read and write IO is occurring, and show which tablespaces are servicing full table scans.
<M> SVRMGR> Rem I/O should be spread evenly across drives. TABLE_SPACE FILE_NAME READS BLKS_READ READ_TIME WRITES BLKS_WRT WRITE_TIME MEGABYTES Per file read and write IO.
<N> SVRMGR> Rem The times that bstat and estat were run. START_TIME END_TIME ------------------------- ------------------------- 14-may-98 10:30:22 14-may-98 11:30:06 It is important to know the duration of any report. Longer reports tend to average out "peaks" and may hide problem situations.
<O> BANNER ---------------------------------------------------------------- Oracle7 Server Release 7.3.3.5.0 - Production Release Which version is being used - only the first 3 digits should be treated as reliable.

UTLESTAT - What to look at

1. Skim over the report and look for negative figures. Usually these mean the database was stopped / started between the bstat and estat runs. Prior to Oracle 7.2 it was common for a few of the larger figures to "wrap" to give negative results (especially the 'memory' statistics). This is less likely to occur in Oracle 7.2 onwards.

2. From section <B> note down the CPU used by this session statistic. This is the cumulative amount of CPU that all sessions have used in the sample period - the name is rather misleading ! Also note down the parse time cpu figure. ('parse time cpu' is included within 'CPU used by this session'). These statistics will be used elsewhere for comparisons.

3. Sections <D> Wait Events and <E> Background Wait Events
In 7.1 and 7.2 there is only one wait-event list for all processes. This, along with your CPU figures from '2', gives you the information on where most time is going. Whenever an Oracle process waits for something it records it as a 'wait' using one of a set of predefined waits (see V$EVENT_NAME for a list of all wait events). Some of these events can be considered as 'idle' events - ie: the process is waiting for work. Others indicate time spent waiting for a resource or action to complete. By comparing the relative time spent waiting on each wait event and the CPU used by this session (from 2 above) we can see where the Oracle instance is spending most of its time.

In order to get an indication of where time is going:

  1. - Ignore any 'idle' wait events. Common IDLE wait events include:
         client message               } Process waiting for data from the
         SQL*Net message from client  } client application
         SQL*Net more data from client}
         rdbms ipc message            - Usually background process waiting for work
         pipe get                     - DBMS_PIPE read waiting for data
         Null event                   - Miscellaneous
         pmon timer                   - PMON waiting for work
         smon timer                   - SMON waiting for work
         parallel query dequeue       - Waiting for input (Discussed later)
    
  2. - Ignore any wait events which represent a very small percentage of the total time waited.
  3. - Add up the remaining wait event times and show each one as a percentage of total time waited.
  4. - Compare the total time waited with the CPU used by this session figure.

This summary gives a quick indication of where most of the time is going. Eg: if 50% of the time is spent waiting for 'enqueue' there is probably (but not necessarily) a serious locking conflict in the application. If CPU used is above 25% of the total time waited the application may be CPU heavy (there are no hard rules as to what is a high value for CPU used). The top wait events and CPU figures help determine which other sections of the bstat/estat report are worth looking at.

WARNING: The CPU statistics are only updated by Oracle when a user call completes. This means that long running PL/SQL jobs which are CPU intensive will NOT show up in the CPU figures until the outermost PL/SQL block completes. This also means that long running statements that started before UTLBSTAT was run and complete during the sampled period will add their entire CPU usage to the CPU used by this session statistic potentially distorting the figures.

4. If everything in the report looks normal then look at the IO and CPU related sections of this article. If Oracle is behaving well for the workload presented to it there may still be scope to reduce the workload (tuning the SQL etc..)

The aim of the above steps is to identify areas that account for a significant portion of the overall time. Once a potential problem area has been identified one can then look at what options are available to improve the timings. In some cases there are suggestions within this text but you may be referred to separate articles.

>From this point onwards you can either look at the sections appropriate to you or read the entire article. The table below suggests where you should go next for specific scenarios - the wait event names are hyper-links to the relevant section. If using a hard copy of this article then the sections follow the table in the same order as presented here.

Using CPU
High CPU usage You need to know where the CPU is going to
Common Wait Events (alphabetical order)
buffer busy waits Wait to access a block buffer
db file scattered read Multi-block read
db file sequential read OS block read - usually single block read
enqueue Waiting for an enqueue (lock)
free buffer waits Waiting for a free buffer to use.
latch free Waiting for a latch
library cache pin Waiting for a PIN on a library cache object.
log buffer xxx Waiting for log buffer space events
log file switch xxx Waiting for log file switch events
log file sync Waiting on LGWR to sync redo to disk
SQL*Net message from dblink Waiting for response from remote DB
write complete waits Wait for a write of a block buffer to complete.
Special wait events
Parallel query dequeue PQ slave or QC is waiting for a message. Ignore this in calculations

If the top wait events are not listed in the table above and do not appear as idle wait-event list then see Systemwide Waits for Other Wait Events.


High CPU Usage

There can be many reasons that the CPU is a large portion of the overall time spent within Oracle. In a well tuned system the waits will have been reduced as far as possible and all time spent in Oracle uses CPU to accomplish the requested task. However, it is unlikely you would be reading this if that were the case so we need to consider what the CPU cycles are being used for in order to decide if Oracle is doing useful work or not.

Get the parse time cpu and CPU used by this session figures from the "Statistics" section of the estat report. If parse time cpu represents a large percentage of the CPU time then time is being spent parsing rather than executing statements. If this is the case then it is likely that the application is using literal SQL and not sharing it or the shared pool is very badly configured. See <Note:62143.1> for information on the shared pool and the impact of literal SQL.

If the parse CPU is only a small percentage of the total CPU used then the next task is to determine where the CPU is going. There are several things we can do to help with this:

  1. Find statements with large numbers of buffer gets as these are typically heavy on CPU
  2. Find which sessions are responsible for most CPU usage
  3. Trace typical user sessions using the SQL_TRACE option to see how CPU is apportioned amongst the main application statements. See <Note:62160.1>
We cover each of these below:

Locating CPU-Heavy SQL

The statement below finds SQL statements which access database buffers a lot. Such statements are typically heavy on CPU usage because they are probably looking at a lot of rows of data:
  SELECT address, hash_value, 
         buffer_gets, executions, buffer_gets/executions "Gets/Exec", 
         sql_text
    FROM v$sqlarea 
   WHERE buffer_gets > 50000 and executions>0
   ORDER BY 3
   ;
The cut-off value of 50000 is an arbitrary starting point and should be increased/decreased gradually until the top 10 - 20 statements are listed. This shows the SQL statements which have the most BUFFER_GETS which typically relates to using most CPU. The statements of interest are those with a large number of gets per execution especially if executions is high. One off statements with high buffer gets may be due to batch jobs so it helps if you have an understanding of the application components to know which statements are expected to be expensive.

NB: The above statement will not highlight CPU intensive PL/SQL blocks

Once candidate statements have been isolated the full statement text can be obtained using following query substituting relevant values for ADDRESS and HASH_VALUE pairs:

  SELECT sql_text 
    FROM v$sqltext
   WHERE address='&ADDRESS_WANTED'
     and hash_value=&HASH_VALUE
   ORDER BY piece
   ;
The statement can then be explain (using EXPLAIN PLAN) or isolated for further testing to see how CPU intensive it really is. Note that if the statement uses bind variables and your data is highly skewed then the statement may only be CPU intensive for certain bind values.

Locating CPU-Heavy Sessions

The statement below can help locate sessions which have used the most CPU:
  SELECT v.sid, substr(s.name,1,30) "Statistic", v.value
    FROM v$statname s , v$sesstat v
   WHERE s.name = 'CPU used by this session'
     and v.statistic#=s.statistic#
     and v.value>0
   ORDER BY 3
   ;
Note that the CPU time is cumulative so a session which has been connected for several days may appear to be heavier on CPU than one which has only been connected for a short period of time. Thus it is better to write a script to sample the CHANGE in the statistic between 2 known points in time allowing one to see how much CPU was used by each session in a known timeframe.

Once any CPU intensive sessions have been identified <View:V$SESSION> can be used to get more information. At this stage it is generally best to revert to user session tracing (SQL_TRACE) to determine where the CPU is being used. See <Note:62160.1>


Wait Events

The following few sections describe the meanings of the most common wait events from sections C and D of the Estat report. For each wait event we attempt to drill down to locate more information on what is causing the waits. The wait events are included in alphabetical order.

System wide waits for "buffer busy waits"

This wait happens when a session wants to access a database block in the buffer cache but it cannot as the buffer is "busy". The two main cases where this can occur are:
  1. Another session is reading the block into the buffer
  2. Another session holds the buffer in an incompatible mode to our request

If the TIME spent waiting for buffers is significant then we need to determine which segment/s is/are suffering from contention. The "Buffer busy wait statistics" section of the Bstat/estat shows which block type/s are seeing the most contention. This information is derived from the <View:V$WAITSTAT> which can be queried in isolation:

  SELECT time, count, class 
    FROM V$WAITSTAT 
   ORDER BY time,count
  ;
This shows the class of block with the most waits at the BOTTOM of the list. Additional information can be obtained from the internal view X$KCBFWAIT thus:
  SELECT count, file#, name
    FROM x$kcbfwait, v$datafile
   WHERE indx + 1 = file#
   ORDER BY count
  ;
This shows the file/s with the most waits (at the BOTTOM of the list) so by combining the two sets of information we know what block type/s in which file/s are causing waits. The segments in each file can be seen using a query like:
  SELECT distinct owner, segment_name, segment_type
    FROM dba_extents
   WHERE file_id= &FILE_ID 
  ;
If there are a large number of segments of the type listed then monitoring V$SESSION_WAIT may help isolate which object is causing the waits. Eg: Repeatedly run the following statement and collect the output. After a period of time sort the results to see which file & blocks are showing contention:
  SELECT p1 "File", p2 "Block", p3 "Reason"
    FROM v$session_wait 
   WHERE event='buffer busy waits'
  ;
Note:
In the above query there is no reference to WAIT_TIME as you are not interested in whether a session is currently waiting or not, just what buffers are causing waits.

Alternatively capturing session trace may help - See <Note:62160.1>. If a particular block or range of blocks keep showing waits you can try to isolate the object using:

  SELECT distinct owner, segment_name, segment_type
    FROM dba_extents
   WHERE file_id= &FILE_ID 
     and &BLOCK_NUMBER between block_id and block_id+blocks-1
  ;
Actions
As buffer busy waits are due to contention for particular blocks then you cannot take any action until you know which blocks are being competed for and why. Eliminating the cause of the contention is the best option. Note that "buffer busy waits" for data blocks are often due to several processes repeatedly reading the same blocks (eg: if lots of people scan the same index) - the first session processes the blocks that are in the buffer cache quickly but then a block has to be read from disk - the other sessions (scanning the same index) quickly 'catch up' and want the block which is currently being read from disk - they wait for the buffer as someone is already reading the block in.

The following hints may be useful for particular types of contention - these are things that MAY reduce contention for particular situations:
Block TypePossible Actions
data blocksEliminate HOT blocks from the application. Check for repeatedly scanned / unselective indexes. Change PCTFREE and/or PCTUSED. Check for 'right- hand-indexes' (indexes that get inserted into at the same point by many processes). Increase INITRANS. Reduce the number of rows per block.
segment headerIncrease of number of FREELISTs. Use FREELIST GROUPs (even in single instance this can make a difference).
freelist blocksAdd more FREELISTS. In case of Parallel Server make sure that each instance has its own FREELIST GROUP(s).
undo header Add more rollback segments.


System wide waits for "db file sequential read"

This wait happens when a session is waiting for an IO to complete. Typically this indicates single block reads, although reads from a disk sort area may use this wait event when reading several contiguous blocks. Remember IO is a normal activity so you are really interested in unnecessary or slow IO activity.

If the TIME spent waiting for IOs is significant then we can determine which segment/s Oracle has to go to disk for. See the "Tablespace IO" and "File IO" sections of the ESTAT report to get information on which tablespaces / files are servicing the most IO requests, and to get an indication of the speed of the IO subsystem.

Actions
Block reads are fairly inevitable so the aim should be to minimise un-necessary IO. This is best achieved by good application design and efficient execution plans. Changes to execution plans can yield orders of magnitude changes in performance. Tweaking at system level usually only achieves percentage gains.

A larger buffer cache may help - test this by actually increasing DB_BLOCK_BUFFERS and not by using DB_BLOCK_LRU_EXTENDED_STATISTICS. Never increase the SGA size if it may induce additional paging or swapping on the system.

A less obvious issue which can affect the IO rates is how well data is clustered physically. Eg: Assume that you frequently fetch rows from a table where a column is between two values via an index scan. If there are 100 rows in each index block then the two extremes are:

  1. Each of the table rows is in a different physical block (100 blocks need to be read for each index block)
  2. The table rows are all located in the few adjacent blocks (a handful of blocks need to be read for each index block)
Pre-sorting or re-organising data can help to tackle this in severe situations.

System wide waits for "db file scattered read"

This wait happens when a session is waiting for a multiblock IO to complete. This typically occurs during full table scans or index fast full scans. Oracle reads up to DB_FILE_MULTIBLOCK_READ_COUNT consecutive blocks at a time and scatters them into buffers in the buffer cache. How this is done depends on the value of USE_READV.

If the TIME spent waiting for multiblock reads is significant then we need to determine which segment/s we are performing the reads against. See the "Tablespace IO" and "File IO" sections of the ESTAT report to get information on which tablespaces / files are servicing multiblock reads (BLKS_READ/READS>1).

It is probably best at this stage to find which sessions are performing scans and trace them to see if the scans are expected or not. This statement can be used to see which sessions may be worth tracing:

  SELECT sid,total_waits, time_waited
    FROM v$session_event
   WHERE event='db file scattered read'
     and total_waits>0
   ORDER BY 3,2
  ;
Actions
Ideally you do not want to repeated perform full table scans in online portions of an application when there is a faster more selective way to get at the data. In non online portions of an application table scanning is much more likely to be required so the following points may help:

System wide waits for "enqueue"

A wait for an enqueue is a wait for a local 'lock'. The count and average wait times for this wait-event can be misleading as "enqueue" waits re-arm every few seconds. To qualify how many waits have really occurred you need the enqueue waits statistic from
the statistics section of the estat report.
Eg: Assume
        Wait Events show enqueue time_waited=3000 total_waits=10
        Statistics  show enqueue waits has a count of 2
    This means there were 2 actual waits whose individual wait times totalled 
    to 3000 hundredths of a second (ie: 30 seconds).
To determine which enqueues are causing the most waits system-wide look at <View:X$KSQST> thus:
  SELECT  ksqsttyp "Lock", 
          ksqstget "Gets",
          ksqstwat "Waits"
    FROM X$KSQST where KSQSTWAT>0;
This gives the system wide number of waits for each lock type. Remember that it only takes one long wait to distort the average wait time figures.
Actions
The action to take depends on the lock type which is causing the most problems. It is beyond the scope of this article to look at the reasons for waits on each lock type - the most common lock waits are generally for:
     TX      Transaction Lock 
                Generally due to application or table setup issues
                See <Note:62354.1> for example scenarios which can cause
                TX lock waits.

     TM      DML enqueue
                Generally due to application issues, particularly if 
                foreign key constraints have not been indexed.
                See <OLS:106754.289> for details of TM locks and 
                referential integrity.

     ST      Space management enqueue 
                Usually caused by too much space management occurring 
                (Eg: small extent sizes, lots of sorting etc..) 
                See <Note:33567.1> for more information about the ST
                enqueue.


System wide waits for "free buffer waits"

This wait happens when a session has to wait for a free buffer in the buffer cache. Eg: A buffer is needed for a session to read a database block from disk into the buffer cache. If the TIME spent waiting for free buffers is significant then it is likely that DBWR is not keeping up with the cache throughput rate.
Actions
See
<Note:62172.1> on issues affecting the buffer cache and DBWR.

System wide waits for "latch free"

Latches are like short duration locks that protect critical bits of code. As a latch wait is typically quite short it is possible to see a large number of latch waits which only account for a small percentage of time.

If the TIME spent waiting for latches is significant then we need to determine which latches are suffering from contention. The Bstat/estat section on latches shows latch activity in the period sampled. This section of the estat report is based on <View:V$LATCH> gives a summary of latch activity since instance startup and can give an indication of which latches are responsible for the most time spent waiting for "latch free" thus:

  SELECT latch#, name, gets, misses, sleeps
    FROM v$latch
   WHERE sleeps>0
   ORDER BY sleeps 
  ;

  Note that this select gives the worst latches at the BOTTOM of the list.
Some lines in this report are actually for multiple latches all of the same type. To determine if the latch activity is concentrated on one particular latch in the set one can query <View:V$LATCH_CHILDREN> (only available from 7.3 onwards)
Oracle 7.3 and Oracle 8.0
<View:V$LATCH_CHILDREN>
  SELECT addr, latch#, gets, misses, sleeps
    FROM v$latch_children
   WHERE sleeps>0
     and latch# = &LATCH_NUMBER_WANTED
   ORDER BY sleeps 
  ;
This gives the system wide number of waits for each child latch of the type LATCH#. If there are no rows returned then there is only a single latch of the type you are looking at.

If there are multiple rows the important thing to note is whether the SLEEPS are reasonably distributed or if there are one or two child latches responsible for 80% of the SLEEPS. If the contention is focused on one or two child latches make a note of which children are seeing a problem - note the ADDR column.

Actions
The latches most likely to show high sleeps are listed below along with some possible actions:
  shared pool latch
      In Oracle 7.1/7.2 there are known issues involving the shared
      pool and library cache latches. See <Note:32871.1> for details.

      Heavy use of literal SQL will stress this latch significantly.
      If your online application makes heavy use of literal SQL statements
      then converting these to use bind variables will give significant
      improvements in latch contention in this area.
      See <Note:62143.1> for issues affecting the shared pool.

      MTS places extra load on this latch - the use of the LARGE_POOL
      in Oracle8 may help this. 

  library cache pin (7.1) and
  library cache latches
      From Oracle 7.2 onwards the library cache latch has child latches .
      Problems with these latches are typically due to heavy use of literal 
      SQL or very poor shared pool configuration. 
      If your online application makes heavy use of literal SQL statements
      then converting these to use bind variables will give significant
      improvements in latch contention in this area.
      See <Note:62143.1> for issues affecting the shared pool.
      
        
  cache buffers lru chain latch
      Setting DB_BLOCK_LRU_STATISTICS to TRUE can adversely affect this 
      latch - always ensure DB_BLOCK_LRU_STATISTICS is set to FALSE.
      From Oracle 7.3 it is possible to have multiple of these latches 
      by specifying DB_BLOCK_LRU_LATCHES although this really needs multiple
      CPU's to be of most benefit.
      Heavy contention for this latch is generally due to heavy buffer cache 
      activity which can be caused, for example, by:

                Sorting in buffer cache and not using SORT_DIRECT_WRITES

                Repeatedly scanning large unselective indexes

      See <Note:62172.1>


  cache buffers chains latches
      Individual block contention can show up as contention for one of these
      latches.  Each cache buffers chains latch covers a list of buffers in 
      the buffer cache. If one or two child latches stand out from 
      V$LATCH_CHILDREN then:

                SELECT dbafil,dbablk,class, state 
                  FROM x$bh
                 WHERE hladdr='&ADDR_OF_CHILD_LATCH'
                ;

      If this list contains lots of entries for the same dbafil, dbablk
      in 7.1/7.2 then see <Note:38350.1> and <Bug:256032>.

      If this list is short (3 to 10 buffers) then one of the buffers in
      this list is probably very 'hot' - ie: suffers from lots of concurrent
      access attempts.

Additionally Oracle 7.3 introduces a new view which may be of help to Oracle Support in more obscure cases:
  SELECT sleep_count, "WHERE"
    FROM v$latch_misses
   WHERE parent_name='&PROBLEM_LATCH_NAME'
   ORDER BY 1
  ;
This shows circumstances where latches were requested but not obtained immediately.

System wide waits for "library cache pin"

Waits on the "library cache pin" wait event occur if a session wants to use an item in the shared pool but someone else has the item pinned in an incompatible mode. This is normally only seen if there is heavy contention for a particular object or statement in the shared pool.

Addressing this problem is likely to need the help of Oracle support but the following few statements should help obtain useful information.

Repeatedly run the following statement and collect the output. After a period of time sort the results to see if the waits are always for a single "handle" or for several different handles.

  SELECT p1 "Handle"
    FROM v$session_wait 
   WHERE event='library cache pin'
  ;
Also isolate any statements which have high version counts:
  SELECT address, to_char(hash_value,'9999999999999'), version_count, sql_text
    FROM v$sqlarea
   WHERE version_count>10
    ORDER BY version_count
  ;
Actions
If a particular statement has a high version count it is possible that there is a problem. Contact Oracle support if this wait event is showing as a significant chunk of the total time.

System wide waits for "log file space/switch" and "log buffer ..." events

There are several wait events which may indicate problems with the redo buffer and redo log throughput:
7.0 to 7.2:
              log buffer space/switch

7.3 onwards:
              log buffer space 
              log file switch (checkpoint incomplete) 
              log file switch (archiving needed) 
              log file switch/archive
              log file switch (clearing log file) 
              log file switch completion 
              switch logfile command 

Actions
For the "log buffer space" events:
  1. Increase the LOG_BUFFER size until there is no incremental benefit (sizes > 1Mb are unlikely to add any benefit)

For all other waits:

  1. Ensure redo logs are on fast disks (NOT RAID5)
  2. Ensure redo logs are large enough to give a sensible gap between log switches. A 'rule-of-thumb' is to have one log switch every 15 to 30 minutes.
  3. Ensure the ARCHiver process is running and keeping up.
For further possible actions see "Tuning LGWR".

System wide waits for "log file sync"

This wait happens when a commit (or rollback) is issued and the session has to wait for the redo entry to be flushed to disk to guarantee that instance failure will not roll back the transaction.

Actions
The 2 ways to reduce waits for this are:
  1. Where possible reduce the commit frequency. Eg: Commit at batch intervals and not per row.
  2. Speed up redo writing (Eg: Do NOT use RAID 5, use fast disks etc..)
See "Tuning LGWR" for other actions.

System wide waits for "SQL*Net message from dblink"

This wait event is used when we are waiting for a response from a DB LINK. The link could be to another Oracle database or to a Gateway. Time spent waiting on this wait-event is indicative of the percentage of time spent waiting for the remote database / gateway
Actions
Locate which remote databases / instances are used and gather performance information about those databases/gateways in addition to the local instance.

System wide waits for "write complete waits"

This wait happens when a requested buffer is being written to disk - we cannot use the buffer while it is being written. If the TIME spent waiting for buffers to be written is significant then note the
"Average Write Queue Length" - if this too is large then the the cache aging rate is too high for the speed that DBWR is writing buffers to disk.

Actions
This can be tackled from two angles:
  1. Decrease the cache aging rate
  2. Increase the throughput of DBWR
These are described in <Note:62172.1>


System wide waits for "Parallel Query Dequeue"

Parallel query dequeue waits are problematic in that the same wait event is used in both 'idle' and 'busy' situations. This article does not cover the details of parallel query so we assume parallel query dequeue waits are idle waits and ignore them in calculations. This is not a bad assumption as any query slave waiting on a message is dependent on the supplier/s of that message - the supplying process/es should either be working (using CPU) or waiting for some other wait event (Eg: IO) so the overview seen in the bstat/estat output should still give an indication of the source of any wasted time.

Actions
If your site makes significant use of parallel query for the periods being looked at then it is best to tune the queries independently. This is covered in other articles.

System wide waits for other wait events

There are a large number of wait events, many of which occur infrequently. If any other wait event represents a significant portion of the time waited then the first thing you need to know is whether the wait represents:
  1. Idle time - Ie: A process was just waiting for work
  2. A real wait which is hindering performance
Contact Oracle Support and/or use Metalink to help determine whether the wait is important and if so how to find more information about why and where the waits are occurring.
Actions
Have all information gathered to date available before contacting support.

More details on each BSTAT / ESTAT Section

The following notes give more detail on each section of the BSTAT/ESTAT report. The order of the sections here match the order give in the "UTLESTAT Output" summary earlier in this article.

<A> Library Cache Statistics

Based on <View:V$LIBRARY_CACHE>
The columns are:
   LIBRARY              Different object types are stored in different
                        'libraries'.
   GETS                 Number of times an item in this 'library' was 
                        requested.
   GETHITRATIO          Percentage of times the item was requested and found
                        to already be present in the cache.
   PINS                 Number if times an item was 'pinned'. An item has to be
                        'pinned' in order to be used. Items may be looked up
                        once and then pinned/unpinned many times.
   PINHITRATIO          Percentage of times an item was requested to be pinned
                        and was successfully pinned.
   RELOADS              Number of times an item had to be re-loaded because
                         part of it had been flushed from the cache and was 
                         needed.
   INVALIDATIONS        Number of times objects were invalidated in the sample
                         period.
If there are a significant number of RELOADS then re-usable information is being flushed from the SGA and thus having to be reloaded / rebuilt. See <Note:62143.1> for details about setting up the shared pool.

If the GetHitRatio or PinHitRatio is low (<.9) then it is likely that the application is using unsharable SQL or infrequently referencing objects. If there is a poor hit ratio see <Note:62143.1> Note that due to the way Oracle works it is unusual for the hit ratios in this section to very low.


<B> System Statistics

Based on <View:V$SYSSTAT>
This is a very long section full of lots of statistical information. The columns are:
   Statistic            Statistic name 
   Total                Change in this statistic between sample times
   Per Transact         "Total" divided by number of COMMITs in 
                         the sample period.
   Per Logon            "Total" divided by the number LOGONS that
                         occurred in the sample period.
   Per Second           "Total" divided by time interval between
                         the samples.
Most of the statistics can be ignored as they are only useful in specific circumstances. Below are some of the more commonly referenced statistics:
    CPU used by this session    Actually the CPU used across all sessions
                                 in the sample period
    consistent gets             Gets of a block in consistent mode
    db block gets               Gets of a block in current mode
    enqueue waits               Actual number of waits for a lock
    execute count               Number of execute calls
    parse count                 Number of parse calls
    parse count (total) V8        "             "
    parse count (hard) V8        resulting in a hard parse
    parse time cpu              CPU spent parsing
    physical reads              Number of blocks physically read . Note this 
                                 is blocks read and NOT the number of IO read 
                                 requests.
    recursive calls             Number of recursive calls (either for
                                 dictionary information or PLSQL)
                                        
    sorts (disk)                Number of disk based sorts
    table scans (long tables)   Number of full table scans
    user calls                  Number of user calls
There is no intention to describe all the statistics here especially as many have self describing names. Some of the statistics may be useful in particular situations so if the text refers to "statistic XXX" then this is the section of the report to find the value. A few notes on some of the more widely used statistics are included below.

WARNING: CPU statistics are only updated by Oracle when a user call completes. This means that long running PL/SQL jobs which are CPU intensive will NOT show up in the CPU figures until the outermost PL/SQL block completes.

The HIT RATIO

The hit ratio for the buffer cache is calculated as
        hit ratio =     1   -   (       physical reads            )
                                ( ------------------------------- )
                                ( consistent gets + db block gets )

Converting this to a percentage (multiply by 100) gives the percentage of times an Oracle process asked for a data block and found it in the buffer cache. A good hit ratio is expected for OLTP type systems but decision support type systems may have much lower hit ratios.

Note:In Oracle 7.3.4 and Oracle8 the "physical reads" statistic includes direct block reads as well as reads to get data into the buffer cache. Hence the above formula only gives an lower bound for the hit ratio on these releases. See <Note:33883.1> for more information.

A good hit ratio does not mean the application is good. It is quite possible to get an excellent hit ratio by using a very unselective index in a heavily used SQL statement.
Eg: Consider a statement like:

        select * from employee where empid=1023 and gender='MALE';
If EMPLOYEE is a large table and this statement always uses the GENDER index rather than the EMPID index then you scan LOTS of blocks (from the GENDER index) and find nearly all of them in the cache as everyone is scanning this same index over and over again. The hit ratio is excellent but performance is very bad. Conversely repeated full table scans of large tables do not cause the blocks to be cached so make the hit ratio worse.

In Oracle8 it is possible to use multiple buffer pools (by setting BUFFER_POOL_KEEP or BUFFER_POOL_RECYCLE init.ora parameters). Multiple buffer pools are not discussed here but the hit ratios can be seen using the V$BUFFER_POOL_STATISTICS view which is created by the CATPERF.SQL script.

Calls
parse count, execute count, user calls and recursive calls give an indication of the number of requests the server processed in the sample period and whether a significant portion of these were parse calls .
Misconceptions
This short section describes a few of the statistics which have been seen incorrectly documented in the past:

<C> Average Write Queue

Based on figures from <View:V$SYSSTAT>
The average write queue length gives an indication of whether the cache is being cleared of dirty blocks fast enough or not. Unless the system is very high throughput the write queue should never be very large. As a general guideline if the average write queue is in double figures it is certainly worth finding where the activity is coming from. If the figure is in single digits there may still be un-necessary activity in the cache.

The 2 main ways to improve the write queue length are:

  1. Eliminate any unnecessary write IO from going via the cache.
  2. Improve DBWR throughput. Eg: Use asynchronous writes, multiple database writer processes, larger block write batch.
See <Note:62172.1> for information about the Tuning the buffer cache and DBWR.

<D> Wait Events and <E> Background Wait Events

Based on <View:V$SYSTEM_EVENT>
In 7.1 and 7.2 there is only one wait-event list for all processes. In 7.3 and Oracle8 the waits are split out for background and user processes. This section , along with information about CPU usage gives a picture of what Oracle is really doing.

Columns in this section:

   Event Name           Name of the waitevent
   Count                Number of times a wait occurred in the sample
                         period
   Total Time           Total time spent waiting (1/100ths of a second)
   Avg Time             Average time per wait (1/100ths of a second)
See "What to look at" for details of this section of the report.

<F> Latch Waits

Based on <View:V$LATCH>
This section is important if there are significant waits for latch free in the list of wait events.

Columns in this section:

   LATCH_NAME           Name of the latch. Note that some latches have a 
                        parent with children so this may be a summary line.   
   GETS                 Number of times latch was requested
   MISSES               Number of times it could not be acquired on the
                        first attempt
   HIT_RATIO            Ratio indicating percentage of times we acquire a 
                        latch as soon as it is requested. Many articles 
                        indicate this is important but it can be misleading. 
                        Eg: GETS=10, MISSES=8 gives a very poor hit ratio 
                            but is insignificant if the total latch misses 
                            are 200000.
   SLEEPS               This is much more important - it indicates the number 
                        of times we had to sleep because we could not get a 
                        latch. The latch/es with the highest SLEEPS figures 
                        are the ones to concentrate on.
   SLEEPS/MISS          Average sleeps per miss gives an idea how long
                        each get is having to wait to get the latch.

Time spent waiting for a latch is time wasted. Isolate the latches with the top SLEEPS figures and refer to 'latch free' for what to do with this information.

<G> NoWait Latch Misses

Based on <View:V$LATCH>
If an Oracle process asks for a latch but is not prepared to wait for that latch it can request the latch NOWAIT. If the latch is not available the process may take some other course of action. It is very difficult to assess the impact of NOWAIT MISSES as the cost of a miss can be very small.

<H> Buffer Busy waits

Based on <View:V$WAITSTAT>
This section is useful if the wait event buffer busy waits is a significant portion of the time waited. It shows which class/es of block are responsible for the waits. Such waits can be due to heavy contention for data blocks, segment headers, undo blocks etc.. Isolating the true cause of the waits needs more information than is contained in the bstat/estat report.

The columns are:

   CLASS                Block class waited for.
   COUNT                Number of waits
   TIME                 Total time waited (1/100ths of a second)
See buffer busy waits for more details.

<I> Rollback Segment statistics

Based on <View:V$ROLLSTAT>
This section is useful to see the size and usage of the rollback segments.

Columns in this section are

   UNDO_SEGMENT         Undo segment number. Get the name from:
                         SELECT * form DBA_ROLLBACK_SEGS 
                          WHERE segment_id=&UNDO_SEGMENT;
   TRANS_TBL_GETS       Number of gets for the rollback segment header
                         in the sample period
   TRANS_TBL_WAITS      Number of waits for the rollback segment header
                         in the sample period
   UNDO_BYTES_WRITTEN   Number of bytes written to the rollback segment
                         in the sample period
   SEGMENT_SIZE_BYTES   Size of the rollback segment at end of sample
                         period
   XACTS                Meaningless: active Xacts at end (estat) MINUS 
                                     active transactions at start
   SHRINKS              Number of rollback segment shrinks during the
                         sample period.
   WRAPS                Number of wraps from one extent to another
                         during the sample period.
TRANS_TBL_WAITS indicates which rollback segment headers had waits for them. Typically you would want to reduce such contention by adding rollback segments. How much this is affecting performance can be gauged from the time spent waiting for buffer busy waits for undo segment header blocks.

If SHRINKS is non zero then one or more rollback segments has OPTIMAL set. While OPTIMAL can be useful there have been problems in certain Oracle versions related to rollback segments shrinking back to their optimal size.


<J> Parameter Settings

Based on <View:V$PARAMETER>
This shows non-default parameter settings. Note that the actual value of some of the parameters may differ from the settings in the init.ora file - Eg: DB_FILE_MULTIBLOCK_READ_COUNT generally has an upper bound enforced.

It is difficult to give general tips for init.ora parameters but check for the following:

Generic:
   CURSOR_SPACE_FOR_TIME        Should generally be FALSE
   DB_BLOCK_LRU_STATISTICS      TRUE may show latch contention in Oracle 7.2 
                                 onwards 
   EVENT                        Should only ever be set under guidance from
                                 Oracle Support.
   TIMED_STATISTICS             Should be TRUE

Operating System Specific:
   USE_READV                    TRUE may impact table scan speed
   REDUCE_ALARM                 TRUE may severely impact performance
   USE_ISM                      TRUE may cause problems on some platforms/
                                 versions 
   CPU_COUNT                    This should normally default correctly but
                                see platform specific information as on some
                                platforms/versions it may need setting
                                explicitly. 
   PRE_PAGE_SGA                 On some platforms TRUE may impact performance

<K> Dictionary Cache

Based on <View:V$ROWCACHE>
In Oracle6 days users had to tune the dictionary cache but it is now self tuning.

It is unlikely there is any problem in this section of the report. If there are any values with large GETS and a poor hit ratio check the figures with Oracle support.


<L> Tablespace IO and <M> File IO

Based on <View:V$FILESTAT>
These 2 sections can be very long as they list all tablespaces and files together with the number of IOs and timings for the IOs. Both of these sections report the same information, one summarising per tablespace and one per file.

The columns in these sections are:

   TABLE_SPACE      Tablespace name
   FILE_NAME        File name in this tablespace   
   READS            Number of read calls to the OS
   BLKS_READ        Number of blocks read.
                       The above two differ only when multi-block
                       reads are being performed. Eg: On full table
                       scans we read up to DB_FILE_MULTIBLOCK_READ_COUNT
                       blocks per read.
   READ_TIME        Total time for the reads in 1/100ths of a second
   WRITES           Number of OS write calls
   BLKS_WRT         Number of blocks written
   WRITE_TIME       Write time in 1/100ths of a second. NB: this figure
                    may be incorrect.
Take care with information presented in this section as it is now very common for files to be striped or managed by a volume manager so there is not any obvious relationship between each Oracle file and the physical disks. Note too that an Oracle IO is a call to the OS to perform an IO operation and may not result in a physical disk IO. For this reason OS statistics should be used when looking at tuning IO throughput.

The 4 important things you can get from this section are:

  1. Which tablespace/s have most read activity
  2. If tablespace/s are involved in multiblock reads
  3. IO subsystem read times. >20mS per read is typically slow for single block reads. Very fast read times may indicate that reads are being satisfied from an OS cache.
  4. Which tablespace/s have the most write activity

Read IO
Isolate the files (and tablespace names) where 80% of the read IO is occurring. Typically there will be many files with little IO and a few with heavy IO. If the read times are large (>40mS per block read) then look at the IO sub-system. If the read times are very small then it is possible that the blocks being read are cached below Oracle (Eg: in the OS buffer cache) which may indicate repeated reads from the same disk blocks are occurring.

Remember that in some applications random access of large amounts of data mean that you do expect to be using disk access to some information so do not expect to be able to eliminate read IO completely.

If there are an excessive number of reads occurring the source of these can often be isolated using the following two statements:

Locating IO heavy SQL:

  SELECT address, to_char(hash_value,'999999999999') "Hash Value", 
         disk_reads, executions, disk_reads/executions "Reads/Exec", 
         sql_text
    FROM v$sqlarea 
   WHERE disk_reads > 500 and executions>0
   ORDER BY 3
   ;
Locating IO heavy sessions to trace:
  SELECT v.sid, substr(s.name,1,30) "Statistic", v.value
    FROM v$statname s , v$sesstat v
   WHERE s.name =  'physical reads'
     and v.statistic#=s.statistic#
     and v.value>0
   ORDER BY 3
   ;

  Note: This shows reads since the session connected and does not 
        represent the IO rate of each session.
Once the statement and/or sessions have been isolated SQL_TRACE can be used.

Multi-Block reads
If any of these shows BLKS_READ/READS > 1 then there is some mutliblock reading occurring. If this is not expected look for full table scans occurring on non-cached tables. Repeated scans of the same tables may also accompanied by very fast read times when blocks are cached below Oracle (Eg: In the OS buffer cache)

See the db file scattered read wait event for further information.

Write IO
Isolate the tablespaces and files responsible for 80% of the write IO. In particular one should note if most IO is to a tablespace called TEMP or similar as this implies disk sorts are occurring. Use SORT_DIRECT_WRITES where possible to ensure such writes to not impact buffer cache performance (See <Note:62172.1>).

Write times are often incorrect so if the figures look unbelievable across all files then they probably are (unbelievable). If some files seem to have sensible write times and others do not then provided all files are the same type (RAW files or OS files) then it is worth checking if there are problems with particular files.