WebIV:View NOTE:62161.1 |
Article-ID: <Note:62161.1> Circulation: PUBLISHED (EXTERNAL) Folder: server.Rdbms.Performance.Database Topic: ** Tuning Oracle7 and Oracle8 Platform: GENERIC Generic issue Subject: Systemwide Tuning using UTLESTAT Reports in Oracle7/8 Author: RPOWELL.UK Creation-Date: 27-JUL-1998 14:26:12 Modified-Date: 09-FEB-1999 13:51:10 Document-Type: BULLETIN Impact: MEDIUM Skill-Level: ACCOMPLISHED Component: RDBMS 07.01 - 08.00 Server 07.01 - 08.00 Content-Type: TEXT/X-HTML Attachments: NONE
Some key things to remember about bstat/estat:
At the start of the sample period: In Server Manager: connect / as sysdba (or connect internal) @?/rdbms/admin/utlbstat (Unix) or @D:/orant/rdbms80/admin/utlbstat (NT - use correct path) At the end of the sample period: Change to a directory where you have WRITE permission Ensure there is no local file called 'report.txt' as this is the name of the output file that UTLESTAT will try to create. In Server Manager: connect / as sysdba (or connect internal) @?/rdbms/admin/utlestat (Unix) or @D:/orant/rdbms80/admin/utlestat (NT - use correct path) The report should be written to a file called 'report.txt' in the current directory.
Note: The format of the REPORT.TXT file varies slightly between releases so sections may not be in exactly the same order as shown here. Also not all sections are reported in all Oracle versions.
<A> SVRMGR> Rem Select Library cache statistics. ... LIBRARY GETS GETHITRATI PINS ... The library cache is where Oracle stores object definitions, SQL statements etc.. Each namespace (or library) contains different types of object. The figures here give a quick summary of the usage of each namespace. The most useful indicator here is the RELOADS column.
<B> Statistic Total Per Transact ... There are hundreds of statistics gathered by Oracle. Only a few of these are needed for general tuning. In particular we get information about CPU usage, buffer cache hit ratio, table scanning and sorts from this section.
<C> Average Write Queue Length The average write queue length is a good indication of how busy the buffer cache is.
<D> SVRMGR> Rem System wide wait events for non-background processes ... Event Name Count Total Time Avg Time This is arguably the most important section in the report as it shows how long Oracle is waiting for resources. This will be the starting point for looking at tuning Oracle.
<E> SVRMGR> Rem System wide wait events for background processes ... Event Name Count Total Time Avg Time In earlier Oracle7 releases this section and section <D> are combined. In Oracle 7.3 and Oracle8 this section shows time spent waiting by background processes.
<F> SVRMGR> Rem Latch statistics. ... LATCH_NAME GETS MISSES HIT_RATIO SLEEPS ... Latches are points of serialisation within Oracle so are critical to good performance. This section is important if there are waits for latches occurring.
<G> LATCH_NAME NOWAIT_GETS NOWAIT_MISSES NOWAIT_HIT_RATIO This section shows no-wait latch misses - the cost of a "miss" is difficult to assess - it is likely to be less significant than a "sleep" in the previous section.
<H> SVRMGR> Rem Buffer busy wait statistics. CLASS COUNT TIME This section indicates any waiting for block access and is only really important if there is evidence of delays being introduced due to block contention.
<I> SVRMGR> Rem Waits_for_trans_tbl high implies you should add rollback ... UNDO_SEGMENT TRANS_TBL_GETS TRANS_TBL_WAITS ... This sections indicates rollback segment activity.
<J> SVRMGR> Rem The init.ora parameters currently in effect: NAME VALUE A list of non-default init.ora parameter settings.
<K> NAME GET_REQS GET_MISS SCAN_REQ SCAN_MIS MOD_REQS ... This indications how well the dictionary cache is performing. This is self managed by Oracle so this section can generally be ignored. This section was important in Oracle6.
<L> SVRMGR> Rem Sum IO operations over tablespaces. TABLE_SPACE READS BLKS_READ READ_TIME WRITES BLKS_WRT WRITE_TIME MEGABYTES Both this and the next section indicate where read and write IO is occurring, and show which tablespaces are servicing full table scans.
<M> SVRMGR> Rem I/O should be spread evenly across drives. TABLE_SPACE FILE_NAME READS BLKS_READ READ_TIME WRITES BLKS_WRT WRITE_TIME MEGABYTES Per file read and write IO.
<N> SVRMGR> Rem The times that bstat and estat were run. START_TIME END_TIME ------------------------- ------------------------- 14-may-98 10:30:22 14-may-98 11:30:06 It is important to know the duration of any report. Longer reports tend to average out "peaks" and may hide problem situations.
<O> BANNER ---------------------------------------------------------------- Oracle7 Server Release 7.3.3.5.0 - Production Release Which version is being used - only the first 3 digits should be treated as reliable.
1. Skim over the report and look for negative figures. Usually these mean the database was stopped / started between the bstat and estat runs. Prior to Oracle 7.2 it was common for a few of the larger figures to "wrap" to give negative results (especially the 'memory' statistics). This is less likely to occur in Oracle 7.2 onwards.
2. From section <B> note down the CPU used by this session statistic. This is the cumulative amount of CPU that all sessions have used in the sample period - the name is rather misleading ! Also note down the parse time cpu figure. ('parse time cpu' is included within 'CPU used by this session'). These statistics will be used elsewhere for comparisons.
3. Sections <D> Wait Events and
<E> Background Wait Events
In 7.1 and 7.2 there is only one wait-event list for all processes.
This, along with your CPU figures from '2', gives you the information on
where most time is
going. Whenever an Oracle process waits for something it records it as a
'wait' using
one of a set of predefined waits (see V$EVENT_NAME for a list of all wait
events).
Some of these events can be considered as 'idle' events - ie: the process
is waiting for work.
Others indicate time spent waiting for a resource or action to complete.
By comparing the relative time spent waiting on each wait event and the
CPU used by this session (from 2 above) we can see where the Oracle
instance is spending most of its time.
In order to get an indication of where time is going:
client message } Process waiting for data from the SQL*Net message from client } client application SQL*Net more data from client} rdbms ipc message - Usually background process waiting for work pipe get - DBMS_PIPE read waiting for data Null event - Miscellaneous pmon timer - PMON waiting for work smon timer - SMON waiting for work parallel query dequeue - Waiting for input (Discussed later)
This summary gives a quick indication of where most of the time is going. Eg: if 50% of the time is spent waiting for 'enqueue' there is probably (but not necessarily) a serious locking conflict in the application. If CPU used is above 25% of the total time waited the application may be CPU heavy (there are no hard rules as to what is a high value for CPU used). The top wait events and CPU figures help determine which other sections of the bstat/estat report are worth looking at.
WARNING: The CPU statistics are only updated by Oracle when a user call completes. This means that long running PL/SQL jobs which are CPU intensive will NOT show up in the CPU figures until the outermost PL/SQL block completes. This also means that long running statements that started before UTLBSTAT was run and complete during the sampled period will add their entire CPU usage to the CPU used by this session statistic potentially distorting the figures.
4. If everything in the report looks normal then look at the IO and CPU related sections of this article. If Oracle is behaving well for the workload presented to it there may still be scope to reduce the workload (tuning the SQL etc..)
The aim of the above steps is to identify areas that account for a significant portion of the overall time. Once a potential problem area has been identified one can then look at what options are available to improve the timings. In some cases there are suggestions within this text but you may be referred to separate articles.
>From this point onwards you can either look at the sections appropriate to you or read the entire article. The table below suggests where you should go next for specific scenarios - the wait event names are hyper-links to the relevant section. If using a hard copy of this article then the sections follow the table in the same order as presented here.
Using CPU | |
High CPU usage | You need to know where the CPU is going to |
Common Wait Events (alphabetical order) | |
buffer busy waits | Wait to access a block buffer |
db file scattered read | Multi-block read |
db file sequential read | OS block read - usually single block read |
enqueue | Waiting for an enqueue (lock) |
free buffer waits | Waiting for a free buffer to use. |
latch free | Waiting for a latch |
library cache pin | Waiting for a PIN on a library cache object. |
log buffer xxx | Waiting for log buffer space events |
log file switch xxx | Waiting for log file switch events |
log file sync | Waiting on LGWR to sync redo to disk |
SQL*Net message from dblink | Waiting for response from remote DB |
write complete waits | Wait for a write of a block buffer to complete. |
Special wait events | |
Parallel query dequeue | PQ slave or QC is waiting for a message. Ignore this in calculations |
If the top wait events are not listed in the table above and do not appear as idle wait-event list then see Systemwide Waits for Other Wait Events.
Get the parse time cpu and CPU used by this session figures from the "Statistics" section of the estat report. If parse time cpu represents a large percentage of the CPU time then time is being spent parsing rather than executing statements. If this is the case then it is likely that the application is using literal SQL and not sharing it or the shared pool is very badly configured. See <Note:62143.1> for information on the shared pool and the impact of literal SQL.
If the parse CPU is only a small percentage of the total CPU used then the next task is to determine where the CPU is going. There are several things we can do to help with this:
SELECT address, hash_value, buffer_gets, executions, buffer_gets/executions "Gets/Exec", sql_text FROM v$sqlarea WHERE buffer_gets > 50000 and executions>0 ORDER BY 3 ;The cut-off value of 50000 is an arbitrary starting point and should be increased/decreased gradually until the top 10 - 20 statements are listed. This shows the SQL statements which have the most BUFFER_GETS which typically relates to using most CPU. The statements of interest are those with a large number of gets per execution especially if executions is high. One off statements with high buffer gets may be due to batch jobs so it helps if you have an understanding of the application components to know which statements are expected to be expensive.
NB: The above statement will not highlight CPU intensive PL/SQL blocks
Once candidate statements have been isolated the full statement text
can be obtained using following query substituting relevant values for
ADDRESS and HASH_VALUE pairs:
Once any CPU intensive sessions have been identified <View:V$SESSION>
can be used to get more information.
At this stage it is generally best to revert to user
session tracing (SQL_TRACE) to determine where the CPU is being used.
See <Note:62160.1>
If the TIME spent waiting for buffers is significant then
we need to determine which segment/s is/are suffering from contention.
The "Buffer busy wait statistics"
section of the Bstat/estat shows which block type/s are seeing the most
contention. This information is derived from the <View:V$WAITSTAT> which
can be queried in isolation:
Alternatively capturing session trace may help - See <Note:62160.1>.
If a particular block or range of blocks keep showing waits you can try to
isolate the object using:
The following hints may be useful for particular types of contention - these
are things that MAY reduce contention for particular situations:
SELECT sql_text
FROM v$sqltext
WHERE address='&ADDRESS_WANTED'
and hash_value=&HASH_VALUE
ORDER BY piece
;
The statement can then be explain (using EXPLAIN PLAN) or isolated
for further testing to see how CPU intensive it really is.
Note that if the statement uses bind variables and your data is highly
skewed then the statement may only be CPU intensive for certain bind values.
Locating CPU-Heavy Sessions
The statement below can help locate sessions which have used the most
CPU:
SELECT v.sid, substr(s.name,1,30) "Statistic", v.value
FROM v$statname s , v$sesstat v
WHERE s.name = 'CPU used by this session'
and v.statistic#=s.statistic#
and v.value>0
ORDER BY 3
;
Note that the CPU time is cumulative so a session which has been
connected for several days may appear to be heavier on CPU than one
which has only been connected for a short period of time.
Thus it is better to write a script to sample the CHANGE in the
statistic between 2 known points in time allowing one to see how much
CPU was used by each session in a known timeframe.
Wait Events
The following few sections describe the meanings of the most common
wait events from sections C and D
of the Estat report. For each wait event we attempt to drill down to locate
more information on what is causing the waits.
The wait events are included in alphabetical order.
System wide waits for "buffer busy waits"
This wait happens when a session wants to access a database block in the
buffer cache but it cannot as the buffer is "busy". The two main cases
where this can occur are:
SELECT time, count, class
FROM V$WAITSTAT
ORDER BY time,count
;
This shows the class of block with the most waits at the BOTTOM of
the list.
Additional information can be obtained from the internal view X$KCBFWAIT thus:
SELECT count, file#, name
FROM x$kcbfwait, v$datafile
WHERE indx + 1 = file#
ORDER BY count
;
This shows the file/s with the most waits (at the BOTTOM of the list) so by
combining the two sets of information we know what block type/s in which file/s
are causing waits. The segments in each file can be seen using a query like:
SELECT distinct owner, segment_name, segment_type
FROM dba_extents
WHERE file_id= &FILE_ID
;
If there are a large number of segments of the type listed then monitoring
V$SESSION_WAIT may help isolate which object is causing the waits.
Eg: Repeatedly run the following statement and collect the output.
After a period of time sort the results to see which file & blocks
are showing contention:
SELECT p1 "File", p2 "Block", p3 "Reason"
FROM v$session_wait
WHERE event='buffer busy waits'
;
Note:
In the above query there is no reference to WAIT_TIME as you are
not interested in whether a session is currently waiting or not, just
what buffers are causing waits.
SELECT distinct owner, segment_name, segment_type
FROM dba_extents
WHERE file_id= &FILE_ID
and &BLOCK_NUMBER between block_id and block_id+blocks-1
;
Actions
As buffer busy waits are due to contention for particular blocks then you
cannot take any action until you know which blocks are being competed for and
why. Eliminating the cause of the contention is the best option.
Note that "buffer busy waits" for data blocks are often due to several processes
repeatedly reading the same blocks (eg: if lots of people scan the same index)
- the first session processes the blocks that are in the buffer cache
quickly but then a block has to be read from disk - the other sessions
(scanning the same index) quickly 'catch up' and want the block which is
currently being read from disk - they wait for the buffer as someone is
already reading the block in.
Block Type | Possible Actions |
data blocks | Eliminate HOT blocks from the application. Check for repeatedly scanned / unselective indexes. Change PCTFREE and/or PCTUSED. Check for 'right- hand-indexes' (indexes that get inserted into at the same point by many processes). Increase INITRANS. Reduce the number of rows per block. |
segment header | Increase of number of FREELISTs. Use FREELIST GROUPs (even in single instance this can make a difference). |
freelist blocks | Add more FREELISTS. In case of Parallel Server make sure that each instance has its own FREELIST GROUP(s). |
undo header | Add more rollback segments. |
If the TIME spent waiting for IOs is significant then we can determine which segment/s Oracle has to go to disk for. See the "Tablespace IO" and "File IO" sections of the ESTAT report to get information on which tablespaces / files are servicing the most IO requests, and to get an indication of the speed of the IO subsystem.
A larger buffer cache may help - test this by actually increasing DB_BLOCK_BUFFERS and not by using DB_BLOCK_LRU_EXTENDED_STATISTICS. Never increase the SGA size if it may induce additional paging or swapping on the system.
A less obvious issue which can affect the IO rates is how well data is clustered physically. Eg: Assume that you frequently fetch rows from a table where a column is between two values via an index scan. If there are 100 rows in each index block then the two extremes are:
If the TIME spent waiting for multiblock reads is significant then we need to determine which segment/s we are performing the reads against. See the "Tablespace IO" and "File IO" sections of the ESTAT report to get information on which tablespaces / files are servicing multiblock reads (BLKS_READ/READS>1).
It is probably best at this stage to find which sessions are performing scans and trace them to see if the scans are expected or not. This statement can be used to see which sessions may be worth tracing:
SELECT sid,total_waits, time_waited FROM v$session_event WHERE event='db file scattered read' and total_waits>0 ORDER BY 3,2 ;
Oracle7: SELECT /*+ ROWID(T)*/ * FROM tab T where ROWID>='0.0.0'; Oracle8: In Oracle8 read-ahead is only enabled for Parallel Query Slave processes.WARNING: The ROWID hint causes a range checkpoint and so can have an adverse performance impact, especially in Parallel Server environments.
Eg: Assume Wait Events show enqueue time_waited=3000 total_waits=10 Statistics show enqueue waits has a count of 2 This means there were 2 actual waits whose individual wait times totalled to 3000 hundredths of a second (ie: 30 seconds).To determine which enqueues are causing the most waits system-wide look at <View:X$KSQST> thus:
SELECT ksqsttyp "Lock", ksqstget "Gets", ksqstwat "Waits" FROM X$KSQST where KSQSTWAT>0;This gives the system wide number of waits for each lock type. Remember that it only takes one long wait to distort the average wait time figures.
TX Transaction Lock Generally due to application or table setup issues See <Note:62354.1> for example scenarios which can cause TX lock waits. TM DML enqueue Generally due to application issues, particularly if foreign key constraints have not been indexed. See <OLS:106754.289> for details of TM locks and referential integrity. ST Space management enqueue Usually caused by too much space management occurring (Eg: small extent sizes, lots of sorting etc..) See <Note:33567.1> for more information about the ST enqueue.
If the TIME spent waiting for latches is significant then we need to determine which latches are suffering from contention. The Bstat/estat section on latches shows latch activity in the period sampled. This section of the estat report is based on <View:V$LATCH> gives a summary of latch activity since instance startup and can give an indication of which latches are responsible for the most time spent waiting for "latch free" thus:
SELECT latch#, name, gets, misses, sleeps FROM v$latch WHERE sleeps>0 ORDER BY sleeps ; Note that this select gives the worst latches at the BOTTOM of the list.Some lines in this report are actually for multiple latches all of the same type. To determine if the latch activity is concentrated on one particular latch in the set one can query <View:V$LATCH_CHILDREN> (only available from 7.3 onwards)
SELECT addr, latch#, gets, misses, sleeps FROM v$latch_children WHERE sleeps>0 and latch# = &LATCH_NUMBER_WANTED ORDER BY sleeps ;This gives the system wide number of waits for each child latch of the type LATCH#. If there are no rows returned then there is only a single latch of the type you are looking at.
If there are multiple rows the important thing to note is whether the SLEEPS are reasonably distributed or if there are one or two child latches responsible for 80% of the SLEEPS. If the contention is focused on one or two child latches make a note of which children are seeing a problem - note the ADDR column.
shared pool latch In Oracle 7.1/7.2 there are known issues involving the shared pool and library cache latches. See <Note:32871.1> for details. Heavy use of literal SQL will stress this latch significantly. If your online application makes heavy use of literal SQL statements then converting these to use bind variables will give significant improvements in latch contention in this area. See <Note:62143.1> for issues affecting the shared pool. MTS places extra load on this latch - the use of the LARGE_POOL in Oracle8 may help this. library cache pin (7.1) and library cache latches From Oracle 7.2 onwards the library cache latch has child latches . Problems with these latches are typically due to heavy use of literal SQL or very poor shared pool configuration. If your online application makes heavy use of literal SQL statements then converting these to use bind variables will give significant improvements in latch contention in this area. See <Note:62143.1> for issues affecting the shared pool. cache buffers lru chain latch Setting DB_BLOCK_LRU_STATISTICS to TRUE can adversely affect this latch - always ensure DB_BLOCK_LRU_STATISTICS is set to FALSE. From Oracle 7.3 it is possible to have multiple of these latches by specifying DB_BLOCK_LRU_LATCHES although this really needs multiple CPU's to be of most benefit. Heavy contention for this latch is generally due to heavy buffer cache activity which can be caused, for example, by: Sorting in buffer cache and not using SORT_DIRECT_WRITES Repeatedly scanning large unselective indexes See <Note:62172.1> cache buffers chains latches Individual block contention can show up as contention for one of these latches. Each cache buffers chains latch covers a list of buffers in the buffer cache. If one or two child latches stand out from V$LATCH_CHILDREN then: SELECT dbafil,dbablk,class, state FROM x$bh WHERE hladdr='&ADDR_OF_CHILD_LATCH' ; If this list contains lots of entries for the same dbafil, dbablk in 7.1/7.2 then see <Note:38350.1> and <Bug:256032>. If this list is short (3 to 10 buffers) then one of the buffers in this list is probably very 'hot' - ie: suffers from lots of concurrent access attempts.Additionally Oracle 7.3 introduces a new view which may be of help to Oracle Support in more obscure cases:
SELECT sleep_count, "WHERE" FROM v$latch_misses WHERE parent_name='&PROBLEM_LATCH_NAME' ORDER BY 1 ;This shows circumstances where latches were requested but not obtained immediately.
Addressing this problem is likely to need the help of Oracle support but the following few statements should help obtain useful information.
Repeatedly run the following statement and collect the output. After a period of time sort the results to see if the waits are always for a single "handle" or for several different handles.
SELECT p1 "Handle" FROM v$session_wait WHERE event='library cache pin' ;Also isolate any statements which have high version counts:
SELECT address, to_char(hash_value,'9999999999999'), version_count, sql_text FROM v$sqlarea WHERE version_count>10 ORDER BY version_count ;
7.0 to 7.2: log buffer space/switch 7.3 onwards: log buffer space log file switch (checkpoint incomplete) log file switch (archiving needed) log file switch/archive log file switch (clearing log file) log file switch completion switch logfile command
For all other waits:
LIBRARY Different object types are stored in different 'libraries'. GETS Number of times an item in this 'library' was requested. GETHITRATIO Percentage of times the item was requested and found to already be present in the cache. PINS Number if times an item was 'pinned'. An item has to be 'pinned' in order to be used. Items may be looked up once and then pinned/unpinned many times. PINHITRATIO Percentage of times an item was requested to be pinned and was successfully pinned. RELOADS Number of times an item had to be re-loaded because part of it had been flushed from the cache and was needed. INVALIDATIONS Number of times objects were invalidated in the sample period.If there are a significant number of RELOADS then re-usable information is being flushed from the SGA and thus having to be reloaded / rebuilt. See <Note:62143.1> for details about setting up the shared pool.
If the GetHitRatio or PinHitRatio is low (<.9) then it is likely that the application is using unsharable SQL or infrequently referencing objects. If there is a poor hit ratio see <Note:62143.1> Note that due to the way Oracle works it is unusual for the hit ratios in this section to very low.
Statistic Statistic name Total Change in this statistic between sample times Per Transact "Total" divided by number of COMMITs in the sample period. Per Logon "Total" divided by the number LOGONS that occurred in the sample period. Per Second "Total" divided by time interval between the samples.Most of the statistics can be ignored as they are only useful in specific circumstances. Below are some of the more commonly referenced statistics:
CPU used by this session Actually the CPU used across all sessions in the sample period consistent gets Gets of a block in consistent mode db block gets Gets of a block in current mode enqueue waits Actual number of waits for a lock execute count Number of execute calls parse count Number of parse calls parse count (total) V8 " " parse count (hard) V8 resulting in a hard parse parse time cpu CPU spent parsing physical reads Number of blocks physically read . Note this is blocks read and NOT the number of IO read requests. recursive calls Number of recursive calls (either for dictionary information or PLSQL) sorts (disk) Number of disk based sorts table scans (long tables) Number of full table scans user calls Number of user callsThere is no intention to describe all the statistics here especially as many have self describing names. Some of the statistics may be useful in particular situations so if the text refers to "statistic XXX" then this is the section of the report to find the value. A few notes on some of the more widely used statistics are included below.
WARNING: CPU statistics are only updated by Oracle when a user call completes. This means that long running PL/SQL jobs which are CPU intensive will NOT show up in the CPU figures until the outermost PL/SQL block completes.
hit ratio = 1 - ( physical reads ) ( ------------------------------- ) ( consistent gets + db block gets )Converting this to a percentage (multiply by 100) gives the percentage of times an Oracle process asked for a data block and found it in the buffer cache. A good hit ratio is expected for OLTP type systems but decision support type systems may have much lower hit ratios.
Note:In Oracle 7.3.4 and Oracle8 the "physical reads" statistic includes direct block reads as well as reads to get data into the buffer cache. Hence the above formula only gives an lower bound for the hit ratio on these releases. See <Note:33883.1> for more information.
A good hit ratio does not mean the application is good. It is
quite possible to get an excellent hit ratio by using a very unselective
index in a heavily used SQL statement.
Eg: Consider a statement like:
select * from employee where empid=1023 and gender='MALE';If EMPLOYEE is a large table and this statement always uses the GENDER index rather than the EMPID index then you scan LOTS of blocks (from the GENDER index) and find nearly all of them in the cache as everyone is scanning this same index over and over again. The hit ratio is excellent but performance is very bad. Conversely repeated full table scans of large tables do not cause the blocks to be cached so make the hit ratio worse.
In Oracle8 it is possible to use multiple buffer pools (by setting BUFFER_POOL_KEEP or BUFFER_POOL_RECYCLE init.ora parameters). Multiple buffer pools are not discussed here but the hit ratios can be seen using the V$BUFFER_POOL_STATISTICS view which is created by the CATPERF.SQL script.
The 2 main ways to improve the write queue length are:
See <Note:62172.1> for information about the Tuning the buffer cache and DBWR.
Columns in this section:
Event Name Name of the waitevent Count Number of times a wait occurred in the sample period Total Time Total time spent waiting (1/100ths of a second) Avg Time Average time per wait (1/100ths of a second)See "What to look at" for details of this section of the report.
Columns in this section:
LATCH_NAME Name of the latch. Note that some latches have a parent with children so this may be a summary line. GETS Number of times latch was requested MISSES Number of times it could not be acquired on the first attempt HIT_RATIO Ratio indicating percentage of times we acquire a latch as soon as it is requested. Many articles indicate this is important but it can be misleading. Eg: GETS=10, MISSES=8 gives a very poor hit ratio but is insignificant if the total latch misses are 200000. SLEEPS This is much more important - it indicates the number of times we had to sleep because we could not get a latch. The latch/es with the highest SLEEPS figures are the ones to concentrate on. SLEEPS/MISS Average sleeps per miss gives an idea how long each get is having to wait to get the latch.Time spent waiting for a latch is time wasted. Isolate the latches with the top SLEEPS figures and refer to 'latch free' for what to do with this information.
The columns are:
CLASS Block class waited for. COUNT Number of waits TIME Total time waited (1/100ths of a second)See buffer busy waits for more details.
Columns in this section are
UNDO_SEGMENT Undo segment number. Get the name from: SELECT * form DBA_ROLLBACK_SEGS WHERE segment_id=&UNDO_SEGMENT; TRANS_TBL_GETS Number of gets for the rollback segment header in the sample period TRANS_TBL_WAITS Number of waits for the rollback segment header in the sample period UNDO_BYTES_WRITTEN Number of bytes written to the rollback segment in the sample period SEGMENT_SIZE_BYTES Size of the rollback segment at end of sample period XACTS Meaningless: active Xacts at end (estat) MINUS active transactions at start SHRINKS Number of rollback segment shrinks during the sample period. WRAPS Number of wraps from one extent to another during the sample period.TRANS_TBL_WAITS indicates which rollback segment headers had waits for them. Typically you would want to reduce such contention by adding rollback segments. How much this is affecting performance can be gauged from the time spent waiting for buffer busy waits for undo segment header blocks.
If SHRINKS is non zero then one or more rollback segments has OPTIMAL set. While OPTIMAL can be useful there have been problems in certain Oracle versions related to rollback segments shrinking back to their optimal size.
It is difficult to give general tips for init.ora parameters but check for the following:
Generic: CURSOR_SPACE_FOR_TIME Should generally be FALSE DB_BLOCK_LRU_STATISTICS TRUE may show latch contention in Oracle 7.2 onwards EVENT Should only ever be set under guidance from Oracle Support. TIMED_STATISTICS Should be TRUE Operating System Specific: USE_READV TRUE may impact table scan speed REDUCE_ALARM TRUE may severely impact performance USE_ISM TRUE may cause problems on some platforms/ versions CPU_COUNT This should normally default correctly but see platform specific information as on some platforms/versions it may need setting explicitly. PRE_PAGE_SGA On some platforms TRUE may impact performance
It is unlikely there is any problem in this section of the report. If there are any values with large GETS and a poor hit ratio check the figures with Oracle support.
The columns in these sections are:
TABLE_SPACE Tablespace name FILE_NAME File name in this tablespace READS Number of read calls to the OS BLKS_READ Number of blocks read. The above two differ only when multi-block reads are being performed. Eg: On full table scans we read up to DB_FILE_MULTIBLOCK_READ_COUNT blocks per read. READ_TIME Total time for the reads in 1/100ths of a second WRITES Number of OS write calls BLKS_WRT Number of blocks written WRITE_TIME Write time in 1/100ths of a second. NB: this figure may be incorrect.Take care with information presented in this section as it is now very common for files to be striped or managed by a volume manager so there is not any obvious relationship between each Oracle file and the physical disks. Note too that an Oracle IO is a call to the OS to perform an IO operation and may not result in a physical disk IO. For this reason OS statistics should be used when looking at tuning IO throughput.
The 4 important things you can get from this section are:
Remember that in some applications random access of large amounts of data mean that you do expect to be using disk access to some information so do not expect to be able to eliminate read IO completely.
If there are an excessive number of reads occurring the source of these can
often be isolated using the following two statements:
Locating IO heavy SQL:
SELECT address, to_char(hash_value,'999999999999') "Hash Value", disk_reads, executions, disk_reads/executions "Reads/Exec", sql_text FROM v$sqlarea WHERE disk_reads > 500 and executions>0 ORDER BY 3 ;Locating IO heavy sessions to trace:
SELECT v.sid, substr(s.name,1,30) "Statistic", v.value FROM v$statname s , v$sesstat v WHERE s.name = 'physical reads' and v.statistic#=s.statistic# and v.value>0 ORDER BY 3 ; Note: This shows reads since the session connected and does not represent the IO rate of each session.Once the statement and/or sessions have been isolated SQL_TRACE can be used.
See the db file scattered read wait event for further information.
Write times are often incorrect so if the figures look unbelievable across all files then they probably are (unbelievable). If some files seem to have sensible write times and others do not then provided all files are the same type (RAW files or OS files) then it is worth checking if there are problems with particular files.