The ANALYZE statement
The ANALYZE statement should no more be used to
collect optimizer statistics. The COMPUTE and ESTIMATE clauses of ANALYZE statement
are supported solely for backward compatibility and may be removed in a future release.
The DBMS_STATS package collects a broader, more accurate set of statistics, and gathers
statistics more efficiently.
You may continue to use ANALYZE statement to for
other purposes not related to optimizer statistics collection:
More information can be found here.
Gathering Statistics with DBMS_STATS Procedures
Statistics are gathered using the DBMS_STATS
package. This PL/SQL package is also used to modify, view, export, import, and delete
statistics.
The DBMS_STATS package can gather statistics on
table and indexes, and well as individual columns and partitions of tables.
When you generate statistics for a table, column, or
index, if the data dictionary already contains statistics for the object, then Oracle
updates the existing statistics. The older statistics are saved and can be restored later
if necessary.
When statistics are updated for a database object,
Oracle invalidates any currently parsed SQL statements that access the object. The next
time such a statement executes, the statement is reparsed and the optimizer
automatically chooses a new execution plan based on the new statistics.
Collect Statistics on Table Level
sqlplus scott/tiger
exec dbms_stats.gather_table_stats ( 
ownname
=> 'SCOTT', 
tabname
=> 'EMP', 
estimate_percent => dbms_stats.auto_sample_size, 
method_opt => 'for all
columns size auto', 
cascade
=> true, 
degree => 5 
)
/ PL/SQL procedure successfully
completed.
ownname

Schema of table to
analyze. 
tabname

Name of table. 
estimate_percent

Percentage of rows to estimate (NULL
means compute). Use the constant DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine
the appropriate sample size for good statistics. This is the default. 
method_opt

The default is FOR ALL COLUMNS SIZE
AUTO. 
cascade

Gather statistics on the indexes for
this table. Index statistics gathering is not parallelized. Using this option is
equivalent to running the GATHER_INDEX_STATS Procedure on each of the table's
indexes. Use the constant DBMS_STATS.AUTO_CASCADE to have Oracle determine whether
index statistics to be collected or not. 
degree

Degree of parallelism. The default
for degree is NULL. 
All Parameters / Options can be found in the Oracle
Manual
Collect Statistics on Schema Level
sqlplus scott/tiger
exec dbms_stats.gather_schema_stats ( 
ownname
=> 'SCOTT', 
options
=> 'GATHER', 
estimate_percent => dbms_stats.auto_sample_size, 
method_opt => 'for all
columns size auto', 
cascade
=> true, 
degree => 5 
)
/ PL/SQL procedure successfully
completed.
ownname

Schema to analyze (NULL means current schema).

options

gather

Reanalyzes the whole schema 
gather empty

Only analyzes tables that have no existing
statistics 
gather stale

Only reanalyzes tables with more than 10%
modifications (inserts, updates, deletes). 
gather auto

Reanalyzes objects which currently have no
statistics and objects with stale statistics (Using gather auto is like
combining gather stale and gather empty.) 
Note that both gather stale and gather auto require monitoring. If you issue the
alter table xxx monitoring command, Oracle tracks
changed tables with the dba_tab_modifications view, which allows you to see the
exact number of inserts, updates, and deletes tracked since the last analysis of
statistics.

estimate_percent

Percentage of rows to estimate (NULL
means compute). Use the constant DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine
the appropriate sample size for good statistics. This is the default. 
method_opt

The default is FOR ALL COLUMNS SIZE
AUTO. 
cascade

Gather statistics on the indexes for
this table. Index statistics gathering is not parallelized. Using this option is
equivalent to running the GATHER_INDEX_STATS Procedure on each of the table's
indexes. Use the constant DBMS_STATS.AUTO_CASCADE to have Oracle determine whether
index statistics to be collected or not. 
degree

Degree of parallelism. The default
for degree is NULL. 
All Parameters / Options can be found in the Oracle
Manual
Collect Statistics on Other Levels
DBMS_STATS can collect optimizer statistics on the following levels, see Oracle Manual
GATHER_DATABASE_STATS
GATHER_DICTIONARY_STATS
GATHER_FIXED_OBJECTS_STATS
GATHER_INDEX_STATS
GATHER_SCHEMA_STATS
GATHER_SYSTEM_STATS
GATHER_TABLE_STATS
Statistics for Partitioned Schema Objects
Partitioned schema objects may contain multiple sets
of statistics. They can have statistics which refer to the entire schema object as a
whole (global statistics), they can have statistics which refer to an individual
partition, and they can have statistics which refer to an individual subpartition of a
composite partitioned object.
Unless the query predicate narrows the query to a
single partition, the optimizer uses the global statistics.
Because most queries are not likely to be this restrictive, it is most important to
have accurate global statistics. Therefore, actually gathering global statistics with
the DBMS_STATS package is highly recommended.
Oracle Histogram Statistics
If DBMS_STATS discovers an index whose columns are unevenly
distributed, it will create histograms for that index to aid the costbased SQL optimizer
in making a decision about index versus fulltable scan access.
Example
First we set up a table with some very skewed data  so skewed that when we query
WHERE ID=1, Oracle Database will want to use an index on ID, and when we query WHERE
ID=99, Oracle Database will not want to use an index.
sqlplus scott/tiger
CREATE TABLE skewed_data
AS
SELECT DECODE(ROWNUM,1,1,99) id,
all_objects.*
FROM all_objects
/
Table created.
CREATE INDEX idx_skewed_data ON skewed_data (id);
Index created.
begin
dbms_stats.gather_table_stats
( ownname => USER,
tabname => 'SKEWED_DATA',
method_opt => 'for all indexed columns size
254',
cascade => TRUE
);
end;
/
PL/SQL procedure successfully completed.
set autotrace traceonly explain
SELECT * FROM skewed_data WHERE id=1;
Execution Plan

Plan hash value: 1799207757

 Id 
Operation
 Name  Rows 
Bytes  Cost (%CPU) Time 

 0  SELECT
STATEMENT

 1  96 
2 (0) 00:00:01 
 1  TABLE ACCESS BY INDEX ROWID
SKEWED_DATA  1  96
 2 (0) 00:00:01 
* 2  INDEX RANGE
SCAN  IDX_SKEWED_DATA
 1 
 1 (0) 00:00:01 

SELECT * FROM skewed_data WHERE id=99;
Execution Plan

Plan hash value: 746880940

 Id  Operation 
Name  Rows  Bytes  Cost (%CPU)
Time 

 0  SELECT STATEMENT
  51545 
4832K 190 (3) 00:00:03 
* 1  TABLE ACCESS FULL SKEWED_DATA  51545 
4832K 190 (3) 00:00:03 

The Table skewed_data contains a column ID, which is very much skewed most of
the values are 99, with one record containing a value of 1. After we index and gather
statistics on the table (generating histograms on that indexed column, so the optimizer
knows that the data is skewed), we can see that the optimizer prefers an index range scan
over a full scan when ID=1 is used and vice versa for ID=99.
Use Histograms!
The costbased optimiser uses data value histograms
to get accurate estimates of the distribution of column data. Histograms provide improved
selectivity estimates in the presence of data skew, resulting in optimal execution plans
with nonuniform data distributions.
Histograms can affect performance and should be used
only when they substantially improve query plans. They are useful only when they
reflect the current data distribution of a given column. If the data distribution of a
column changes frequently, you must recompute its histogram frequently.
Number of Histograms
The number of Histograms to used is specified with the SIZE parameter in method_opt:
method_opt => 'for all indexed columns size 254',
What are Histograms
Histograms are bands of column values, so that each band contains
approximately the same number of rows. The useful information that the histogram provides
is where in the range of values the endpoints fall.
Consider the following table column with values between 1 and 100 and a
histogram with 10 buckets. If the data in the column is uniformly
distributed, then the histogram looks as follows, where the numbers are the endpoint
values.
Histogram with Uniform Distribution
The number of rows in each bucket is 1/10 of the total number of rows
in the table.
If the data is not uniformly distributed, then the histogram might look
as follows:
Histogram with NonUniform Distribution
In this case, most of the rows have the value 5 for the column.
Only 1/10 of the rows have values between 60 and 100.
Conclusion
The package DBMS_STATS can be used to gather global
statistics. It is most important to have accurate global
statistics for partitioned schema objects. Histograms can
affect performance and should be used only when they substantially improve query
plans.
