A dimension defines hierarchical (parent/child) relationships
between pairs of columns or column sets. Each value at the child level is
associated with one and only one value at the parent level. A hierarchical
relationship is a functional dependency from one level of a hierarchy to the next
level in the hierarchy. A dimension is a container of logical relationships between
columns, and it does not have any data storage assigned to it.
The star schema is the simplest data warehouse schema. It is
called a star schema because the diagram resembles a star, with points radiating from
a center. The center of the star consists of one or more fact tables and the
points of the star are the dimension tables, as shown in the next
figure.
Only one join establishes the relationship between the fact
table and any one of the dimension tables.
A star schema optimizes performance by keeping queries simple and
providing fast response time. All the information about each level is stored in
one row.
The same parameters as for materialized views must be setup. Look
here for more
information to setup dimensions.
Here is a simple example, we will setup a SALES table to store the
transaction date, a customer ID and the total number of sales. This table will have
about 420'000 rows in it. Another table, TIME, will store the mapping of transaction
date to month, to quarter, to year. If we join the two together, we can optain
aggregate sales by month, quarter, year and so on.
In a conventional database schema (one without materialized views, dimensions and other structures) these operations
succeed, but they would be very slow. For every row in the sales data, we
would have to perform an indexed read into the lookup table to cenvert either the
transaction date or customer ID into some other values in order to group by this
other value.
Setup of the SALES table
Now we'll construct the SALES table, and load it with some random test data,
generated using the ALL_OBJECTS view.
CREATE TABLE sales (
trans_date DATE,
cust_id INT,
sales_amount NUMBER
);
INSERT /*+ APPEND */ INTO sales
SELECT TRUNC(SYSDATE,'YYYY')+MOD(ROWNUM,366) trans_date,
MOD(ROWNUM,100) cust_id,
ABS(DBMS_RANDOM.RANDOM)/100 sales_amount
FROM all_objects
/
COMMIT;
29518 rows created.
The generation of TRANS_DATE is simply the first day of the year:
TRUNC(SYSDATE,'YYYY') plus a number between 1 and 365: MOD(ROWNUM,366). The CUST_ID
is a number between 0 and 99. The total number of sales is a large number generated
with DBMS_RANDOM.
The ALL_OBJECTS view in Oracle 9.2.0 contains about 29000 rows, so after four inserts
that consecutively double the size of the table, we'll have about 470000 records.
BEGIN
FOR i IN 1 .. 4
LOOP
INSERT /*+ APPEND */ INTO sales
SELECT trans_date, cust_id,
ABS(DBMS_RANDOM.RANDOM)/100 sales_amount
FROM sales;
COMMIT;
END LOOP;
END;
/
SELECT COUNT(*) FROM sales;
COUNT(*)
----------
472288
Setup of the TIME table
Now, we need to setup the TIME index organization table, to roll up the date field by
month, year, quarter and so on. You can create an index-organized table using the
AS subquery, in this case you may not specify any column datatypes.
CREATE TABLE time (
day PRIMARY KEY,
mmyyyy,
mon_yyyy,
qtr_yyyy,
yyyy
)
ORGANIZATION INDEX
AS
SELECT DISTINCT
trans_date DAY,
CAST (TO_CHAR(trans_date,'MMYYYY') AS NUMBER) MMYYYY,
TO_CHAR(trans_date,'MON-YYYY') MON_YYYY,
'Q' || CEIL(TO_CHAR(trans_date,'MM')/3) || ' FY'
|| TO_CHAR(trans_date,'YYYY') QTR_YYYY,
CAST(TO_CHAR(trans_date, 'YYYY') AS NUMBER) YYYY
FROM sales
/
CAST converts one built-in datatype value into another built-in datatype
value, in our example from CHAR to NUMBER. CEIL returns smallest integer
greater than or equal to the given argument.
In this case, we generated:
-
MMYYYY - The month including the year
-
MON_YYYY - Same as above but we spelled out the month
-
QTR_YYYY - The quarter of the year, including the year
-
YYYY - The year itself
Now we have the tables SALES and the table TIME which maps the transaction date to
month, quarter and year. In general, the computations required to create this
table could be much more complex, but for our simple example it's good enough.
SELECT * FROM sales;
TRANS_DAT CUST_ID SALES_AMOUNT
--------- ---------- ------------
02-JAN-02 1
7130179.82
03-JAN-02 2
7656584.03
04-JAN-02 3
11632950.1
05-JAN-02
4 19069888
06-JAN-02
5 74638.04
07-JAN-02 6
18585004.5
08-JAN-02 7
18604133.3
09-JAN-02 8
3155867.56
10-JAN-02 9
15125628.5
11-JAN-02 10
9703397.22
12-JAN-02 11
3604883.28
......... ..
..........
column QTR_YYYY format a10
SELECT * FROM time;
DAY MMYYYY
MON_YYYY QTR_YYYY YYYY
--------- ---------- -------- ---------- ----------
01-JAN-02 12002 JAN-2002 Q1
FY2002 2002
02-JAN-02 12002 JAN-2002 Q1
FY2002 2002
03-JAN-02 12002 JAN-2002 Q1
FY2002 2002
04-JAN-02 12002 JAN-2002 Q1
FY2002 2002
05-JAN-02 12002 JAN-2002 Q1
FY2002 2002
06-JAN-02 12002 JAN-2002 Q1
FY2002 2002
07-JAN-02 12002 JAN-2002 Q1
FY2002 2002
08-JAN-02 12002 JAN-2002 Q1
FY2002 2002
09-JAN-02 12002 JAN-2002 Q1
FY2002 2002
10-JAN-02 12002 JAN-2002 Q1
FY2002 2002
11-JAN-02 12002 JAN-2002 Q1
FY2002 2002
......... ..... ........ ..
...... ....
......... ..... ........ ..
...... ....
19-DEC-02 122002 DEC-2002 Q4
FY2002 2002
20-DEC-02 122002 DEC-2002 Q4
FY2002 2002
21-DEC-02 122002 DEC-2002 Q4
FY2002 2002
22-DEC-02 122002 DEC-2002 Q4
FY2002 2002
23-DEC-02 122002 DEC-2002 Q4
FY2002 2002
24-DEC-02 122002 DEC-2002 Q4
FY2002 2002
25-DEC-02 122002 DEC-2002 Q4
FY2002 2002
26-DEC-02 122002 DEC-2002 Q4
FY2002 2002
27-DEC-02 122002 DEC-2002 Q4
FY2002 2002
28-DEC-02 122002 DEC-2002 Q4
FY2002 2002
29-DEC-02 122002 DEC-2002 Q4
FY2002 2002
30-DEC-02 122002 DEC-2002 Q4
FY2002 2002
31-DEC-02 122002 DEC-2002 Q4
FY2002 2002
Setup of the SALES_MV materialized view
Now ww will create the materialized view SALES_MV, which
rolls the data up from individual days to months.
ANALYZE TABLE SALES COMPUTE STATISTICS;
ANALYZE TABLE TIME COMPUTE STATISTICS;
CREATE MATERIALIZED VIEW sales_mv
BUILD IMMEDIATE
REFRESH ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT sales.cust_id,
SUM(sales.sales_amount) sales_amount,
time.mmyyyy
FROM sales, time
WHERE sales.trans_date = time.day
GROUP BY sales.cust_id, time.mmyyyy
/
ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY=TRUSTED;
SET AUTOTRACE ON
SELECT time.mmyyyy, SUM(sales_amount)
FROM sales, time
WHERE sales.trans_date = time.day
GROUP BY time.mmyyyy
/
SET AUTOTRACE OFF
MMYYYY SUM(SALES_AMOUNT)
---------- -----------------
12002
4.2986E+11
12003
1.3510E+10
22002
3.9045E+11
32002
4.3043E+11
42002
4.1727E+11
52002
4.3197E+11
62002
4.1814E+11
72002
4.3161E+11
82002
4.3163E+11
92002
4.1298E+11
102002 4.2711E+11
112002 4.1221E+11
122002 4.2792E+11
13 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=448
Bytes=11648)
1 0 SORT (GROUP BY) (Cost=6 Card=448 Bytes=11648)
2 1 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=2
Card=448 Bytes=11648)
Oracle rewrote the query to use the materialized view
SALES_MV (see Execution Plan). However, let's see what happens if we issue a query
that calls for a higher level of aggregation.
SET TIMING ON
SET AUTOTRACE ON
SELECT time.qtr_yyyy, SUM(sales_amount)
FROM sales, time
WHERE sales.trans_date = time.day
GROUP BY time.qtr_yyyy
/
SET AUTOTRACE OFF
QTR_YYYY SUM(SALES_AMOUNT)
---------- -----------------
Q1 FY2002 1.2507E+12
Q1 FY2003 1.3510E+10
Q2 FY2002 1.2674E+12
Q3 FY2002 1.2762E+12
Q4 FY2002 1.2672E+12
Elapsed: 00:00:16.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2804 Card=5 Bytes=145)
1 0 SORT (GROUP BY) (Cost=2804 Card=5 Bytes=145)
2 1 NESTED LOOPS (Cost=193 Card=472288 Bytes=13696352)
3 2 TABLE ACCESS (FULL) OF 'SALES' (Cost=193
Card=472288 Bytes=6139744)
4 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_31704'
(UNIQUE)
We see that Oracle does not yet that it could have used the materialized view to answer this particular query, so it used the
original SALES table instead, and had to do a lot of work to get the answer. The same
bad thing would happen if we requested data aggregated by fiscal year.
Setup of the TIME_DIM dimension
So, let's use a DIMENSION to alert Oracle to the fact that the materialized view would be useful in answering this question.
CREATE DIMENSION time_dim
LEVEL DAY IS time.day
LEVEL MMYYYY IS time.mmyyyy
LEVEL QTR_YYYY IS time.qtr_yyyy
LEVEL YYYY IS time.yyyy
HIERARCHY TIME_ROLLUP
(
day CHILD OF
mmyyyy CHILD OF
qtr_yyyy CHILD OF
yyyy
)
ATTRIBUTE mmyyyy
DETERMINES mon_yyyy;
This tells Oracle that the DAY column of the TIME table implies MMYYYY, which in turn
implies QTR_YYYY. Finally, QTR_YYYY. Finally QTR_YYYY implird YYYY. Also stated is
the fact that MMYYYY and MON_YYYY are synonymous - there is a one-to-one mapping
between the two. So, any time Oracle sees MON_YYYY used, it understands it as if
MMYYYY was used. Now that Oracle has a greather understanding of the relationships
between the data we can see a great improvement in ou query response times:
SET TIMING ON
SET AUTOTRACE ON
SELECT time.qtr_yyyy, SUM(sales_amount)
FROM sales, time
WHERE sales.trans_date = time.day
GROUP BY time.qtr_yyyy
/
SET AUTOTRACE OFF
QTR_YYYY SUM(SALES_AMOUNT)
---------- -----------------
Q1 FY2002 1.2507E+12
Q1 FY2003 1.3510E+10
Q2 FY2002 1.2674E+12
Q3 FY2002 1.2762E+12
Q4 FY2002 1.2672E+12
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=20 Card=5 Bytes=195)
1 0 SORT (GROUP BY) (Cost=20 Card=5 Bytes=195)
2 1 HASH JOIN (Cost=8 Card=1585 Bytes=61815)
3 2 VIEW (Cost=5 Card=46 Bytes=598)
4 3 SORT (UNIQUE) (Cost=5 Card=46
Bytes=598)
5 4 INDEX (FAST FULL
SCAN) OF 'SYS_IOT_TOP_31704' (UNIQUE) ....)
6 2 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=2
Card=448 Bytes=11648)
A dimension defines a parent-child relationship between pairs of column sets, where
all the columns of a column set must come from the same table. However, columns in
one column set (or "level") can come from a different table than columns in another
set. The optimizer uses these relationships with materialized views to perform
query rewrite. The Summary Advisor uses these relationships to recommend
creation of specific materialized views.
Commonly used dimensions are customers, products, and time.
In Oracle9i, the dimensional information itself is stored in a dimension table. In
addition, the database object dimension helps to organize and group dimensional
information into hierarchies. This represents natural 1:n relationships between
columns or column groups (the levels of a hierarchy) that cannot be represented with
constraint conditions. Going up a level in the hierarchy is called rolling up
the data and going down a level in the hierarchy is called drilling down the
data.
Another example about dimensions can be found here.
|