Zurück

The Secrets of Dimensions


Overview

A dimension defines hierarchical (parent/child) relationships between pairs of columns or column sets. Each value at the child level is associated with one and only one value at the parent level. A hierarchical relationship is a functional dependency from one level of a hierarchy to the next level in the hierarchy. A dimension is a container of logical relationships between columns, and it does not have any data storage assigned to it.

Star Schemas

The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables and the points of the star are the dimension tables, as shown in the next figure.

Only one join establishes the relationship between the fact table and any one of the dimension tables.

A star schema optimizes performance by keeping queries simple and providing fast response time. All the information about each level is stored in one row.

Setup of Dimensions

The same parameters as for materialized views must be setup. Look here for more information to setup dimensions.

Example

Here is a simple example, we will setup a SALES table to store the transaction date, a customer ID and the total number of sales. This table will have about 420'000 rows in it. Another table, TIME, will store the mapping of transaction date to month, to quarter, to year. If we join the two together, we can optain aggregate sales by month, quarter, year and so on.

In a conventional database schema (one without materialized views, dimensions and other structures) these operations succeed, but they would be very slow. For every row in the sales data, we would have to perform an indexed read into the lookup table to cenvert either the transaction date or customer ID into some other values in order to group by this other value.

Setup of the SALES table

Now we'll construct the SALES table, and load it with some random test data, generated using the ALL_OBJECTS view.

CREATE TABLE sales (
   trans_date    DATE,
   cust_id       INT,
   sales_amount  NUMBER
);

INSERT /*+ APPEND */ INTO sales
SELECT TRUNC(SYSDATE,'YYYY')+MOD(ROWNUM,366) trans_date,
       MOD(ROWNUM,100) cust_id,
       ABS(DBMS_RANDOM.RANDOM)/100 sales_amount
  FROM all_objects
/
COMMIT;

29518 rows created.

The generation of TRANS_DATE is simply the first day of the year: TRUNC(SYSDATE,'YYYY') plus a number between 1 and 365: MOD(ROWNUM,366). The CUST_ID is a number between 0 and 99. The total number of sales is a large number generated with DBMS_RANDOM.

The ALL_OBJECTS view in Oracle 9.2.0 contains about 29000 rows, so after four inserts that consecutively double the size of the table, we'll have about 470000 records.

BEGIN
    FOR i IN 1 .. 4
    LOOP
        INSERT /*+ APPEND */ INTO sales
        SELECT trans_date, cust_id, 
          ABS(DBMS_RANDOM.RANDOM)/100 sales_amount
          FROM sales;
        COMMIT;
   END LOOP;
END;
/
SELECT COUNT(*) FROM sales;
 COUNT(*)
----------
    472288

Setup of the TIME table

Now, we need to setup the TIME index organization table, to roll up the date field by month, year, quarter and so on. You can create an index-organized table using the AS subquery, in this case you may not specify any column datatypes.

CREATE TABLE time (
   day  PRIMARY KEY,
   mmyyyy,
   mon_yyyy,
   qtr_yyyy,
   yyyy
)
ORGANIZATION INDEX
AS
SELECT DISTINCT
   trans_date DAY,
   CAST (TO_CHAR(trans_date,'MMYYYY') AS NUMBER) MMYYYY,
   TO_CHAR(trans_date,'MON-YYYY') MON_YYYY,
   'Q' || CEIL(TO_CHAR(trans_date,'MM')/3) || ' FY'
       || TO_CHAR(trans_date,'YYYY') QTR_YYYY,
   CAST(TO_CHAR(trans_date, 'YYYY') AS NUMBER) YYYY
  FROM sales
/

CAST converts one built-in datatype value into another built-in datatype value, in our example from CHAR to NUMBER. CEIL returns smallest integer greater than or equal to the given argument.

In this case, we generated:

  • MMYYYY - The month including the year
  • MON_YYYY - Same as above but we spelled out the month
  • QTR_YYYY - The quarter of the year, including the year
  • YYYY - The year itself

Now we have the tables SALES and the table TIME which maps the transaction date to month, quarter and year. In general, the computations required to create this table could be much more complex, but for our simple example it's good enough.

SELECT * FROM sales;

TRANS_DAT    CUST_ID SALES_AMOUNT
--------- ---------- ------------
02-JAN-02          1   7130179.82
03-JAN-02          2   7656584.03
04-JAN-02          3   11632950.1
05-JAN-02          4     19069888
06-JAN-02          5     74638.04
07-JAN-02          6   18585004.5
08-JAN-02          7   18604133.3
09-JAN-02          8   3155867.56
10-JAN-02          9   15125628.5
11-JAN-02         10   9703397.22
12-JAN-02         11   3604883.28
.........         ..   ..........

column QTR_YYYY format a10
SELECT * FROM time;

DAY           MMYYYY MON_YYYY QTR_YYYY         YYYY
--------- ---------- -------- ---------- ----------
01-JAN-02      12002 JAN-2002 Q1 FY2002        2002
02-JAN-02      12002 JAN-2002 Q1 FY2002        2002
03-JAN-02      12002 JAN-2002 Q1 FY2002        2002
04-JAN-02      12002 JAN-2002 Q1 FY2002        2002
05-JAN-02      12002 JAN-2002 Q1 FY2002        2002
06-JAN-02      12002 JAN-2002 Q1 FY2002        2002
07-JAN-02      12002 JAN-2002 Q1 FY2002        2002
08-JAN-02      12002 JAN-2002 Q1 FY2002        2002
09-JAN-02      12002 JAN-2002 Q1 FY2002        2002
10-JAN-02      12002 JAN-2002 Q1 FY2002        2002
11-JAN-02      12002 JAN-2002 Q1 FY2002        2002
.........      ..... ........ .. ......        ....
.........      ..... ........ .. ......        ....
19-DEC-02     122002 DEC-2002 Q4 FY2002        2002
20-DEC-02     122002 DEC-2002 Q4 FY2002        2002
21-DEC-02     122002 DEC-2002 Q4 FY2002        2002
22-DEC-02     122002 DEC-2002 Q4 FY2002        2002
23-DEC-02     122002 DEC-2002 Q4 FY2002        2002
24-DEC-02     122002 DEC-2002 Q4 FY2002        2002
25-DEC-02     122002 DEC-2002 Q4 FY2002        2002
26-DEC-02     122002 DEC-2002 Q4 FY2002        2002
27-DEC-02     122002 DEC-2002 Q4 FY2002        2002
28-DEC-02     122002 DEC-2002 Q4 FY2002        2002
29-DEC-02     122002 DEC-2002 Q4 FY2002        2002
30-DEC-02     122002 DEC-2002 Q4 FY2002        2002
31-DEC-02     122002 DEC-2002 Q4 FY2002        2002

Setup of the SALES_MV materialized view

Now ww will create the materialized view SALES_MV, which rolls the data up from individual days to months.

ANALYZE TABLE SALES COMPUTE STATISTICS;
ANALYZE TABLE TIME COMPUTE STATISTICS;

CREATE MATERIALIZED VIEW sales_mv
BUILD IMMEDIATE
REFRESH ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT sales.cust_id,
       SUM(sales.sales_amount) sales_amount,
       time.mmyyyy
  FROM sales, time
 WHERE sales.trans_date = time.day
 GROUP BY sales.cust_id, time.mmyyyy
/

ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY=TRUSTED;

SET AUTOTRACE ON

SELECT time.mmyyyy, SUM(sales_amount)
  FROM sales, time
 WHERE sales.trans_date = time.day
GROUP BY time.mmyyyy
/
SET AUTOTRACE OFF

    MMYYYY SUM(SALES_AMOUNT)
---------- -----------------
     12002        4.2986E+11
     12003        1.3510E+10
     22002        3.9045E+11
     32002        4.3043E+11
     42002        4.1727E+11
     52002        4.3197E+11
     62002        4.1814E+11
     72002        4.3161E+11
     82002        4.3163E+11
     92002        4.1298E+11
    102002        4.2711E+11
    112002        4.1221E+11
    122002        4.2792E+11

13 rows selected.



Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=448 Bytes=11648)
1  0   SORT (GROUP BY) (Cost=6 Card=448 Bytes=11648)
2  1     TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=2 Card=448 Bytes=11648)

Oracle rewrote the query to use the materialized view SALES_MV (see Execution Plan). However, let's see what happens if we issue a query that calls for a higher level of aggregation.

SET TIMING ON
SET AUTOTRACE ON
SELECT time.qtr_yyyy, SUM(sales_amount)
  FROM sales, time
 WHERE sales.trans_date = time.day
GROUP BY time.qtr_yyyy
/
SET AUTOTRACE OFF

QTR_YYYY   SUM(SALES_AMOUNT)
---------- -----------------
Q1 FY2002         1.2507E+12
Q1 FY2003         1.3510E+10
Q2 FY2002         1.2674E+12
Q3 FY2002         1.2762E+12
Q4 FY2002         1.2672E+12

Elapsed: 00:00:16.03

Execution Plan
----------------------------------------------------------
0  SELECT STATEMENT Optimizer=CHOOSE (Cost=2804 Card=5 Bytes=145)
1  0  SORT (GROUP BY) (Cost=2804 Card=5 Bytes=145)
2  1    NESTED LOOPS (Cost=193 Card=472288 Bytes=13696352)
3  2      TABLE ACCESS (FULL) OF 'SALES' (Cost=193 Card=472288 Bytes=6139744)
4  2      INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_31704' (UNIQUE)

We see that Oracle does not yet that it could have used the materialized view to answer this particular query, so it used the original SALES table instead, and had to do a lot of work to get the answer. The same bad thing would happen if we requested data aggregated by fiscal year.

Setup of the TIME_DIM dimension

So, let's use a DIMENSION to alert Oracle to the fact that the materialized view would be useful in answering this question.

CREATE DIMENSION time_dim
    LEVEL DAY      IS time.day
    LEVEL MMYYYY   IS time.mmyyyy
    LEVEL QTR_YYYY IS time.qtr_yyyy
    LEVEL YYYY     IS time.yyyy
HIERARCHY TIME_ROLLUP
(
 day CHILD OF
 mmyyyy CHILD OF
 qtr_yyyy CHILD OF
 yyyy
)
ATTRIBUTE mmyyyy
DETERMINES mon_yyyy;

This tells Oracle that the DAY column of the TIME table implies MMYYYY, which in turn implies QTR_YYYY. Finally, QTR_YYYY. Finally QTR_YYYY implird YYYY. Also stated is the fact that MMYYYY and MON_YYYY are synonymous - there is a one-to-one mapping between the two. So, any time Oracle sees MON_YYYY used, it understands it as if MMYYYY was used. Now that Oracle has a greather understanding of the relationships between the data we can see a great improvement in ou query response times:

SET TIMING ON
SET AUTOTRACE ON
SELECT time.qtr_yyyy, SUM(sales_amount)
 FROM sales, time
WHERE sales.trans_date = time.day
GROUP BY time.qtr_yyyy
/
SET AUTOTRACE OFF

QTR_YYYY   SUM(SALES_AMOUNT)
---------- -----------------
Q1 FY2002         1.2507E+12
Q1 FY2003         1.3510E+10
Q2 FY2002         1.2674E+12
Q3 FY2002         1.2762E+12
Q4 FY2002         1.2672E+12

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=20 Card=5 Bytes=195)
1  0  SORT (GROUP BY) (Cost=20 Card=5 Bytes=195)
2  1    HASH JOIN (Cost=8 Card=1585 Bytes=61815)
3  2      VIEW (Cost=5 Card=46 Bytes=598)
4  3        SORT (UNIQUE) (Cost=5 Card=46 Bytes=598)
5  4          INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_31704' (UNIQUE) ....)
6  2      TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=2 Card=448 Bytes=11648)

Conclusion

A dimension defines a parent-child relationship between pairs of column sets, where all the columns of a column set must come from the same table. However, columns in one column set (or "level") can come from a different table than columns in another set. The optimizer uses these relationships with materialized views to perform query rewrite. The Summary Advisor uses these relationships to recommend creation of specific materialized views.

Commonly used dimensions are customers, products, and time.

In Oracle9i, the dimensional information itself is stored in a dimension table. In addition, the database object dimension helps to organize and group dimensional information into hierarchies. This represents natural 1:n relationships between columns or column groups (the levels of a hierarchy) that cannot be represented with constraint conditions. Going up a level in the hierarchy is called rolling up the data and going down a level in the hierarchy is called drilling down the data.

More Information

Another example about dimensions can be found here.