Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Subset Dimension

Subset Dimension is a performance tuning technique used in Siebel Analytics Data Warehouse.

These subset dimension tables are extracted from dimension tables using a filter on a particular attribute of the parent dimension table. In other word, they contain all the columns of the parent dimension table.

Subset dimension tables are primarily used in conjunction with aggregate tables.

See also: Siebel Analytics 7.8.2 Data Model Reference (with Siebel support web access only)

Please note that the”subset dimension” is different from the “subdimension” used in Ralph Kimball’s books. It seems that this technique is not commonly seen in dimensional modeling materials. It is mainly a physical data warehouse performance tuning technique and may not be necessary considered in the modeling phase.

The data warehouse surrogate key for the subset dimension is always the same as that in the original dimension (the superset dimension). There is no need to have additional foreign key column in the fact table.

The subset dimension must have the same SCD type as the original dimension unless the SCD type is determined at the subset level. For example, you may make the EMPLOYEE dimension as a subset dimension of the PERSON dimension. If you make the PERSON dimension as a SCD type 2 table with all historical records tracked in the table, the EMPLOYEE dimension will also hold the historical records.

When should we use the subset dimension?

The subset dimension can be used when the dimension table has the following characteristics:

  1. The superset dimension is very large. You may have a problem of querying its data.
  2. You can identify a set of fact tables that reference only a subset of records in the dimension.
  3. The subset can be identified as by applying a filter using the dimension attribute.

What are the steps to create the subset dimension?

Creating a subset dimension requires the functional knowledge about the data in your dimension table. If you create the subset dimension without having all the required dimension members for describing your fact data, you have a a dangling FK from the fact table.

A subset dimension table can be created simply using the following SQL DDL statement:

CREATE TABKE subset_tbl
AS
SELECT *
FROM superset_tbl
WHERE superset_tbl.filter_col = 'the filter'
Advertisements