How to handle Slowly Changing Dimensions (SCDs) in data model design?
Posted by Dylan Wan on January 13, 2007
There are multiple methods to handle the slowly changing dimensions. Which technique to use depends on your business requirements. The choice among these three methods are not a technical design decision since their behaviors are different.
Type One: Overwite the old data with new data
Using this method, you do not store the histoy. For example, that say each customer can have one salesrep at any given point in time. When the salerep of ABC Inc., changes from Sandy to Laura, Sandy was a salerep of ABC will not be kept anywhere. Any report by salesrep will assume that Laura is the salereps of ABC Inc. forever and count all the sales done by Sandy as Lanura’s.
The above example may not sound making business sense. However, if you only report the sales of the current period, and salesrep does not change during the period, this method is ok to be used.
Mary OLTP tables does not need to track the history of changes and thus this method may be used by the source application. However, if you want to report the historical data, even your OLTP does not track history, the data warehouse can still use other methods to track the history.
Type Two: Add a new record at the timeof the change
Using this method, all priorhistory are saved. There are two alternative methods to model the key of this table.
Method A – No surrogate key – Use timestamp
When a change happens, a new record is added into the table. All the attributes are copied from the previous record except the changed values. The nature key is copied as well so the timestamps is used to differentiate the records.
When a fact table is joined with the dimension, if you are interested in the historical data, the timestamp will be used as part of the join condition. To ease the join, the record typically use two date columns – the effective start date and the effective end date.
Method B – No surrogate key – Use version number
Instead of using the date column, a version number is used to differentiate the different versions of the records.
This technique requires the fact table store both nature key and the version number to retrive a given version of the dimension date.
Method C – Use a surrogate key
When an attribue is change, a sequence generated key is used, the fact table will also use this key column as the foreign key.
Type Three: Track changes using a separate column
Using this method, you use a separate column of dimension table to store the values of previous years, in addition to the current year data.
This method does not track all the history, but just one prior version.
If the data is changed, the old value need to be moved from the current value column to the prior column and the new value overwrites the current column.
This method is used when the changes is not randon but a predefined interval such as annual.