Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Preserve Surrogate Key During Upgrade

Posted by Dylan Wan on September 23, 2017

The generated surrogate key is used everywhere in the data warehouse.  What do we do during upgrade?

Here are some approaches:

1. Full Refresh

You can perform a full refresh of the data warehouse.  The surrogate keys will be regenerated.  The FK will be updated.

Obviously, this is not a good approach.  There are problems with this approach:

  • Too big impact
  • The historical data may not be available from source
  • Lost the historical data in the data warehouse
    • Type2 records
    • Historical snapshots

2. Partial Refresh – Refresh the impacted fact tables 

If the change happens to a single table or a few of tables only, we can just identify the tables that requires refresh and refresh those tables only.

The requires you to keep a dependency map or be able to generate the dependency map.

The key from the table X will be regenerated.  Which tables have the FK to table X, let’s refresh those tables only.

3. Dimension Attribute Update – Preserve the key values

If the change is about adding columns to a dimension table, we should not need to regenerate the surrogate key.  We should just simply using the nature key from the change source, find the record and make an update to it.

In some cases, the records are merged in the source and multiple records become one single record.  If you do not really care of the duplicate data and the minimizing the FK update is the objective, we can just update the dimension attributes.

4. Dimension Lookup the Old Key

Insert can be much faster than update.  Update is actually sometime very expensive.

If you need to load the data from the source with a lot of new columns and update a lot of rows, it may be easier by populating a staging table and regenerate the dimension table by looking up the old key.

The old key lookup table can be based on a backup of the old table or a simple rename of the old table.

Since the newly populated dimension table carries  the old key values, fixing the FK value from the fact tables can be avoided or minimized.

5. Keep a Dimension Key Lookup table

This can be done as a separate table from the dimension table.

Typically, this approach is required if the grain of the dimension table changes.

If it involves the split of the original row into multiple rows, building a dimension key lookup table to map the old key with additional attributes to the new key may be necessary.

This lookup table will be used to update fact tables.

The FK columns from the fact tables will be updated with the new key values.

6. Regenerate facts from old fact tables

Update can be more expensive than recreating the data.  This is especially true for multiple columns.  If multiple FK columns are involved or if there are multiple dimension table involved, one idea is to backup the fact table (with historical data )and use the backup table as the source and join to the dimension table or the dimension key lookup table to regenerate the fact records.

If the original fact table does not carry the nature key from the source for dimension lookup, the process of creating the backup table will need to join the old dimension tables to get the original nature key and save them together into the fact table.

Conclusion

Using surrogate key can be expensive especially if we need to maintain them during upgrade.

Different approaches can be used to preserve the FK and preserve the historical data, you do not always need to perform a full refresh.

 

 

 

Leave a comment