Aggregation for Data Warehouse, Part I
Posted by Dylan Wan on April 4, 2007
A data warehouse typically contains historical data that are viewed in a summary form and then let the users to drill down to the more detail transactions. The fact data can easily reach the size of gigabytes to terabytes. Summarizing the huge volume of the fact data online may not be practical.
One of the techniques employed by data warehouse designers to improve performance is the creation of summaries, or aggregated tables, which contains pre-calculated data.
For example, you may see the costs against your project with a inception to date amount, which include the historical cost charged against the project up to the current date. Without the pre-calculated data, a query to fetch the ITD amount may take minutes to fetch and summarize the fact data. Using an aggregated table, the amount is periodically refreshed with the amounts from the recent transactions in the latest summarization process. Fetching the data online from the aggregated table probably takes only seconds.
There are multiple ways to create such aggregations:
2. Hand code summarization processes and aggregated tables
3. Materialized Views
4. OLAP Cube
I will exam each of this option in my later posts.