Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Aggregation for Data Warehouse, Part 2

Posted by Dylan Wan on April 9, 2007

In the aggregation for Data Warehouse, Part 1, I mentioned that there are multiple ways to do the aggregation for data warehouse. One of the method is to use the ETL tool to maintain the aggregated tables.

Some ETL tools, such as Informatica, provides the Aggregator transformation. The aggregator transformation allows you to perform aggregate calculation such as sums and average. Informatica performs the aggregation as it reads. If the data is not sorted, it stores the grouped data in its aggregate cache for each group until it reads the entire source. If you have a huge number of distinct values in the group, it may create cache files. In general, this method is recommended only if the data is pre-sorted.

Another alternative is to use the aggregate function from the database. In Informatica, you can do a SQL override. The select statement can be overridden with the SQL aggregate function and group by clause. This approach can be implemented to avoid the memory cache problem with the Aggregate transformations as described above.

I will cover other options of aggregation for data warehouse in the subsequence posts.



One Response to “Aggregation for Data Warehouse, Part 2”

  1. […] Posts Aggregation for Data Warehouse, Part 2How to handle Slowly Changing Dimensions (SCDs) in data model design?Install Siebel Analytics – Step […]

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s