Aggregation for Data Warehouse, Part 2
Posted by Dylan Wan on April 9, 2007
In the aggregation for Data Warehouse, Part 1, I mentioned that there are multiple ways to do the aggregation for data warehouse. One of the method is to use the ETL tool to maintain the aggregated tables.
Some ETL tools, such as Informatica, provides the Aggregator transformation. The aggregator transformation allows you to perform aggregate calculation such as sums and average. Informatica performs the aggregation as it reads. If the data is not sorted, it stores the grouped data in its aggregate cache for each group until it reads the entire source. If you have a huge number of distinct values in the group, it may create cache files. In general, this method is recommended only if the data is pre-sorted.
Another alternative is to use the aggregate function from the database. In Informatica, you can do a SQL override. The select statement can be overridden with the SQL aggregate function and group by clause. This approach can be implemented to avoid the memory cache problem with the Aggregate transformations as described above.
I will cover other options of aggregation for data warehouse in the subsequence posts.