Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Archive for October, 2017

Incremental ETL : Streaming via Micro-Batch

Posted by Dylan Wan on October 11, 2017

A modern analytic application takes the approach of streaming data to perform the similar process as the traditional data warehousing incremental ETL.

Actually, if we look into Spark Streaming in details, the concept of streaming in Spark and Incremental ETL are the same: Read the rest of this entry »

Advertisements

Posted in Data Warehouse | Tagged: | Leave a Comment »

Incremental ETL – The last refresh date

Posted by Dylan Wan on October 11, 2017

There are multiple ways to model the last refresh date.

In OBIA, DAC and Informatica based ETL, the last refresh date is maintained within DAC. It is maintained at the level of the source tables that populates the data.

Oracle BI DAC User Guide > About Refresh Dates and DAC’s Incremental Load Strategy

In OBIA ODI based ETL, the “Last Extract Date” is maintained in an ETL_LOAD_DATE table at the interface package level.

Oracle BI Applications ETL Guide > ETL Overview > About Changed Data Capture

There are other simpler ways: Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse | Leave a Comment »

Use Bit to represent groups

Posted by Dylan Wan on October 11, 2017

Here I am providing an alternate approach of supporting group membership in MySQL.

It is a common seen requirement that a group may have multiple members and a person may be added to multiple groups.  This many to many relationship is typically modeled in an intersection table.

When the group membership is being used as a filter, for example, to show all the students within a group, it becomes a subquery.

Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse, ETL | Leave a Comment »