Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Why Use Data Mining with Data Warehouse?

Posted by Dylan Wan on April 1, 2015

1. Use the data warehouse data as the training set

Data Mining requires the training data to train the learning algorithm.  The data warehoucing processes provide the following services:

  • Consolidate the data from different sources
  • Aggregate the data: for example, we have the order return transactions but the training data can be # of returns by customers and by products.
  • Capture the historical data – This can be accomplished using the TYPE2 dimension or periodic snapshots. for example, if you are going to do time series analysis, the source data may not keep the history.
  • Data Cleansing:  The quality of the data impacts the quality of the scoring engine. Handling the missing data by setting different default value.
  • Normalize the values, using domain lookup, or transformation logic.  For example, transform the numeric data to categories.
  • Transform the data structure to fit the structure required by data mining models

2. Provide the scoring service as the additional services provided by BI applications

The scoring engine can be deployed as a service.  The service can be provided from the BI and can be embedded in other apps.

For example, a data warehouse may use the historical orders to do the market basket analysis.  The results of the scoring engine needs to deployed in the ecommerce apps, not as BI reports or dashboard.

3. Showing the scoring or the prediction together with the rest of contents

For example, the customer profitability score can be shown wherever the customer data is shown.  The predictive profitability score can help adjust the customer interactions at all layers of the activities.

This can be done at different layer:

a. Run-time scoring:  No ETL process involved, call the scoring API from BI

This depends on the BI platform you are using.  If you are using Oracle BIEE and Oracle Data Mining Option, the opaque view can be used.

b. Scoring as part of the regular ETL process or as a batch process:  we can come up the persistent storage for holding the results of the scoring.  The data will be reflected when the data is refreshed.


Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s