Schema On Read?

Posted by Dylan Wan on September 24, 2017

I first saw “create external table” in Oracle DBMS 11G.

It was designed for loading data.

When Hive was introduced, a lot of data already existed in HDFS.

Hive was introduced to provide a SQL interface on top of this data.

Adopting the external table concept was a natural consequence of that design. It is not really a creative idea.
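To make the idea concrete, here is a minimal Python sketch of schema on read. It assumes comma-delimited text files. The data stays as raw text, and the "table definition" (the hypothetical `ORDERS_SCHEMA`) is only metadata that gets applied while the data is being read. This is roughly what a Hive external table over existing HDFS files does. It is an illustration of the concept, not how Hive is implemented.

```python
import csv
import io

# Raw data as it might already sit in HDFS: delimited text with no schema stored alongside it.
RAW_ORDERS = "1001,alice,250.00\n1002,bob,99.50\n"

# The "external table" is only metadata: column names and types, applied at read time.
ORDERS_SCHEMA = [("order_id", int), ("customer", str), ("amount", float)]

def read_with_schema(raw_text, schema):
    """Yield typed rows by applying the schema while the raw text is being read."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}

# A "query" against the external table: total order amount.
total = sum(row["amount"] for row in read_with_schema(RAW_ORDERS, ORDERS_SCHEMA))
print(total)  # 349.5
```

The schema lives outside the data. As a result, the same files can be read again with a different schema without reloading anything.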

Conformed Dimension and Data Mining

Posted by Dylan Wan on April 20, 2015

Market Segmentation and Data Mining

Posted by Dylan Wan on April 3, 2015

1. Market Segmentation in the academic world 

Market Segmentation is part of the marketing process. Philip Kotler’s book describes it as part of the step of defining the market strategy. The idea is to divide the consumer market into distinct segments using a set of variables. Selecting the target segments for your products is the outcome of the marketing strategy. If you have multiple product lines, you can do this for each product line, or even for each individual product.

For example, my product X targets consumers who are women without kids, who live in a city, and who have an income above $90,000.
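One way to picture a target segment is as a predicate over consumer attributes. The Python sketch below encodes the product X segment from the example. The `Consumer` fields and their encoding (for instance, gender as a string) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    gender: str
    has_kids: bool
    lives_in_city: bool
    income: float

def in_target_segment(c: Consumer) -> bool:
    """Target segment for product X: women, no kids, city dwellers, income above $90,000."""
    return (
        c.gender == "female"
        and not c.has_kids
        and c.lives_in_city
        and c.income > 90_000
    )

consumers = [
    Consumer("female", False, True, 120_000),
    Consumer("female", True, True, 150_000),
    Consumer("male", False, True, 95_000),
]
targets = [c for c in consumers if in_target_segment(c)]
print(len(targets))  # 1
```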

This web page has a very good, concise summary: Market Segmentation and Selection of Target Segments. It reminded me of what I learned in my marketing course 20 years ago.

2. Market Segmentation as a product feature

How to integrate Oracle Data Mining and Oracle BI

Posted by Dylan Wan on April 2, 2015

Here are the various ways that we can use Data Mining inside BI.

We can build Advanced Analytics applications.

The scoring function can be called within an opaque view or with the EVALUATE function.

The opaque view method may offer more flexibility, because it can expose multiple columns.
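The flexibility difference can be modeled in plain Python. The sketch below is not OBIEE or Oracle Data Mining code. The `score` model, its rule, and all other names are hypothetical. In EVALUATE style, each BI column is a separate scalar call. In opaque-view style, a single query returns rows that carry several scoring columns together, such as a prediction and its probability.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    cust_id: int
    annual_spend: float

def score(c: Customer) -> tuple[str, float]:
    """Hypothetical mining model: returns (predicted class, probability)."""
    probability = min(1.0, c.annual_spend / 1000.0)
    label = "high_value" if probability >= 0.5 else "low_value"
    return label, probability

# EVALUATE-style: each BI column is a separate scalar function call.
def eval_prediction(c: Customer) -> str:
    return score(c)[0]

def eval_probability(c: Customer) -> float:
    return score(c)[1]

# Opaque-view-style: one query returns rows exposing several scoring columns at once.
def scoring_view(customers: list[Customer]) -> list[dict]:
    rows = []
    for c in customers:
        label, probability = score(c)
        rows.append({"cust_id": c.cust_id, "prediction": label, "probability": probability})
    return rows

print(scoring_view([Customer(1, 800.0), Customer(2, 200.0)]))
```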

Here is an old Oracle white paper about how to use EVALUATE in BI Server: white paper

Data Mining Scoring Engine Development Process

Posted by Dylan Wan on April 1, 2015

Here is how I view data mining:

The target is to build a scoring engine.

It accepts an input and produces an output, such as a score or a prediction.

The development process can be separated into Requirement, Design, Coding, and Deployment phases, similar to typical software development.
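Here is a minimal sketch of a scoring engine viewed this way: a trained model packaged as a function that maps input features to an output score. Logistic regression is used here only as an example model. The feature names and coefficients are hypothetical stand-ins for whatever the coding (training) phase would produce.

```python
import math
from dataclasses import dataclass

@dataclass
class ScoringEngine:
    """A trained model packaged as a function: input features in, score out."""
    weights: dict[str, float]
    bias: float

    def score(self, features: dict[str, float]) -> float:
        # Linear combination of the inputs; features missing from the input count as 0.
        z = self.bias + sum(w * features.get(name, 0.0) for name, w in self.weights.items())
        # The logistic function maps the result to a probability between 0 and 1.
        return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients standing in for the output of the training phase.
engine = ScoringEngine(weights={"num_orders": 0.3, "num_returns": -0.8}, bias=-1.0)
print(round(engine.score({"num_orders": 10, "num_returns": 1}), 3))  # 0.769
```

Because the engine is just input in, output out, deploying it amounts to exposing `score` wherever it is needed.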

Why Use Data Mining with Data Warehouse?

Posted by Dylan Wan on April 1, 2015

1. Use the data warehouse data as the training set

Data mining requires training data to train the learning algorithm. Data warehousing processes provide the following services:

  • Consolidate the data from different sources
  • Aggregate the data: for example, the source has order return transactions, but the training data can be the number of returns by customer and by product.
  • Capture historical data: the source data may not keep history, which you need if, for example, you are going to do time series analysis. This can be accomplished using Type 2 dimensions or periodic snapshots.
  • Data Cleansing: the quality of the data affects the quality of the scoring engine. For example, handle missing data by setting appropriate default values.
  • Normalize the values using domain lookups or transformation logic. For example, transform numeric data into categories.
  • Transform the data structure to fit the structure required by data mining models
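Several of these services can be sketched on a tiny dataset: cleansing, aggregation, normalization, and restructuring. The transaction records, the default quantity, and the band thresholds below are all hypothetical. In a real warehouse these steps would run in the ETL layer, not in Python.

```python
from collections import defaultdict

# Hypothetical order-return transactions from a source system; one quantity is missing.
return_txns = [
    {"customer": "C1", "product": "P1", "qty": 2},
    {"customer": "C1", "product": "P1", "qty": None},
    {"customer": "C2", "product": "P2", "qty": 1},
]
DEFAULT_QTY = 1  # assumed default used when the source leaves quantity empty

# Cleansing + aggregation: count returns and sum quantities by customer and product.
num_returns = defaultdict(int)
returned_qty = defaultdict(int)
for t in return_txns:
    key = (t["customer"], t["product"])
    qty = t["qty"] if t["qty"] is not None else DEFAULT_QTY
    num_returns[key] += 1
    returned_qty[key] += qty

def return_band(n: int) -> str:
    """Normalization: map a numeric count to a category."""
    if n == 0:
        return "none"
    return "low" if n == 1 else "high"

# Restructure: one flat row per (customer, product), the layout a mining model expects.
training_rows = [
    {
        "customer": c,
        "product": p,
        "num_returns": num_returns[(c, p)],
        "returned_qty": returned_qty[(c, p)],
        "return_band": return_band(num_returns[(c, p)]),
    }
    for (c, p) in sorted(num_returns)
]
print(training_rows)
```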

2. Provide scoring as an additional service of BI applications

The scoring engine can be deployed as a service. The BI platform can provide this service, and it can be embedded in other applications.

For example, a data warehouse may use historical orders for market basket analysis. The results of the scoring engine need to be deployed in the e-commerce apps, not in BI reports or dashboards.
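The sketch below follows the market basket example. It uses pairwise co-occurrence counts and rule confidence as a simple stand-in for market basket analysis, not a full Apriori implementation. The order data and the `recommend` function are hypothetical. The counts are computed in batch from warehouse history. `recommend` plays the role of the scoring service that an e-commerce app could call with a shopper's current cart.

```python
from collections import Counter
from itertools import combinations

# Hypothetical historical orders from the data warehouse (each order is a set of products).
orders = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
]

# Batch "training": count how often each product, and each product pair, appears.
item_counts = Counter(item for order in orders for item in order)
pair_counts = Counter(pair for order in orders for pair in combinations(sorted(order), 2))

def confidence(a: str, b: str) -> float:
    """Confidence of the rule a -> b: share of orders containing a that also contain b."""
    pair = tuple(sorted((a, b)))
    return pair_counts[pair] / item_counts[a] if item_counts[a] else 0.0

def recommend(cart: set[str], min_confidence: float = 0.5) -> list[str]:
    """Scoring service: products to suggest for the current cart, best first."""
    if not cart:
        return []
    scores = {
        item: max(confidence(a, item) for a in cart)
        for item in item_counts
        if item not in cart
    }
    picks = [item for item, s in scores.items() if s >= min_confidence]
    return sorted(picks, key=lambda item: (-scores[item], item))

print(recommend({"bread"}))  # ['butter']
```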

3. Show the score or prediction together with the rest of the content

For example, the customer profitability score can be shown wherever customer data is shown. A predicted profitability score can help tailor customer interactions across all activities.

This can be done at different layers:

a. Run-time scoring: no ETL process is involved; BI calls the scoring API directly.

This depends on the BI platform you are using. If you are using Oracle BIEE with the Oracle Data Mining Option, you can use an opaque view.

b. Scoring as part of the regular ETL process or as a batch process: we can create persistent storage to hold the scoring results. The stored scores are updated when the data is refreshed.
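The two layers behave differently when source data changes between refreshes. The Python sketch below shows that difference under simple assumptions. The `score` function is a hypothetical placeholder for a real model or scoring API. A dictionary stands in for the persistent score table.

```python
from datetime import date

def score(customer: dict) -> float:
    """Hypothetical placeholder for a real profitability model or scoring API call."""
    return round(customer["revenue"] - customer["cost"], 2)

customers = {
    "C1": {"revenue": 500.0, "cost": 320.0},
    "C2": {"revenue": 150.0, "cost": 180.0},
}

# a. Run-time scoring: no ETL step; the score is computed when the BI query runs.
def runtime_score(cust_id: str) -> float:
    return score(customers[cust_id])

# b. Batch scoring: an ETL step writes scores to persistent storage (a dict here,
#    standing in for a table); BI reads the stored value, which changes only on refresh.
score_table: dict[str, dict] = {}

def refresh_scores(as_of: date) -> None:
    for cust_id, attrs in customers.items():
        score_table[cust_id] = {"score": score(attrs), "as_of": as_of}

refresh_scores(date(2015, 4, 1))
customers["C2"]["revenue"] = 250.0  # source data changes after the batch run
print(runtime_score("C2"))          # 70.0: reflects the change immediately
print(score_table["C2"]["score"])   # -30.0: unchanged until the next refresh
```

Run-time scoring always reflects the latest data, at the cost of calling the model on every query. Batch scoring is cheaper to read but is only as fresh as the last refresh.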

