Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Data Mashup in OBIEE 12c

Posted by Dylan Wan on October 14, 2015

Data Mashup is a new feature from OBIEE 12c.

It is one of the two main new features of OBIEE 12c.  The other is the Visual Analyzer.

When I tested the data mashup feature, I found that it supports these two scenarios.  Read the rest of this entry »

Posted in BI, Business Intelligence | Tagged: , | Leave a Comment »

Amazon Quick Sight – BI on Cloud?

Posted by Dylan Wan on October 8, 2015

In my post Data Warehouses on Cloud – Amazon Redshift, I mentioned that what would be really useful is providing BI on the cloud, not just a data warehouse on the cloud.

I felt that BICS makes more sense compared to Amazon Redshift.

I discussed this with a couple of people last night at a meetup.  Some of them are using Amazon Redshift.  Here is what I heard: Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Do we really need semantic layer from OBIEE?

Posted by Dylan Wan on October 5, 2015

Not all BI tools have a semantic layer.  For example, Oracle Discoverer does not seem to have a strong semantic layer.

This page summarizes what OBIEE semantic layer can do for you…

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Data Lake vs. Data Warehouse

Posted by Dylan Wan on October 4, 2015

These are different concepts.

Data Lake – Collects data from various sources in a central place.  The data are stored in their original form.  Big data technologies are used, so the typical data storage is Hadoop HDFS.

Data Warehouse – The “traditional” way of collecting data from various sources for reporting.  The data are consolidated and integrated.  A data warehouse design that follows the dimensional modeling technique may store data in a star schema with fact tables and dimension tables.  Typically a relational database is used.

If we look at the analytics platform at eBay from this LinkedIn SlideShare and this 2013 article: Read the rest of this entry »

Posted in Big Data, Data Warehouse, EDW | Tagged: , , | Leave a Comment »

Use DMZ to access BI from outside firewall

Posted by Dylan Wan on October 2, 2015

A DMZ is a network configuration that allows you to make selected services accessible from outside the firewall.

Some users may want to access some corporate reports from mobile devices or from their personal computers.

While VPN and Citrix may be useful for these cases, a DMZ can provide another option.

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Lessons Learned – Data Warehouse on Cloud – Amazon Redshift

Posted by Dylan Wan on September 22, 2015

I feel that these rules are applicable to any cloud-based data warehouse solution.  In general, I feel that on-premise data warehouse deployments will probably remain for a while.

1. For a columnar database, “select *” is bad

I think that the projection needs to be done as early as possible and should be pushed down.

If a column is not needed in the downstream flow, it should not be selected in the first place.

If the application logic is defined in the metadata, the tool should read it and generate the extract logic. Read the rest of this entry »
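The projection pushdown idea above can be sketched generically: keep only the columns the downstream flow needs at the extract step, instead of carrying every column forward. A minimal sketch in Python (the table and column names are hypothetical):

```python
# Hypothetical extract from a wide source table.
source_rows = [
    {"order_id": 1, "customer": "A", "amount": 120.0, "notes": "n1", "audit_blob": "b1"},
    {"order_id": 2, "customer": "B", "amount": 75.5, "notes": "n2", "audit_blob": "b2"},
]

NEEDED = ("order_id", "amount")  # columns the downstream flow actually uses

def extract(rows, columns):
    """Push the projection down: keep only the requested columns."""
    return [{c: r[c] for c in columns} for r in rows]

projected = extract(source_rows, NEEDED)
```

In a columnar database the same principle applies inside the engine: selecting two named columns touches only those column blocks, while `select *` forces every column to be read.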

Posted in Business Intelligence | Tagged: , , , , | Leave a Comment »

Data Warehouse on Cloud – Amazon Redshift

Posted by Dylan Wan on September 17, 2015

Here is a brief summary of what I learned by reading these materials.

1. The data warehouse is stored in clusters

It supports scale-out, not scale-up.

“Extend the existing data warehouse rather than adding hardware”

2. Use SQL to access the data warehouse

3. Load data from Amazon S3 (Storage Service) using MPP process

4. Partition / Distribute the data by time

“The BI team wanted to calculate some expensive analytics on a few years of data, so we just restored a snapshot and added a bunch of nodes for a few days”
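Point 4 above can be illustrated generically: when fact rows are partitioned by time, an old time slice can be restored or dropped as a unit without touching the rest. A toy sketch, independent of any Redshift-specific syntax:

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows keyed by an event date.
rows = [
    {"event_date": date(2013, 5, 1), "amount": 10},
    {"event_date": date(2014, 7, 9), "amount": 20},
    {"event_date": date(2014, 1, 2), "amount": 30},
]

def partition_by_year(rows):
    """Group rows into per-year partitions, mimicking time-based distribution."""
    parts = defaultdict(list)
    for r in rows:
        parts[r["event_date"].year].append(r)
    return dict(parts)

partitions = partition_by_year(rows)
```

In Redshift itself this maps to choosing distribution and sort keys on the time column, so that queries and snapshot restores only touch the relevant slices.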
Read the rest of this entry »

Posted in Business Intelligence | 1 Comment »

OTBI Enterprise

Posted by Dylan Wan on September 15, 2015

OTBI Enterprise is the BI cloud service, a SaaS deployment of OBIA.  It uses a data warehouse based architecture.  The ETL processes are handled within the cloud.  The data are first loaded from either on-premise or cloud sources, using various means, in their original formats into the SDS schema; then the regular ETL processes that were designed for on-premise deployment can be used to populate the data warehouse.  All these processes are supposed to be transparent to the deploying companies.

As of September 2015, OTBI-E for HCM is the only module that is available.  Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , , , , | Leave a Comment »

Is Apache Spark becoming a DBMS?

Posted by Dylan Wan on September 9, 2015

I attended a great meetup and this is the question I have after the meeting.

Perhaps the intent is to make it like a DBMS, like Oracle, or even a BI platform, like OBIEE?

The task flow is actually very similar to a typical database profiling and data analysis job.

1. Define your question

2. Understand and identify your data

3. Find the approach / model that can be used

Read the rest of this entry »

Posted in Big Data, Business Intelligence | Tagged: , | Leave a Comment »

Schema On Demand and Extensibility

Posted by Dylan Wan on September 3, 2015

Today I saw a quite impressive demo at the Global Big Data Conference.

AtScale provides a BI metadata tool for data stored in Hadoop.

At first, I thought that this was just another BI tool that accesses Hadoop via Hive, like what we have in OBIEE.  I have heard that the SQL performance for BI queries over Hive can be very slow.  The typical issue is that when a query involves joins, the SQL join may be translated to MapReduce code by Hive.  Doing the join this way may not be as efficient as in an RDBMS.

However, the concept here is actually very different.  Traditionally, ROLAP is built on a relational database and we use a relational join between the fact table and the dimension table.  With an Oracle-acquired tool like Endeca, we already saw the data modeling principle change.  Endeca does not model data in a star schema.  It simply denormalizes dimension data into the fact table, and can thus run queries fast.  AtScale seems to do exactly the same thing.  When the data is stored in the Hadoop cluster, it is not normalized by separating it into fact and dimension.  It is stored as it comes from the source, with the dimension attributes duplicated into the fact.  There is really no join here.  The closest design technique in OBIEE I can think of is the degenerate dimension approach.  However, will it work with Hadoop as a source?

What really impressed me is the concept of Schema on Demand.  I feel that this is actually the major challenge for ROLAP and relational database technology.  When we model potential additional attributes, we have to add placeholder columns to the relational table.  However, in data storage / database technologies that store attributes as key-value pairs or as a map, the data do not have to be stored as columns.  This is actually nothing new.  Oracle Database has had VARRAY support since Oracle 8.  However, no BI tool I am aware of supports this Oracle object type.  While Oracle Database has moved beyond supporting only relational tables, BI tools still make the assumption of supporting relational tables only.

It seems that AtScale solved this challenge by generating metadata that can perform the attribute-map-to-column transformation.  I guess we will start to see these big data technologies getting into the traditional BI tool space.  It is not due to the 3 Vs nature of big data; it is due to the flexibility.
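The attribute-map-to-column transformation can be sketched as follows (the record layout and names here are hypothetical illustrations, not AtScale's actual metadata format): attributes stored as a key-value map are projected into columns on demand, with no placeholder columns defined up front.

```python
# Records store flexible attributes as a map instead of fixed columns.
records = [
    {"id": 1, "attrs": {"color": "red", "size": "M"}},
    {"id": 2, "attrs": {"color": "blue"}},
]

def to_columns(records, attr_names):
    """Project map entries into columns on demand; missing keys become None."""
    return [
        {"id": r["id"], **{a: r["attrs"].get(a) for a in attr_names}}
        for r in records
    ]

table = to_columns(records, ["color", "size"])
```

Adding a new attribute requires no schema change on the stored data; only the metadata that drives the projection has to know about it.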

Posted in Business Intelligence | Leave a Comment »

Using Data Relationship Management to Maintain Hierarchies for BI Apps (1)

Posted by Dylan Wan on September 2, 2015

DRM is a generic data management application.

It provides a web-based application that allows the deploying company to maintain the data.

It is a collaboration tool that allows you to define validations, set up data security duties, and share the maintenance work.

Earlier, the tool was designed to maintain account information.  However, it can actually be used to maintain extensions to any dimension.

The key is that it enables the deploying company to capture and maintain information that resides outside the transaction system, for BI reporting purposes.


Posted in Enrichment | Tagged: , | Leave a Comment »

ODI Agent

Posted by Dylan Wan on July 16, 2015

An ODI agent is a process that runs data integration jobs.

The data integration jobs are ODI scenarios.  Each scenario can be regarded as an executable program.

However, we do not run those ODI scenarios directly.  These jobs are executed by the ODI agent.

An ODI agent is a Java program.  It connects to an ODI repository to get the scenarios to be executed.  It also uses the repository for writing logs.  You need to provide the information about which ODI repository it connects to in order to start the ODI agent process.  On the machine running the ODI agent, you do not need the full ODI installation; only the Java library and the JAR file that includes the ODI agent are needed.  The ODI Studio software is not needed.

Since it is a Java program, you need a Java runtime installed.  Since it connects to the ODI repository database, JDBC is required.  Since the technologies used by scenarios vary, the machine the ODI agent runs on needs to have those technologies available.

An ODI agent is a server process.  It is allocated a port number, and after it starts, it listens for requests.  This means it can accept remote requests; in other words, submission of a job may come from a remote machine.

Typically you will submit the job from ODI Studio.  You can also submit the job via the ODI SDK.

See: Executing an ODI Object with ODI Agent

See: How to run an ODI 11g scenario from Java

During design time, you do not need to run a standalone ODI agent.  An ODI agent is embedded within ODI Studio, and jobs submitted within ODI Studio can be executed by this local agent.

Posted in Business Intelligence | Leave a Comment »

Business Rules in Enterprise Apps and Data Mining Applications

Posted by Dylan Wan on May 26, 2015

A transaction system has several places that capture decisions and several places that capture business rules.

These are the potential places where the data mining scoring engine can be deployed.

Here are typical “decisions” captured in the enterprise apps I can think of: Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Opportunities for Cloud based BI

Posted by Dylan Wan on May 21, 2015

Three opportunities for the future Cloud based BI: Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

RFM Analysis in Oracle BI Apps

Posted by Dylan Wan on April 24, 2015

I wrote the article RFM Analysis earlier.  We recently posted a more detailed description of how Oracle BI Apps implements this concept in the product.

Customer RFM Analysis

RFM-related customer attributes are good examples of aggregated performance metrics, as described in this design tip from the Kimball Group.

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Conformed Dimension and Data Mining

Posted by Dylan Wan on April 20, 2015

I believe that conformed dimensions play a key role in data mining.  Here is why: Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , , | Leave a Comment »

Market Segmentation and Data Mining

Posted by Dylan Wan on April 3, 2015

1. Market Segmentation in the academic world 

Market segmentation is part of the marketing process.  It is described in Philip Kotler’s book as part of the step of defining the market strategy.  The idea is to segment the consumer market by some variables and to divide the market into different segments.  Selecting the segments for your products is the result of the marketing strategy.  If you have multiple product lines, this definition process can be done for each product line, if not at the individual product level.

For example, my product X is targeted at consumers who are women, without kids, living in a city, and having more than $90,000 in income.
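A segment definition like the one above is essentially a predicate over consumer attributes. A minimal sketch (the attribute names and data are made up for illustration):

```python
def in_target_segment(consumer):
    """Hypothetical segment: women, no kids, city dwellers, income over $90,000."""
    return (
        consumer["gender"] == "F"
        and consumer["kids"] == 0
        and consumer["location"] == "city"
        and consumer["income"] > 90_000
    )

consumers = [
    {"gender": "F", "kids": 0, "location": "city", "income": 95_000},
    {"gender": "F", "kids": 2, "location": "city", "income": 120_000},
]

segment = [c for c in consumers if in_target_segment(c)]
```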

This web page includes a very good and concise summary – Market Segmentation and Selection of Target Segments.  It reminded me of what I learned in my Marketing course 20 years ago.

2. Market Segmentation as a product feature

Once the variables and the segments are defined as part of the marketing strategy, Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , | Leave a Comment »

How to integrate Oracle Data Mining and Oracle BI

Posted by Dylan Wan on April 2, 2015

Here are the various ways that we can use Data Mining inside BI.

We can build Advanced Analytics applications.

The scoring function can be called within an opaque view or with the EVALUATE function.

The opaque view method may provide better flexibility since multiple columns can be exposed.

Here is an old Oracle white paper about how to use EVALUATE in BI Server: white paper

Posted in Business Intelligence | Tagged: , , , , | Leave a Comment »

Data Mining Scoring Engine Development Process

Posted by Dylan Wan on April 1, 2015

Here is how I view data mining:

The target is to build a scoring engine.

It accepts an input and produces the output.

The development process can be separated into Requirement, Design, Coding, and Deployment phases, similar to typical software development.
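The input/output view above can be sketched as a plain function produced by a training step. This is a toy model for illustration only, not any specific algorithm:

```python
def train(examples):
    """Toy training step: learn a threshold from labeled (value, label) pairs.
    Returns a scoring function -- the 'engine' that maps input to output."""
    positives = [v for v, label in examples if label == 1]
    threshold = min(positives)  # simplest possible rule

    def score(value):
        # The deployed engine: input in, prediction out.
        return 1 if value >= threshold else 0

    return score

engine = train([(10, 0), (20, 1), (30, 1)])
```

Whatever the algorithm, the deliverable has this shape: the training phase produces an artifact, and the scoring engine applies it to new inputs.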

Posted in Business Intelligence | Tagged: , | 1 Comment »

Why Use Data Mining with Data Warehouse?

Posted by Dylan Wan on April 1, 2015

1. Use the data warehouse data as the training set

Data mining requires training data to train the learning algorithm.  The data warehousing processes provide the following services:

  • Consolidate the data from different sources
  • Aggregate the data: for example, we have the order return transactions, but the training data can be the number of returns by customer and by product.
  • Capture historical data – this can be accomplished using Type 2 dimensions or periodic snapshots.  For example, if you are going to do time series analysis, the source system may not keep the history.
  • Cleanse the data: the quality of the data impacts the quality of the scoring engine, e.g. handling missing data by setting appropriate default values.
  • Normalize the values using domain lookups or transformation logic.  For example, transform numeric data into categories.
  • Transform the data structure to fit the structure required by data mining models
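Two of the steps above, aggregating transactions and transforming numeric data into categories, can be sketched as follows (the column names and the income bands are illustrative assumptions):

```python
from collections import Counter

# Order return transactions from the warehouse (illustrative rows).
returns = [
    {"customer": "A", "product": "X"},
    {"customer": "A", "product": "X"},
    {"customer": "B", "product": "Y"},
]

# Aggregate: number of returns by customer and product.
return_counts = Counter((r["customer"], r["product"]) for r in returns)

def income_band(income):
    """Normalize a numeric value into a category for the mining model."""
    if income < 50_000:
        return "LOW"
    if income < 100_000:
        return "MEDIUM"
    return "HIGH"

# One prepared training row combining both transformations.
training_row = {"returns": return_counts[("A", "X")], "income": income_band(95_000)}
```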

2. Provide the scoring service as the additional services provided by BI applications

The scoring engine can be deployed as a service.  The service can be provided from the BI platform and embedded in other apps.

For example, a data warehouse may use historical orders to do market basket analysis.  The results of the scoring engine need to be deployed in the ecommerce apps, not as BI reports or dashboards.

3. Showing the scoring or the prediction together with the rest of contents

For example, the customer profitability score can be shown wherever the customer data is shown.  The predictive profitability score can help adjust the customer interactions at all layers of the activities.

This can be done at different layers:

a. Run-time scoring:  no ETL process involved; call the scoring API from BI

This depends on the BI platform you are using.  If you are using Oracle BIEE and Oracle Data Mining Option, the opaque view can be used.

b. Scoring as part of the regular ETL process, or as a batch process:  we can come up with persistent storage for holding the results of the scoring.  The data will be reflected when the warehouse is refreshed.
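Option (b) can be sketched as a batch step that persists scores alongside the warehouse data, so reads return the stored score until the next refresh. The profitability formula and field names here are made up for illustration:

```python
def batch_score(customers, score_fn):
    """Batch step: compute and persist a score for every customer row."""
    return {c["id"]: score_fn(c) for c in customers}

customers = [
    {"id": 1, "revenue": 1000, "cost": 400},
    {"id": 2, "revenue": 300, "cost": 350},
]

# Hypothetical profitability score, refreshed with each ETL run.
score_store = batch_score(customers, lambda c: c["revenue"] - c["cost"])

def get_score(customer_id):
    """Reads hit the persisted store; no model call at query time."""
    return score_store[customer_id]
```

This is the trade-off against run-time scoring: queries stay cheap and predictable, but the scores are only as fresh as the last batch run.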

Posted in Business Intelligence | Tagged: , , | Leave a Comment »