Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

ODI Agent

Posted by Dylan Wan on July 16, 2015

An ODI agent is a process that runs data integration jobs.

In ODI, the data integration jobs are scenarios. Each scenario can be regarded as an executable program.

However, we do not run those ODI scenarios directly.  These jobs are executed by the ODI agent.

An ODI agent is a Java program. It connects to an ODI repository to fetch the scenarios to be executed, and it also uses the repository for writing the execution logs. To start the ODI agent process, you need to provide the information about which ODI repository it connects to. The machine running the ODI agent does not need a full ODI installation; only the Java libraries (JAR files) that contain the ODI agent are needed. The ODI Studio software is not needed.

Since it is a Java program, a Java runtime must be installed. Since it connects to the ODI repository database, JDBC drivers are required. And since the technologies used by scenarios vary, the machine running the ODI agent needs to have those technologies available.

An ODI agent is a server process. It is assigned a port number, and after it starts, it listens for requests. This means it can accept remote requests; in other words, job submissions may come from a remote machine.

Typically you will submit the job from ODI Studio. You can also submit the job via the ODI SDK.

See: Executing an ODI Object with ODI Agent

See: How to run an ODI 11g scenario from Java

During design time, you do not need to run a standalone ODI agent. An ODI agent is embedded within ODI Studio, and jobs submitted within ODI Studio can be executed by this local agent.
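Conceptually, the agent's job described above can be sketched in a few lines. This is purely an illustrative sketch in Python (the real agent is a Java process, and every class and method name here is made up): fetch a scenario from the repository, run its tasks, and write the log back to the repository.

```python
# Illustrative sketch only: not the real ODI implementation.
# The agent never runs scenarios "directly"; it pulls them from the
# repository, executes them, and writes logs back to the repository.

class Repository:
    """Stands in for the ODI repository: holds scenarios and logs."""
    def __init__(self, scenarios):
        self.scenarios = scenarios      # scenario name -> list of task callables
        self.logs = []

class Agent:
    """Stands in for the ODI agent process."""
    def __init__(self, repository):
        self.repository = repository    # the agent must know which repository to use

    def execute(self, scenario_name):
        tasks = self.repository.scenarios[scenario_name]
        for task in tasks:              # run each step of the scenario
            task()
        self.repository.logs.append((scenario_name, "SUCCESS"))
        return "SUCCESS"

results = []
repo = Repository({"LOAD_ORDERS": [lambda: results.append("extracted"),
                                   lambda: results.append("loaded")]})
agent = Agent(repo)
status = agent.execute("LOAD_ORDERS")
```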


Posted in Business Intelligence | Leave a Comment »

Business Rules in Enterprise Apps and Data Mining Applications

Posted by Dylan Wan on May 26, 2015

A transaction system has several places where decisions are captured and several places where business rules are captured.

These are the potential places where the data mining scoring engine can be deployed.

Here are typical “decisions” captured in the enterprise apps I can think of: Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Opportunities for Cloud based BI

Posted by Dylan Wan on May 21, 2015

Three opportunities for the future Cloud based BI: Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

RFM Analysis in Oracle BI Apps

Posted by Dylan Wan on April 24, 2015

I wrote the article RFM Analysis earlier.  We recently posted a more detailed description about how Oracle BI Apps implements this concept in the product.

Customer RFM Analysis

RFM-related customer attributes are good examples of aggregated performance metrics, as described in this design tip from the Kimball Group.
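As a rough sketch of the RFM idea (not the actual Oracle BI Apps implementation), each customer can be scored on Recency, Frequency, and Monetary value by ranking customers into bands. The data, the number of bands, and the tie-breaking here are all simplified for illustration:

```python
from datetime import date

# Hedged sketch of RFM scoring: rank each customer into 1..bins on each of
# Recency, Frequency, and Monetary value (higher = better). Ties are broken
# arbitrarily; a real implementation would use proper quantiles.

orders = [  # (customer, order_date, amount) -- sample data
    ("A", date(2015, 4, 1), 500.0),
    ("A", date(2015, 3, 15), 200.0),
    ("B", date(2014, 11, 2), 80.0),
    ("C", date(2015, 4, 10), 1500.0),
]
as_of = date(2015, 4, 24)

def rfm(orders, as_of, bins=3):
    stats = {}
    for cust, d, amt in orders:
        rec, freq, mon = stats.get(cust, ((as_of - d).days, 0, 0.0))
        stats[cust] = (min(rec, (as_of - d).days), freq + 1, mon + amt)

    def score(values, reverse=False):
        ranked = sorted(values, key=lambda kv: kv[1], reverse=reverse)
        return {c: 1 + (i * bins) // len(ranked) for i, (c, _) in enumerate(ranked)}

    r = score([(c, s[0]) for c, s in stats.items()], reverse=True)  # small recency = good
    f = score([(c, s[1]) for c, s in stats.items()])
    m = score([(c, s[2]) for c, s in stats.items()])
    return {c: (r[c], f[c], m[c]) for c in stats}

scores = rfm(orders, as_of)
```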

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Conformed Dimension and Data Mining

Posted by Dylan Wan on April 20, 2015

I believe that Conformed Dimensions play a key role in data mining.  Here is why: Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , , | Leave a Comment »

Market Segmentation and Data Mining

Posted by Dylan Wan on April 3, 2015

1. Market Segmentation in the academic world 

Market Segmentation is part of the marketing process. It is described in Philip Kotler’s book as part of the step of defining the marketing strategy. The idea is to segment the consumer market by some variables and to divide the market into different segments. Selecting the segments to target with your products is a result of the marketing strategy. If you have multiple product lines, this definition process can be done for each product line, if not at the individual product level.

For example, my product X is targeted to sell to consumers who are women, without kids, living in a city, and having more than $90,000 in income.

This web page includes a very good and concise summary – Market Segmentation and Selection of Target Segments.  It reminded me of what I learned in my marketing course 20 years ago.
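The example above can be sketched as a simple segmentation function: each combination of variable values defines one segment, and one segment is selected as the target for product X. The variables and cutoffs are the hypothetical ones from the example:

```python
# Sketch of segmentation and target selection; consumers and cutoffs
# are invented sample data for illustration.

consumers = [
    {"name": "p1", "gender": "F", "kids": False, "urban": True,  "income": 120000},
    {"name": "p2", "gender": "F", "kids": True,  "urban": True,  "income": 95000},
    {"name": "p3", "gender": "M", "kids": False, "urban": True,  "income": 100000},
    {"name": "p4", "gender": "F", "kids": False, "urban": False, "income": 150000},
]

def segment(c):
    # Each combination of segmentation variable values is one segment.
    return (c["gender"], c["kids"], c["urban"], c["income"] > 90000)

# Target segment for product X: women, no kids, city, income > $90,000.
target = ("F", False, True, True)
prospects = [c["name"] for c in consumers if segment(c) == target]
```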

2. Market Segmentation as a product feature

Once the variables and the segments are defined as part of the marketing strategy, Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , | Leave a Comment »

How to integrate Oracle Data Mining and Oracle BI

Posted by Dylan Wan on April 2, 2015

Here are the various ways that we can use Data Mining inside BI.

We can build Advanced Analytics applications.

The scoring function can be called within an opaque view or with the EVALUATE function.

The opaque view method may provide better flexibility, since multiple columns can be exposed.

Here is an old Oracle white paper about how to use EVALUATE in BI Server: white paper

Posted in Business Intelligence | Tagged: , , , , | Leave a Comment »

Data Mining Scoring Engine Development Process

Posted by Dylan Wan on April 1, 2015

Here is how I view data mining:

The target is to build a scoring engine.

It accepts an input and produces the output.

The development process can be separated into Requirements, Design, Coding, and Deployment, similar to typical software development phases.
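This view can be sketched as follows. The threshold "model" below is deliberately trivial and purely illustrative, but it shows the shape: training produces a scoring engine, which is just a function from an input to an output.

```python
# Sketch of the "scoring engine" view of data mining: train() learns from
# labeled data and returns a scoring function. The min-cutoff "model" here
# is a made-up stand-in for a real learning algorithm.

def train(training_set):
    """Learn a cutoff from (value, label) pairs."""
    positives = [x for x, label in training_set if label]
    cutoff = min(positives)                          # simplest possible "model"
    return lambda x: 1.0 if x >= cutoff else 0.0     # the scoring engine

history = [(10, False), (20, False), (35, True), (50, True)]
score = train(history)    # deployment: ship this function; it accepts an
                          # input and produces the output (the score)
```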

Posted in Business Intelligence | Tagged: , | 1 Comment »

Why Use Data Mining with Data Warehouse?

Posted by Dylan Wan on April 1, 2015

1. Use the data warehouse data as the training set

Data Mining requires training data to train the learning algorithm.  The data warehousing processes provide the following services:

  • Consolidate the data from different sources.
  • Aggregate the data: for example, we may have order return transactions, but the training data can be the number of returns by customer and by product.
  • Capture historical data: this can be accomplished using Type 2 dimensions or periodic snapshots. For example, if you are going to do time series analysis, the source system may not keep the history.
  • Data cleansing: the quality of the data impacts the quality of the scoring engine, for example, handling missing data by setting appropriate default values.
  • Normalize the values using domain lookups or transformation logic. For example, transform numeric data into categories.
  • Transform the data structure to fit the structure required by the data mining models.
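Two of the services above, aggregation and normalization, can be sketched like this (all data values, column names, and cutoffs are invented for illustration):

```python
# Sketch of training-set preparation: aggregate transactions into counts,
# and clean/normalize a numeric value into categories.

returns = [  # (customer, product) return transactions from the warehouse
    ("A", "X"), ("A", "X"), ("A", "Y"), ("B", "X"),
]

# Aggregate: number of returns by customer and by product
agg = {}
for cust, prod in returns:
    agg[(cust, prod)] = agg.get((cust, prod), 0) + 1

def income_band(income, default=0):
    # Data cleansing: handle missing data with a default value,
    # then transform the numeric value into a category.
    income = default if income is None else income
    if income > 90000:
        return "HIGH"
    return "LOW" if income <= 50000 else "MEDIUM"
```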

2. Provide the scoring service as the additional services provided by BI applications

The scoring engine can be deployed as a service.  The service can be provided by the BI platform and embedded in other apps.

For example, a data warehouse may use the historical orders to do market basket analysis.  The results of the scoring engine need to be deployed in the e-commerce apps, not as BI reports or dashboards.

3. Showing the scoring or the prediction together with the rest of contents

For example, the customer profitability score can be shown wherever the customer data is shown.  The predictive profitability score can help adjust the customer interactions at all layers of the activities.

This can be done at different layers:

a. Run-time scoring: no ETL process is involved; the scoring API is called from BI.

This depends on the BI platform you are using.  If you are using Oracle BIEE and Oracle Data Mining Option, the opaque view can be used.

b. Scoring as part of the regular ETL process, or as a batch process: we can create persistent storage for holding the results of the scoring.  The data will be reflected when the data warehouse is refreshed.
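Option (b) can be sketched as a batch scoring step that persists scores into a table refreshed along with the warehouse. The scoring formula and the table names here are hypothetical:

```python
# Sketch of batch scoring in the ETL: score each warehouse customer row
# and upsert the result into a persistent score table.

def score(customer):                         # stands in for the scoring API
    return round(customer["revenue"] * 0.2 - customer["returns"] * 5.0, 2)

def batch_score(customers, score_table):
    """Run after the regular load; results land in a persistent table."""
    for c in customers:
        score_table[c["id"]] = score(c)      # upsert the profitability score
    return score_table

w_customer_d = [{"id": 1, "revenue": 1000.0, "returns": 2},
                {"id": 2, "revenue": 300.0, "returns": 0}]
w_score_f = batch_score(w_customer_d, {})
```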

Posted in Business Intelligence | Tagged: , , | Leave a Comment »

Business Intelligence Cloud Service – Data Modeler

Posted by Dylan Wan on March 30, 2015

This video shows how data are loaded into BI Cloud Service and modeled as dimensions and facts.

We do not need to use the BI Administration Tool to create the model.  For BICS, we can create the model using the browser.

Posted in Business Intelligence | Tagged: , | 1 Comment »

Source Dependent Extract and Source Independent Load

Posted by Dylan Wan on March 24, 2015

Typical data warehousing ETL process involves Extract, Transform, and Load.

The concepts of Source Dependent Extract (SDE) and Source Independent Load (SIL) are a unique part of the BI Apps ETL, since BI Apps has a universal data warehouse.

Since the staging schema is designed according to the universal data warehouse design, the logic for loading data from the staging tables into the data warehouse tables is source independent.

Because the staging schema follows the universal data warehouse design, the SDE tasks hold the specific extraction and mapping logic for each source system.
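The SDE/SIL split can be sketched as follows: each source gets its own extract mapping into the universal staging shape, while the load from staging to the warehouse is a single shared code path. All table and column names below are invented for illustration:

```python
# Sketch of the SDE/SIL split: source-dependent extracts, one shared load.

def sde_ebs(row):        # source-dependent extract for one source system
    return {"customer_name": row["PARTY_NAME"], "amount": row["AMOUNT"]}

def sde_psft(row):       # a different source, same universal staging shape
    return {"customer_name": row["NAME"], "amount": row["MONETARY_AMOUNT"]}

def sil_load(staging_rows):
    """Source-independent load: one code path for every source."""
    return [{"customer_name": r["customer_name"].title(),
             "amount": float(r["amount"])} for r in staging_rows]

staging = [sde_ebs({"PARTY_NAME": "ACME CORP", "AMOUNT": 10}),
           sde_psft({"NAME": "GLOBEX", "MONETARY_AMOUNT": 20})]
warehouse = sil_load(staging)
```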


1. The extraction process could impact the performance of the source system, since it uses the source system's resources.

It typically runs in a limited data warehousing window, when the source system is not under much load.

This explains why change data capture and incremental ETL are important: we need to use as few source system resources as possible.

2. The transformation logic could become overhead.

If the source-to-target mapping is simply column mapping, it should not introduce much overhead.  However, if the transformation logic involves heavy calculations, it may become an issue.

This leads to the adoption of several technologies:

  • Use Oracle GoldenGate
  • SDS
  • Persistent Staging Tables
  • ODI LKM vs IKM

Posted in Business Intelligence | Leave a Comment »

BI Apps has a Universal Data Warehouse

Posted by Dylan Wan on March 23, 2015

BI Apps data warehouse design is based on the assumption that the data warehouse schema design is independent of the OLTP system.

The staging schema is a universal staging schema, and the data warehouse is a universal data warehouse.

The assumption is that no matter which source system you are using, the business questions the BI consumers ask are similar.  You want to see revenue, cost, and profitability.  Whether you are using Oracle E-Business Suite, PeopleSoft, or J.D. Edwards, revenue is still revenue and cost is still cost.  The definitions of these metrics won’t vary much across the OLTP systems you are using.

It is the BI Apps development team’s job to find where these numbers should be extracted from in the source systems and to define the “map” between the source system and the target data warehouse.

This universal data warehouse assumption gives BI Apps a unique advantage.  It allows the teams to develop the source adapters independently from the data warehouse design, and thus independently from the BI metadata.

Viewed another way, the application logic built in the BI tool and the knowledge of designing the data warehouse schema can be reused over time and across many projects.

Adding support for the first adapter, for E-Business Suite, may take some time to analyze the analytic requirements.  Adding support for the PeopleSoft and JDE adapters then becomes a job of providing the source-to-target logical mapping and creating the ETL jobs that load the universal staging tables.

Posted in Business Intelligence | Tagged: , , | 1 Comment »

Extracting Data from Cloud Apps

Posted by Dylan Wan on March 5, 2015

I think that it would be easier if cloud applications were aware of data integration needs and published their interfaces proactively.

Here are some basic requirements for the applications that can be considered as data integration friendly:

1. Publish the object data model

This is required for source analysis.

For example, here is the RightNow Object Model.

The documentation should describe the object, the attributes available, the identifiers, the validation rules, the relationships among the objects, including their cardinality, the domain or lookup used in validating the data, etc.

2. Support a scalable solution for the Full Extract

The initial load may require a full extract of the historical transaction data and reference data.  This is typically a one-time event during the implementation phase.

If the extract starts at about the same time the cloud apps are initially implemented, the volume of the initial extract may be similar to an incremental extract.  However, if the Cloud BI is implemented much later than the transactional apps, the volume of the initial load may be large.

3. Support incremental extract / Change Data Capture

The daily or near real-time extract requires a way to identify the delta, the data created or updated recently.

In many systems, this can be done via an auditing column, such as the last updated date column.  However, if the source object model allows a nested structure, the last update date may need to be available in two forms: updates to the header only, or updates to any object including the children and descendants.
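The delta logic can be sketched like this, including the nested case: an order qualifies for the incremental extract if the header changed or if any child line changed since the last run. All field names are hypothetical:

```python
from datetime import datetime

# Sketch of change data capture via a last-update-date audit column,
# covering both the header-only and the header-or-any-child cases.

def changed_since(order, since):
    header_changed = order["last_update_date"] > since
    child_changed = any(line["last_update_date"] > since
                        for line in order.get("lines", []))
    return header_changed or child_changed

orders = [
    {"id": 1, "last_update_date": datetime(2015, 3, 1),
     "lines": [{"last_update_date": datetime(2015, 3, 20)}]},   # child changed
    {"id": 2, "last_update_date": datetime(2015, 3, 1), "lines": []},
]
delta = [o["id"] for o in orders if changed_since(o, datetime(2015, 3, 10))]
```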

4. Delete Capture

Capturing deleted data during the incremental extract is different from the creation and update cases.  If the deleted rows are not extracted, the data will be kept in the data warehouse and treated as normal data, and the reports could be wrong.

In some source systems, deletion and archiving are different events.  We need to have a way to differentiate these two events.
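One common approach can be sketched as a key comparison, keeping the deletion-versus-archiving distinction described above. All names and statuses here are invented for illustration:

```python
# Sketch of delete capture: warehouse rows missing from the source extract
# are flagged rather than silently kept, and archiving is treated as a
# separate event from deletion.

def capture_deletes(warehouse, source_keys, archived_keys):
    for key, row in warehouse.items():
        if key not in source_keys:
            # differentiate deletion from archiving
            row["status"] = "ARCHIVED" if key in archived_keys else "DELETED"
    return warehouse

dw = {1: {"status": "ACTIVE"}, 2: {"status": "ACTIVE"}, 3: {"status": "ACTIVE"}}
dw = capture_deletes(dw, source_keys={1}, archived_keys={3})
```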

5. Metadata Extract

This is a more advanced feature.  It is required for supporting the custom objects.

For normal objects, the expected data model is determined at design time.

However, if the source system allows customizations that add or enable new attributes, or even create new objects, the extraction program needs to be able to detect the model change and react accordingly.

6. Extracting the Lookup or Domain data

Many systems support multiple languages using lookup tables.  The idea is that the translated strings are kept in a separate table, allowing the data shown on the application UI to appear in different languages depending on the login user’s preferred language.  Not all source systems support this feature.  However, for those that do, extracting the lookup data allows BI to show the data with the same translations as the source system.

The requirement for the extraction process is to extract the data in all the languages that are expected on the BI / data warehouse side.

7. Extracting data for integrated data security

If the users who can access the cloud source apps and the users who can access BI are the same set of users, or a subset, the user and authentication services should be shared.

However, if the user or internal person record identifiers are used in building the data security, extracting the users and their granted roles may be necessary.

8. Extracting the UI displayed Labels

In some cases, the cloud BI can be seamlessly integrated with, or even embedded into, the source apps.  The displayed labels for the objects and attributes need to be consistent.  In such cases, extracting the display labels is also required.

This may be similar to the process of extracting the lookup or domain data, which has the data available in multiple languages.

Posted in Business Intelligence | Tagged: , , , , , | Leave a Comment »

Cloud to Cloud ETL

Posted by Dylan Wan on January 16, 2015

I found that Cloud to Cloud ETL differs from regular ETL running in an on-premise environment in several ways:

1. Direct Database Access is not available

2. Complex / Nested Objects

3. Paging

4. Volume Restriction

5. File base approach

6. Data Streaming

7. Data Security / Protection

8. Change data capture

9. Translation / System Data Availability

10. Metadata Reverse Engineering

11. API / Interface versioning

12. Frequency of the update / patching

13. Provisioning
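Several of the items above, such as paging and volume restrictions, show up together in practice. As a sketch, the extract has to loop over pages until the source is exhausted; the API here is simulated, not a real cloud service:

```python
# Sketch of paged extraction: without direct database access, the extract
# pulls rows page by page, respecting the service's volume restriction.

ROWS = list(range(25))          # pretend source data behind the API
PAGE_SIZE = 10                  # volume restriction imposed by the service

def fetch_page(offset):
    """Simulated cloud API call returning at most PAGE_SIZE rows."""
    return ROWS[offset:offset + PAGE_SIZE]

def extract_all():
    out, offset = [], 0
    while True:
        page = fetch_page(offset)
        out.extend(page)
        if len(page) < PAGE_SIZE:   # a short page means we reached the end
            return out
        offset += PAGE_SIZE

data = extract_all()
```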

Posted in Business Intelligence | Leave a Comment »

BI Applications in Cloud

Posted by Dylan Wan on October 6, 2014

Prepackaged analytics applications are available as cloud services.

The idea is that the client company does not need to use their own hardware and does not need to install the software or apply patches by themselves.

What they need is simply a browser.

For the end users, there should not be much difference.  The BI apps built on the OBIEE platform are already web-based applications.  Even users of BI applications deployed on premise access them via the browser today.

The real difference is for the IT department and for the company, from an investment perspective.

The deployment time should be very short.

Where is the role of the ISV, VAR, or SI in this picture?

I think that their role is still very important.

First of all, the out-of-the-box reports may not work for everyone.  New reports may need to be created.

Secondly, the source apps may be extended.  For example, Oracle cloud services, such as the CX clouds, support extensible attributes.  Where those attributes need to be reported, they should probably be added or hidden accordingly.

When the data are not loaded from other Oracle cloud applications, loading the data from Oracle’s Application Unlimited products such as PeopleSoft, E-Business Suite, Siebel, and J.D. Edwards may require a lot of work.

Posted in Business Intelligence | Tagged: , , , | Leave a Comment »

Supply Chain Questions

Posted by Dylan Wan on September 28, 2014

The key questions are about inventory and orders.

  • What is our inventory status, by product and by location?
    • Do we have enough inventory on hand and in process to meet customer demand?
    • How can we maximize customer satisfaction while minimizing the cost and size of inventory on hand? Both overstocking and understocking are bad.
    • Do we have enough finished goods to cover our backlog orders?  How much raw material, work in process, and finished goods do we have on hand?
  • How well is our order fulfillment process performing?
    • What are the top orders in the backlog?  How can we prioritize strategic customers?
    • How can we ensure on-time fulfillment?
    • Which products have the longest fulfillment cycles?
  • Where are the bottlenecks in the order management process?
    • What is our order management cycle time? How long does it take to book an order?
  • Do we know why customers cancel orders and return products?
    • How much revenue was lost due to order cancellations or returns?

Typical dimensions: By Product, By Location, By Customer

Different roles: Director of Sales Operations, Inventory Managers, Order Management Managers

Integration with other systems:

  • Customers and pipeline information from a CRM application
  • Product information from ERP
  • Workforce information from Human Resources
    • Align training plans to skill gaps in critical product areas
    • Correlate staffing levels with order volume and product plans

Posted in BI Application | Tagged: | Leave a Comment »

Analytics 3.0

Posted by Dylan Wan on January 2, 2014

One of the books I like most about data warehousing is e-Data, written by Jill Dyche.  Here is a paper she co-authored with Thomas H. Davenport about Big Data:

Big Data in Big Companies: Executive Summary

Thomas H. Davenport is the author of the book Competing on Analytics, which is also a book I own and found interesting.

I cannot agree more about these two points they mentioned in the above executive summary:

  • Overall, we found the expected co-existence; in not a single one of these large organizations was big data being managed separately from other types of data and analytics. The integration was in fact leading to a new management perspective on analytics, which we’ll call “Analytics 3.0.”
  • “It’s important to remember that the primary value from big data comes not from the data in its raw form, but from the processing and analysis of it and the insights, products, and services that emerge from analysis.”

I feel that what customers may really need is an integrated, prepackaged BI application built on an integrated BI platform and associated tech stack that can handle big data together with the other existing enterprise data.  Building a non-integrated big data warehouse is time consuming and may not strike a good balance between cost and benefit.  A non-integrated big data warehouse may work well for specialized firms with special purposes.  However, when we think of leveraging big data technology in the enterprise, integration with the existing data warehouse is the key to success.


Posted in Business Intelligence | 1 Comment »

Data Warehouse for Big Data: Scale-Up vs. Scale-Out

Posted by Dylan Wan on January 1, 2014

Found a very good paper: http://research.microsoft.com/pubs/204499/a20-appuswamy.pdf

This paper discusses whether using Hadoop as the analytics infrastructure is the right approach.

It is hard to argue with the industry trend.  However, Hadoop is not new any more.  It is time for people to calm down and rethink the real benefits.

Read the rest of this entry »

Posted in Business Intelligence | 1 Comment »

Technologies behind Oracle Transactional Business Intelligence (OTBI)

Posted by Dylan Wan on July 25, 2013

Oracle Transactional Business Intelligence (OTBI) is one of the business intelligence solutions provided as part of Fusion Applications.

To build real-time BI, the major challenge is to make sure that it performs well and has no or minimal interference with the core objective of the transactional application: online processing.

This is the reason why we need Oracle Business Intelligence Applications (OBIA) for Fusion Applications.  The idea is to keep the minimal processing of detecting changes and capturing changes in the transactional system and leave everything else, such as, preparing and consolidating the data for reporting, to BI Applications.

Here are some of the technologies available to make OTBI possible:

1. SQL Trimming from ADF

ADF stands for Application Development Framework.  It is the application development framework used in developing Fusion Applications.  In general, it is a declarative, metadata-driven framework that lets application developers define the data model, the data access layer, and the UI rendering, and put the validation logic and processing in the middle tier.

The underlying data model, in most cases, is still the relational model defined in the Fusion Apps transactional database under a third normal form (3NF) design.

The key enabling technology ADF provides to OTBI is the “Composite VO” or “Composite View Object”.  It can generate the database SQL for us based on the metadata.  Unlike the database technology of using database views, the ADF engine can look further down into the entity objects (EOs) included in the view object and selectively choose which entities are needed in a given SQL.  Suppose the view object includes two EOs: a primary EO for data at the line level, and another EO for getting the parent data.  When the query (composite VO) does not include any column from the parent EO, the SQL generated by ADF will not include the parent table in the join.

This is a superior technology compared to the old approach of building business views.
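The join-pruning idea can be sketched as follows. This is not ADF's actual implementation, just an illustration of generating SQL from view-object-style metadata and including the parent table only when one of its columns is projected; all table and column names are invented:

```python
# Sketch of "SQL trimming": build the FROM clause from metadata, adding the
# parent join only when the query projects a column owned by the parent EO.

VO = {  # view-object metadata: entity -> (table, columns); LineEO is primary
    "LineEO":   ("order_lines", ["line_id", "amount", "order_id"]),
    "HeaderEO": ("order_headers", ["order_id", "customer"]),
}

def generate_sql(projected):
    tables = ["order_lines"]
    if any(col in VO["HeaderEO"][1] and col not in VO["LineEO"][1]
           for col in projected):     # parent EO referenced -> keep the join
        tables.append("JOIN order_headers USING (order_id)")
    return "SELECT {} FROM {}".format(", ".join(projected), " ".join(tables))

sql_trimmed = generate_sql(["line_id", "amount"])   # no parent column needed
sql_joined = generate_sql(["line_id", "customer"])  # parent column projected
```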

If you are a Java programmer and would like to get a feel for what a Composite View Object looks like and how it works, here is a good blog post:

Do you know what is a Composite View Object?

2. BI Platform – ADFQuery to Composite VO

This enabling technology is provided by the BI platform and is available as a Java library.  It adds a layer on top of the ADF composite VO.  Without writing Java code, it generates the code that creates the composite VO on the fly.  It allows us to query data from the ADF engine by sending it an XML block called ADFQuery.

This doc shows some of the ADFQuery XML blocks.


To see better examples, you can find them in NQQuery.log files.

It is a query language like SQL.  There are sections for the column projection, the join criteria using view links, and the filters using view criteria.

Here are other enabling technologies behind OTBI.

3. ADFQuery generation from BI Server

4. SQL By Pass Database

5. Relational to Dimensional Mapping (Physical Layer to Business Model Layer)

6. SELECT Physical in initialization block

7. ADFQuery Initialization block

8. Physical Lookup function from BI platform

9. Logical Lookup function from BI platform

10. Data Security enabled at the VO layer via Fusion AppCore

11. Applcore Tree Flattening

12. Applcore Business Intelligence Column Flatten VO (BICVO)

13. BI Flexfield VO generator

14. BI Extender via Import Wizard

15. BI View Object created based on the BI EE Logical SQL (BIJDBC)

16. Effective Date VO with as of date filter

17. ADF Application Module to BI variable interface

and more…

Regardless, the goal of these technologies is to enable users to get real-time data access to the Fusion Apps.  There is little we can do there to provide features like data snapshots, pre-built aggregates, multiple currencies, data consolidation and conformance, cross-subject-area analysis, and, most important, good performance for queries with complex logic in a reasonable time without interfering with the transactional system.

Posted in BI Application, Business Intelligence, OTBI | Leave a Comment »