Dylan’s BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Market Segmentation and Data Mining

Posted by Dylan Wan on April 3, 2015

1. Market Segmentation in the academic world 

Market segmentation is part of the marketing process. It is described in Philip Kotler's book as part of defining the marketing strategy. The idea is to segment the consumer market by a set of variables and divide the market into distinct segments. Selecting the target segments for your products is an outcome of the marketing strategy.  If you have multiple product lines, this definition process can be done for each product line, if not at the individual product level.

For example, product X is targeted at consumers who are women without kids, living in a city, and having more than $90,000 in income.

This web page includes a very good and concise summary – Market Segmentation and Selection of Target Segments.  It reminded me of what I learned in my marketing course 20 years ago.

2. Market Segmentation as a product feature

Once the variables and the segments are defined as part of the marketing strategy, …

Posted in Business Intelligence

How to integrate Oracle Data Mining and Oracle BI

Posted by Dylan Wan on April 2, 2015

Here are the various ways that we can use Data Mining inside BI.

We can build Advanced Analytics applications.

The scoring function can be called within an opaque view or with the EVALUATE function.

The opaque view method may provide better flexibility, since multiple columns can be exposed.

Here is an old Oracle white paper about how to use EVALUATE in the BI Server: white paper
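
To make the two options concrete, here is a minimal sketch. The CHURN_MODEL model, the CUSTOMERS table, and the column names are assumptions for illustration; PREDICTION and PREDICTION_PROBABILITY are the Oracle Data Mining scoring functions.

  -- Option 1: an opaque view in the physical layer, exposing several
  -- scored columns at once (hypothetical model and table names).
  SELECT cust_id,
         PREDICTION(churn_model USING *)             AS churn_class,
         PREDICTION_PROBABILITY(churn_model USING *) AS churn_prob
  FROM   customers;

  -- Option 2: EVALUATE in a logical column, shipping the scoring
  -- function down to the Oracle database (one column per call).
  EVALUATE('PREDICTION_PROBABILITY(churn_model USING %1 AS tenure, %2 AS avg_spend)',
           "Customers"."Tenure", "Customers"."Avg Spend")

The flexibility noted above comes from option 1 returning multiple columns in one view, while each EVALUATE call returns a single expression.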

Posted in Business Intelligence

Data Mining Scoring Engine Development Process

Posted by Dylan Wan on April 1, 2015

Here is how I view data mining:

The target is to build a scoring engine.

It accepts an input and produces the output.

The development process can be separated into Requirements, Design, Coding, and Deployment, similar to typical software development phases.

Posted in Business Intelligence

Why Use Data Mining with Data Warehouse?

Posted by Dylan Wan on April 1, 2015

1. Use the data warehouse data as the training set

Data mining requires training data to train the learning algorithm.  The data warehousing processes provide the following services:

  • Consolidate the data from different sources
  • Aggregate the data: for example, we have the order return transactions, but the training data can be the number of returns by customer and by product (see the sketch after this list).
  • Capture the historical data: this can be accomplished using Type 2 dimensions or periodic snapshots. For example, if you are going to do time series analysis, the source data may not keep the history.
  • Cleanse the data: the quality of the data impacts the quality of the scoring engine, e.g. handling missing data by setting appropriate default values.
  • Normalize the values using domain lookups or transformation logic; for example, transform numeric data into categories.
  • Transform the data structure to fit the structure required by the data mining models.
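
As an example of the aggregation point above, here is a minimal sketch that turns transactions into a per-customer, per-product training set. The table and column names (W_SALES_F, W_RETURN_F) are hypothetical:

  SELECT s.customer_id,
         s.product_id,
         COUNT(DISTINCT s.order_id)  AS num_orders,
         COUNT(DISTINCT r.return_id) AS num_returns
  FROM   w_sales_f s
         LEFT JOIN w_return_f r
                ON  r.customer_id = s.customer_id
                AND r.product_id  = s.product_id
  GROUP  BY s.customer_id, s.product_id;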

2. Provide the scoring service as the additional services provided by BI applications

The scoring engine can be deployed as a service.  The service can be provided from BI and embedded in other apps.

For example, a data warehouse may use the historical orders to do market basket analysis.  The results of the scoring engine need to be deployed in the e-commerce apps, not as BI reports or dashboards.

3. Showing the score or prediction together with the rest of the content

For example, the customer profitability score can be shown wherever the customer data is shown.  The predictive profitability score can help adjust the customer interactions at all layers of activity.

This can be done at different layers:

a. Run-time scoring: no ETL process involved; call the scoring API from BI.

This depends on the BI platform you are using.  If you are using Oracle BIEE and the Oracle Data Mining Option, the opaque view can be used.

b. Scoring as part of the regular ETL process or as a batch process: we can introduce persistent storage for holding the results of the scoring.  The data will be reflected when the warehouse is refreshed.
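
A sketch of option (b), assuming a hypothetical score table W_CUST_SCORE_A refreshed at the end of each warehouse load; the model and column names are illustrative:

  MERGE INTO w_cust_score_a t
  USING (SELECT row_wid,
                PREDICTION_PROBABILITY(churn_model USING *) AS churn_prob
         FROM   w_customer_d) s
  ON (t.customer_wid = s.row_wid)
  WHEN MATCHED THEN
    UPDATE SET t.churn_prob = s.churn_prob
  WHEN NOT MATCHED THEN
    INSERT (customer_wid, churn_prob)
    VALUES (s.row_wid, s.churn_prob);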

Posted in Business Intelligence

Business Intelligence Cloud Service – Data Modeler

Posted by Dylan Wan on March 30, 2015

This video shows how the data is loaded into BI Cloud Service and modeled as dimensions and facts.

We do not need the BI Admin tool to create the model; with BICS, we can create the model in the browser.

Posted in Business Intelligence

Source Dependent Extract and Source Independent Load

Posted by Dylan Wan on March 24, 2015

Typical data warehousing ETL process involves Extract, Transform, and Load.

The concepts of Source Dependent Extract (SDE) and Source Independent Load (SIL) are a unique part of the BI Apps ETL, since BI Apps has a universal data warehouse.

Since the staging schema is designed according to the universal data warehouse design, the logic of loading data from the staging tables to the data warehouse tables is source independent.

Because the staging schema follows the universal data warehouse design rather than any source's schema, the SDE tasks contain the specific extraction and mapping logic for each source system.
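
To make the division of labor concrete, here is a simplified sketch of the two steps, using an E-Business Suite receivables extract as the example. The W_..._FS / W_..._F names follow the BI Apps convention but are illustrative, as is the datasource number:

  -- SDE (source dependent): EBS-specific extract into universal staging.
  INSERT INTO w_sales_invoice_fs
         (integration_id, invoice_amt, invoiced_on_dt, datasource_num_id)
  SELECT l.customer_trx_line_id, l.extended_amount, t.trx_date,
         9   -- illustrative datasource id for this EBS instance
  FROM   ra_customer_trx_lines_all l
         JOIN ra_customer_trx_all t
           ON t.customer_trx_id = l.customer_trx_id;

  -- SIL (source independent): the same load works no matter which
  -- SDE populated the staging table.
  MERGE INTO w_sales_invoice_f f
  USING w_sales_invoice_fs s
  ON (    f.integration_id    = s.integration_id
      AND f.datasource_num_id = s.datasource_num_id)
  WHEN MATCHED THEN
    UPDATE SET f.invoice_amt = s.invoice_amt
  WHEN NOT MATCHED THEN
    INSERT (integration_id, invoice_amt, invoiced_on_dt, datasource_num_id)
    VALUES (s.integration_id, s.invoice_amt, s.invoiced_on_dt, s.datasource_num_id);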

A few observations:

1. The extraction process could impact the performance of the source system, since it uses the source system's resources.

It typically runs in a limited data warehousing window when the source system does not have much load.

This explains why change data capture and incremental ETL are important: we need to use as few source system resources as possible.

2. The transformation logic could become overhead.

If the source-to-target mapping is simply column mapping, it should not introduce much overhead.  However, if the transformation logic involves heavy calculations, it may become an issue.

This has led to the adoption of several technologies:

  • Use Oracle GoldenGate
  • SDS
  • Persistent Staging Tables
  • ODI LKM vs IKM

Posted in Business Intelligence

BI Apps has a Universal Data Warehouse

Posted by Dylan Wan on March 23, 2015

The BI Apps data warehouse design is based on the assumption that the data warehouse schema design is independent of the OLTP system.

The staging schema is a universal staging schema, and the data warehouse is a universal data warehouse.

The assumption is that no matter which source system you are using, the business questions the BI consumers ask are similar.  You want to see revenue, cost, and profitability; whether you are using Oracle E-Business Suite, PeopleSoft, or J.D. Edwards, revenue is still revenue and cost is still cost.  The definitions of these metrics do not vary much across the OLTP systems you are using.

It is the BI Apps development team's job to find where these numbers should be extracted from the source systems and to define the "map" between the source system and the target data warehouse system.

This universal data warehouse assumption does give BI Apps a unique advantage.  It allows the team to develop the source adaptors independently of the data warehouse design, and thus independently of the BI metadata.

Viewed another way, the application logic built in the BI tool and the knowledge of designing the data warehouse schema can be reused over time and across many projects.

Adding support for the first adaptor, for E-Business Suite, may take some time to analyze the analytic requirements.  Adding support for the PeopleSoft and JDE adaptors then becomes a job of providing the source-to-target logical mapping and creating the ETL jobs that load the universal staging tables.

Posted in Business Intelligence | 1 Comment

Extracting Data from Cloud Apps

Posted by Dylan Wan on March 5, 2015

I think it would be easier if cloud applications were aware of data integration needs and published their interfaces proactively.

Here are some basic requirements for an application to be considered data integration friendly:

1. Publish the object data model

This is required for source analysis.

For example, here is the RightNow Object Model.

The documentation should describe the objects, the attributes available, the identifiers, the validation rules, the relationships among the objects (including their cardinality), and the domains or lookups used in validating the data.

2. Support a scalable solution for the Full Extract

The initial load may require a full extract of the historical transaction data and reference data.  This is typically a one-time event during the implementation phase.

If the extract starts at about the same time as the cloud apps are initially implemented, the volume of the initial extract may be similar to an incremental extract.  However, if the cloud BI is implemented much later than the transactional apps, the volume of the initial load may be large.

3. Support incremental extract / Change Data Capture

The daily or near real-time extract requires a way to identify the delta, the data created or updated recently.

In many systems, this can be done via an auditing column, such as a last updated date column.  However, if the source object model allows a nested structure, the last update date may need to be available in two forms: updates to the header only, or updates to any object including the children and descendants.
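
In SQL terms, the basic delta extract is a filter on that audit column; the table names and the :last_refresh_date parameter below are illustrative:

  SELECT o.*
  FROM   source_orders o
  WHERE  o.last_update_date > :last_refresh_date;

  -- For nested objects, the header extract may also need to react to
  -- child changes, e.g. by comparing against the latest child update:
  SELECT h.*
  FROM   order_headers h
  WHERE  GREATEST(h.last_update_date,
                  (SELECT NVL(MAX(l.last_update_date), h.last_update_date)
                   FROM   order_lines l
                   WHERE  l.order_id = h.order_id)) > :last_refresh_date;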

4. Delete Capture

Capturing deleted data during the incremental extract is different from the create and update cases. If the deleted rows are not extracted, the data will be kept in the data warehouse and treated as normal data, and reports built on it could be wrong.

In some source systems, deletion and archiving are different events.  We need a way to differentiate the two.
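
When the source only exposes the current set of primary keys, one common technique is an anti-join: compare the warehouse keys with the extracted key list and soft-delete the orphans. A sketch with illustrative table and column names:

  UPDATE w_order_f f
  SET    f.delete_flg = 'Y'
  WHERE  NOT EXISTS (SELECT 1
                     FROM   src_order_keys k   -- staged list of live source keys
                     WHERE  k.order_id = f.integration_id);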

5. Metadata Extract

This is a more advanced feature.  It is required for supporting custom objects.

For normal objects, the expected data model is determined at design time.

However, if the source system allows customizations that add or enable new attributes, or even create new objects, the extraction program needs to be able to detect the model change and react accordingly.

6. Extracting the Lookup or Domain data

Many systems support multiple languages using lookup tables.  The idea is that the translated strings are kept in a separate table, allowing the data shown on the application UI to appear in different languages depending on the login user's preferred language.  Not all source systems support this feature.  However, for the source systems that do, extracting the lookup data allows BI to show the data with the same translations as the source system.

The requirement for the extraction process is to extract the data in all the languages that are expected on the BI / data warehouse side.
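
For an EBS-style source, such an extract could look like the sketch below; FND_LOOKUP_VALUES is striped by a LANGUAGE column, and the language list should match whatever is enabled on the warehouse side:

  SELECT lookup_type,
         lookup_code,
         language,
         meaning          -- the translated display string
  FROM   fnd_lookup_values
  WHERE  language IN ('US', 'F', 'D');  -- e.g. English, French, German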

7. Extracting data for integrated data security

If the users who can access the cloud source apps and the users who can access BI are the same set of users, or a subset, the user and authentication service should be shared.

However, if the user or internal person record identifiers are used in building the data security, extracting the users and their granted roles may be necessary.

8. Extracting the UI Display Labels

In some cases, the cloud BI can be seamlessly integrated or even embedded into the source apps.  The displayed labels for the objects and the attributes need to be consistent.  In such cases, extracting the display labels is also required.

This may be similar to the process of extracting the lookup or domain data, which has the data available in multiple languages.

Posted in Business Intelligence

Cloud to Cloud ETL

Posted by Dylan Wan on January 16, 2015

I have found that cloud-to-cloud ETL differs from regular ETL running in an on-premise environment in several ways:

1. Direct Database Access is not available

2. Complex / Nested Objects

3. Paging

4. Volume Restriction

5. File-based approach

6. Data Streaming

7. Data Security / Protection

8. Change data capture

9. Translation / System Data Availability

10. Metadata Reverse Engineering

11. API / Interface versioning

12. Frequency of the update / patching

13. Provisioning

Posted in Business Intelligence

BI Applications in Cloud

Posted by Dylan Wan on October 6, 2014

Prepackaged analytics applications are available as cloud services.

The idea is that the client company does not need to use its own hardware and does not need to install the software or apply patches itself.

What they need is simply a browser.

For the end users, there should not be much difference.  BI apps built on the OBIEE platform are already web-based applications.  Even users of BI applications deployed on premise access them via the browser today.

The real difference is for the IT department and for the company from an investment perspective.

The deployment time should be very short.

Where is the role of the ISV, VAR, or SI in this picture?

I think that their role is still very important.

First of all, the out-of-the-box reports may not work for everyone.  New reports may need to be created.

Secondly, the source apps may be extended.  For example, Oracle cloud services, such as the CX clouds, support extensible attributes.  Those attributes probably need to be added where they should be reported, or hidden where they should not.

When the data is not loaded from other Oracle cloud applications, loading the data from Oracle's Application Unlimited products such as PeopleSoft, E-Business Suite, Siebel, and J.D. Edwards may require a lot of work.

Posted in Business Intelligence

Supply Chain Questions

Posted by Dylan Wan on September 28, 2014

The key questions are about inventory and orders.

  • What is our inventory status? by product and by location?
    • Do we have enough inventory on hand and in process to meet customer demand?
    • How can we maximize customer satisfaction while minimizing the cost and size of inventory on hand? Both overstocking and understocking are bad.
    • Do we have enough finished goods to cover our backlog orders?  How much raw material, work in process, and finished goods do we have on hand?
  • How well is our order fulfillment process performing?
    • What are the top orders in backlog?  How can we prioritize strategic customers?
    • How can we ensure fulfillment on time?
    • Which products have the longest fulfillment cycles?
  • Where are the bottlenecks in order management process?
    • What is our order management cycle time? How long does it take to book an order?
  • Do we know why customers cancel orders and return products?
    • How much revenue was lost due to order cancellations or returns?

Typical dimensions: By Product, By Location, By Customer

Different roles: Director of Sales Operations, Inventory Managers, Order Management Managers

Integration with other systems:

  • Customers and pipeline information from a CRM application
  • Product information from ERP
  • Workforce information from Human Resource
    • Align the training plans to skill gaps in critical product areas
    • Correlate staffing levels with order volume and product plans

Posted in BI Application

Analytics 3.0

Posted by Dylan Wan on January 2, 2014

One of the books I like most about data warehousing is e-Data, written by Jill Dyche.  Here is a paper she co-authored with Thomas H. Davenport about Big Data:

Big Data in Big Companies: Executive Summary

Thomas H. Davenport is the author of Competing on Analytics, another book I own and find interesting.

I cannot agree more with these two points mentioned in the executive summary above:

  • Overall, we found the expected co-existence; in not a single one of these large organizations was big data being managed separately from other types of data and analytics. The integration was in fact leading to a new management perspective on analytics, which we’ll call “Analytics 3.0.”
  • “It’s important to remember that the primary value from big data comes not from the data in its raw form, but from the processing and analysis of it and the insights, products, and services that emerge from analysis.”

I feel that what customers may really need is an integrated, prepackaged BI app built on an integrated BI platform and the associated tech stack that can handle big data together with the other existing enterprise data.  Building a non-integrated big data data warehouse is time consuming and may not strike a good balance between cost and benefit.  A non-integrated big data warehouse may work well for specialized firms with special purposes.  However, when we think of leveraging big data technology in the enterprise, integration with the existing data warehouse is the key to success.


Posted in Business Intelligence | 1 Comment

Data Warehouse for Big Data: Scale-Up vs. Scale-Out

Posted by Dylan Wan on January 1, 2014

Found a very good paper: http://research.microsoft.com/pubs/204499/a20-appuswamy.pdf

This paper discusses whether using Hadoop as the analytics infrastructure is the right approach.

It is hard to argue with the industry trend.  However, Hadoop is not new any more; it is time for people to calm down and rethink its real benefits.


Posted in Business Intelligence

Technologies behind Oracle Transactional Business Intelligence (OTBI)

Posted by Dylan Wan on July 25, 2013

Oracle Transactional Business Intelligence (OTBI) is one of the business intelligence solutions provided as part of Fusion Applications.

To build real-time BI, the major challenge is to make sure that it performs well and has no or minimal interference with the core objective of the transactional application: the online processing.

This is the reason why we need Oracle Business Intelligence Applications (OBIA) for Fusion Applications.  The idea is to keep only the minimal processing of detecting and capturing changes in the transactional system, and leave everything else, such as preparing and consolidating the data for reporting, to BI Applications.

Here are some of the technologies available to make OTBI possible:

1. SQL Trimming from ADF

ADF stands for Application Development Framework.  It is the application development framework used in developing Fusion Applications.  In general, it is a declarative, metadata-driven framework that lets application developers define the data model, the data access layer, and the UI rendering, and put the validation logic and processing in the middle tier.

The underlying data model, in most cases, is still the relational model defined in the Fusion Apps transactional database under a third normal form (3NF) design.

The key enabling technology provided by ADF for OTBI is the "Composite VO" or "Composite View Object".  For me, the point is that it can generate the database SQL for us based on the metadata.  Unlike the database technology of using a database view, the ADF engine can look further down into the entity objects (EOs) included in the view object and selectively choose which entities are needed in a given SQL statement.  If the view object includes two EOs, one primary EO for data at the line level and another EO for getting the parent data, and the query (composite VO) does not include any column from the parent EO, the SQL generated by ADF will not include the parent table in the join.

This is a superior technology compared to the old approach of building business views.
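
To illustrate the trimming, here is roughly the kind of SQL a composite VO could generate over an invoice-line EO and its parent header EO; the table and column names are made up:

  -- Only line-level columns requested: the parent EO is trimmed away.
  SELECT l.invoice_line_id, l.amount
  FROM   invoice_lines l;

  -- A parent column is requested: ADF adds the join back in.
  SELECT l.invoice_line_id, l.amount, h.supplier_name
  FROM   invoice_lines l
         JOIN invoice_headers h
           ON h.invoice_id = l.invoice_id;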

If you are a Java programmer and would like to get a feeling for what a Composite View Object looks like and how it works, here is a good blog post:

Do you know what is a Composite View Object?

2. BI Platform – ADFQuery to Composite VO

This enabling technology is provided by the BI platform and available as a Java library.  It adds a layer on top of the ADF composite VO.  Without writing Java code, it generates the code that creates the composite VO on the fly.  It allows us to query the data from the ADF engine by sending it an XML block called ADFQuery.

This doc shows some of the ADFQuery XML blocks.

http://docs.oracle.com/cd/E15586_01/fusionapps.1111/e20836/adf.htm#BIEMG3435

Better examples can be found in the NQQuery.log files.

It is a query language like SQL.  You have sections for the column projection, the join criteria using view links, and the filters using view criteria.

Here are other enabling technologies behind OTBI.

3. ADFQuery generation from BI Server

4. SQL Bypass Database

5. Relational to Dimensional Mapping (Physical Layer to Business Model Layer)

6. SELECT Physical in initialization block

7. ADFQuery Initialization block

8. Physical Lookup function from BI platform

9. Logical Lookup function from BI platform

10. Data Security enabled at the VO layer via Fusion AppCore

11. Applcore Tree Flattening

12. Applcore Business Intelligence Column Flatten VO (BICVO)

13. BI Flexfield VO generator

14. BI Extender via Import Wizard

15. BI View Object created based on the BI EE Logical SQL (BIJDBC)

16. Effective Date VO with as of date filter

17. ADF Application Module to BI variable interface

and more…

Regardless, the goal of these technologies is to enable users to get real-time data access to the Fusion Apps.  There is really little we can do to provide features like data snapshots, pre-built aggregates, multiple currencies, data consolidation and conformance, and cross subject area analysis, and, most importantly, query performance for complex logic within a reasonable time and without interference with the transactional system.

Posted in BI Application, Business Intelligence, OTBI

Multiple Language Support in Oracle BI Applications (1)

Posted by Dylan Wan on June 27, 2013

In release 11.1.1.7.1, one of the major features is support for translated data, so you can have a global deployment of the data warehouse.

In the past, before release 11.1.1.7.1, the BI platform and BI applications supported translated prompts and labels, so users of different languages could use BI Apps and see the instructions, navigation, and displayed labels in their own language.  However, the data stored in the data warehouse appeared in one language only.  Even though OLTP applications such as E-Business Suite, PeopleSoft, and Siebel all support multiple languages in their databases, BI Apps extracted the data in only one language, specified in DAC as a global variable.

In release 11.1.1.7.1, the BI Apps data warehouse model officially delivers a true multiple-language data warehouse.  The feature enables deploying organizations with users in different countries to share a global deployment.  They can enable multiple data warehouse languages as part of the BI Apps configuration, and the ETL process extracts the language-specific reference data from the OLTP applications.

When users from different countries log in to the same application with the same role, even though they have the same data access, they see the data displayed in their own language.
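
Conceptually, the language-aware lookup behaves like the sketch below, with the translated strings striped by language code; the table and column names are illustrative, not the actual 11.1.1.7.1 schema:

  SELECT f.revenue,
         t.product_name            -- string in the user's language
  FROM   w_sales_f f
         JOIN w_product_d d
           ON d.row_wid = f.product_wid
         JOIN w_product_d_tl t
           ON  t.product_wid   = d.row_wid
           AND t.language_code = :preferred_language;  -- from the user session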


Posted in Business Intelligence

Using Load Plan for managing your ETL task in BI Apps 11.1.1.7.1 (1)

Posted by Dylan Wan on May 14, 2013

One of the major changes introduced in BI Apps 11.1.1.7.1 is the way we manage the ETL task sequence and trim the unnecessary tasks.

This functionality was accomplished earlier using DAC.  The problem we frequently faced was that the DAC repository and the Informatica (INFA) repository were maintained as two separate repositories.  We had to sync up the task names exactly in order to use DAC to manage the execution of the Informatica workflow tasks.

The Load Plan and the Load Plan Generator were designed to address this requirement.

Here is a good article that describes the story.

Load Plan Generator – An Inside Look

Posted in BI Application, Business Intelligence, DAC, Data Warehouse, ETL, Informatica, OBIA, ODI, Oracle, Oracle Data Integrator

Incremental ETL in BI Apps 11.1.1.7.1 (1)

Posted by Dylan Wan on May 10, 2013

The data warehouse needs to be refreshed periodically to make sure it has up-to-date information.  The incremental load is the regular ETL process running every day; the frequency can vary, but a nightly job is most common.  To ensure that the incremental ETL job can finish within a limited time window, it typically identifies the changes that happened in the source system and deals only with the delta, avoiding a full ETL load.

In BI Apps 11.1.1.7.1, if you are not leveraging GoldenGate and the SDS (Source Dependent Data Store), the incremental ETL is accomplished by directly comparing the last update date captured in the source table with the last refresh date.

The BI Apps 11.1.1.7.1 ODI-based ETL does not use separate ETL tasks for the incremental load and the full load.  The logic is determined by whether the task has been executed before; in other words, it is based on whether the data has been refreshed.  This was a productivity gain for BI Apps developers and also makes supporting the incremental ETL cleaner.  In the past, you might see an Incremental Load Informatica workflow and a separate Full Load workflow; every time there was a logic change, both workflows had to be modified, which was a source of code bugs.  With a better ETL architecture, we eliminated the need to maintain two separate tasks.

Here is a high level summary of how the incremental ETL is accomplished in BI Apps 11.1.1.7.1 ETL:

1. Use the W_ETL_LOAD_DATES table to track all ETL executions.

2. Use the IS_INCREMENTAL variable to hold and pass the state.

3. Refresh the variable in a package step: if there is an entry in W_ETL_LOAD_DATES, perform an incremental ETL.

4. Use the RUN_INCREMENTAL variable to turn the incremental filter on and off.

5. Optionally, use an ODI variable evaluation step to branch the full and incremental ETL logic within a package.

6. Table maintenance: use the ETL load date information to determine the operations, including dropping indexes and truncating tables before a full load, rebuilding the indexes after the load, analyzing the tables after the load, etc.
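
Putting steps 1 through 4 together, the mechanics look roughly like the following; the W_ETL_LOAD_DATES key column and the #LAST_EXTRACT_DATE variable name are illustrative:

  -- Refreshing IS_INCREMENTAL: any prior entry means we have loaded before.
  SELECT CASE WHEN COUNT(*) > 0 THEN 'Y' ELSE 'N' END
  FROM   w_etl_load_dates
  WHERE  package_name = '<current package>';   -- illustrative key column

  -- The extract filter, ODI-style: #-variables are substituted at run
  -- time, so the delta predicate is switched off for a full load.
  SELECT src.*
  FROM   source_table src
  WHERE  ('#IS_INCREMENTAL' = 'N'
          OR src.last_update_date >
             TO_DATE('#LAST_EXTRACT_DATE', 'YYYY-MM-DD HH24:MI:SS'));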


Posted in BI, BI Application, ETL, ODI, Oracle, Oracle Data Integrator

BI Apps 11.1.1.7.1 (BI Apps in ODI) is available in OTN

Posted by Dylan Wan on May 6, 2013

First, go to OTN:

www.oracle.com/technetwork

Go to DOWNLOADS -> Middleware

You will see Business Intelligence Applications under Middleware

You will see both 10g and 11g.  Click on Oracle Business Intelligence 11g downloads.

You will see BI Apps 11.1.1.7.1 here:

Oracle Business Intelligence Applications (11.1.1.7.1)


Posted in Business Intelligence

Benefits of using ODI as the ETL for BI Apps

Posted by Dylan Wan on May 2, 2013

Oracle Data Integrator is being used as the ETL tool for BI Apps. 

Here are some benefits:

1. Productivity

It takes less effort for developers to write an ETL map compared to other ETL tools.

2. Define ETL flow patterns in KM

The reason it now takes less effort is that the flow logic lives inside the knowledge module (KM).  You pick which flow logic you want by picking which KM to use, and you configure the flow for your specific requirements by setting the KM options.  The rest is to define the source SQL and the mapping expressions; that is pretty much all there is for a typical job.

In a prebuilt ETL adaptor solution like BI Apps, we have a set of patterns.  Unless a specific task requirement is unique, you do not need to write any new KM or any complex logic.

3. Code Generation

ODI provides an SDK.  Although the ETL programs themselves are declarative in the ODI repository, they can be generated using the ODI SDK.  Not only is the ETL flow logic centralized inside the KM; the ETL metadata can also be generated.

4. User Defined Function

User-defined functions are part of the magic behind supporting different platforms.  You use the same function and do not need to worry about the different SQL functions supported by different database platforms; the ODI engine compiles and replaces it with the right implementation for you.

It is not a required feature for building a custom ETL, but it is required for prebuilt ETL adaptors, which need to support source apps, such as Siebel, JDE, and PeopleSoft, on different database platforms.
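
For example, a concatenation function could be declared once and implemented per technology; the $(param) placeholders are the ODI user function parameter syntax, and the function name here is made up:

  -- Function as used in mappings: CONCAT_STR($(a), $(b))

  -- Implementation for the Oracle technology:
  $(a) || $(b)

  -- Implementation for the Microsoft SQL Server technology:
  $(a) + $(b)

  -- A mapping expression then stays technology-neutral:
  CONCAT_STR(FIRST_NAME, LAST_NAME)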

5. Data Lineage

Data lineage within the ETL tool becomes easy.  You can easily identify which ETL task populates a given table column and, for a given source table column, which ETL tasks use it as a source.

Although it may not be officially recommended, the ODI metadata repository can also be queried directly from the database, which makes ETL auditing possible.  It is a productivity gain, since you can spend less quality assurance time detecting issues, and a quality gain, since the resulting ETL adaptor has higher quality.
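
As a taste of what such a query could look like, the sketch below lists the interfaces loading a given target table.  It assumes the ODI 11g work repository tables SNP_POP and SNP_TABLE; verify the table and column names against your repository version before relying on them:

  SELECT p.pop_name   AS interface_name,
         t.table_name AS target_table
  FROM   snp_pop p
         JOIN snp_table t
           ON t.i_table = p.i_table
  WHERE  t.table_name = 'W_SALES_INVOICE_F';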

6. Performance

When the source data and target data are in the same database, a middle-tier ETL engine becomes overhead.  We can better leverage the power of the database for the ETL transformations.

This is the benefit we expect from an E-L-T solution.

For cross-database deliveries, many optimization options are available.  A simple database link across database instances can be used, and replicating the data with Oracle GoldenGate is also possible.  With the benefits of KMs and UDFs described above, the same ETL code can be switched to run the extract process against the replicated data or against the remote source apps database.

7. Load Plan

The ETL task dependencies and trimming can be accomplished via the ODI Load Plan within the same ODI repository.  This architecture eliminates the need to duplicate the ETL task metadata into a separate task execution, sequencing, and management tool like the BI Apps Data Warehouse Administration Console (DAC).

You can define your ETL jobs, execute them, and monitor them within a single tool, the same ODI Studio.

(to be continued…)

Posted in Business Intelligence
