Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Archive for the ‘ETL’ Category

Is ETL still necessary?

Posted by Dylan Wan on January 22, 2019

ETL stands for Extract, Transform, and Load.

The very existence of Extract and Load implies that the source data and the target data are stored separately: you need to extract from the source and load the data into the target data store.

Extract and Load won’t go away as long as the data used for reporting is not stored in the same place as the data captured in the source system for transactional processing.

However, we sometimes use the term Replicate, which refers to simply copying the data from the source to the target without any transformation.

In a way, ETL is not necessary when the reporting solution runs directly against the transactional system.  In the old days, Oracle DBI running off Oracle E-Business Suite did not require ETL.  It did have batch programs for summarization, but it did not extract and load data.

ETL also changes shape when a replication- or backup-based approach is used.  A solution such as Oracle GoldenGate can eliminate the need for separate Extract and Load steps, since it does Extract and Load in one step.  People sometimes call this ELT: Extract and Load first, then Transform.

The real question is how easily these traditional ETL programs can be developed and maintained.

The real question is whether an ETL tool and ETL programs running outside the BI tool are necessary.  The processes, or at least the Transform process, won’t go away as long as the data needs to be prepared for reporting.
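To make the ELT idea concrete, here is a minimal sketch with hypothetical table and column names: the data has already been replicated into the target database, so the Transform step is just SQL running there, with no external ETL engine involved.

```sql
-- Assumes orders_replica was copied as-is from the source (Extract + Load).
-- The Transform step runs inside the target database as plain SQL.
-- All table and column names are illustrative.
INSERT INTO orders_reporting (order_id, order_month, customer_id, net_amount)
SELECT order_id,
       TRUNC(order_date, 'MM'),          -- conform dates to a monthly grain
       customer_id,
       gross_amount - discount_amount    -- derive the reporting measure
FROM   orders_replica
WHERE  order_status <> 'CANCELLED';
```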

Are we able to get rid of the heavy ETL development time? Are we able to get rid of the separate ETL tool?  Are we able to stop maintaining those ETL programs? Definitely.


Posted in BI, Business Intelligence, ETL | Leave a Comment »

Use Bit to represent groups

Posted by Dylan Wan on October 11, 2017

Here I am providing an alternative approach to supporting group membership in MySQL.

It is a commonly seen requirement that a group may have multiple members and a person may be added to multiple groups.  This many-to-many relationship is typically modeled with an intersection table.

When group membership is used as a filter, for example to show all the students within a group, it becomes a subquery.
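Here is a minimal sketch of both forms in MySQL, using hypothetical student tables. The bit approach assigns each group one bit position in an integer column, so the membership test becomes a bitwise AND instead of a subquery.

```sql
-- Conventional design: membership lives in an intersection table,
-- so the filter becomes a subquery (or a join).
SELECT s.student_id, s.student_name
FROM   student s
WHERE  s.student_id IN (SELECT m.student_id
                        FROM   group_member m
                        WHERE  m.group_id = 3);

-- Bit-based alternative: student.group_bits is a BIGINT UNSIGNED where
-- group 1 = bit 0, group 2 = bit 1, and so on (up to 64 groups per column).
SELECT s.student_id, s.student_name
FROM   student s
WHERE  s.group_bits & (1 << 2) <> 0;   -- group 3 maps to bit 2
```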

Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse, ETL | Leave a Comment »

Preserve Surrogate Key During Upgrade

Posted by Dylan Wan on September 23, 2017

The generated surrogate key is used everywhere in the data warehouse.  What do we do during an upgrade?

Here are some approaches:

1. Full Refresh

You can perform a full refresh of the data warehouse.  The surrogate keys will be regenerated, and the foreign keys will be updated.

Obviously, this is not a good approach.  There are several problems:

  • Too big an impact
  • The historical data may not be available from the source
  • Loss of the historical data in the data warehouse
    • Type 2 records
    • Historical snapshots

2. Partial Refresh – Refresh the impacted fact tables

Read the rest of this entry »
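As one illustration of the underlying mechanics (not necessarily the approach this post settles on), when a dimension’s surrogate keys must be regenerated, a key map can re-point the existing fact rows. The names below follow the common W_..._D / ROW_WID / INTEGRATION_ID style but are hypothetical.

```sql
-- Pair each natural key's old surrogate key with its regenerated one.
CREATE TABLE w_customer_d_keymap AS
SELECT o.row_wid AS old_wid,
       n.row_wid AS new_wid
FROM   w_customer_d_old o
JOIN   w_customer_d     n ON o.integration_id = n.integration_id;

-- Re-point the fact table foreign keys through the map.
UPDATE w_sales_f f
SET    f.customer_wid = (SELECT k.new_wid
                         FROM   w_customer_d_keymap k
                         WHERE  k.old_wid = f.customer_wid)
WHERE  f.customer_wid IN (SELECT old_wid FROM w_customer_d_keymap);
```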

Posted in Data Warehouse, ETL | Tagged: , | Leave a Comment »

How to – Incremental ETL

Posted by Dylan Wan on September 6, 2017

This is a very basic topic, an ETL 101 question that comes up a lot in interviews.

Even as we move to different storage and processing frameworks, the concepts are still important.

The idea is simple: you do not need to keep extracting and updating all the data in the data store used for analytics and reporting.  Only the newly created or changed data is needed.  When the “Delta” arrives, we need to identify its impact on the existing data, which can be either adding or modifying rows.

Deletion is a different topic.  We need to identify what was deleted from the source and reflect the change, either by marking those rows as deleted or by removing them from the data store, so that when a query is issued, the deleted data won’t appear.
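A minimal sketch of these steps, assuming the source keeps a LAST_UPDATE_DATE column and we track the last refresh date (all table and column names are hypothetical):

```sql
-- 1. Extract only the delta since the last refresh.
INSERT INTO stg_orders
SELECT *
FROM   src_orders
WHERE  last_update_date > :last_refresh_date;

-- 2. Apply the delta: modify rows that already exist, add the new ones.
--    (Oracle MERGE shown; other platforms have equivalent upserts.)
MERGE INTO w_orders_f t
USING stg_orders s
ON    (t.integration_id = s.order_id)
WHEN MATCHED THEN
  UPDATE SET t.order_amount = s.order_amount,
             t.order_status = s.order_status
WHEN NOT MATCHED THEN
  INSERT (integration_id, order_amount, order_status)
  VALUES (s.order_id, s.order_amount, s.order_status);

-- 3. Soft-delete handling (one approach): flag target rows whose source
--    rows are gone, based on a full comparison against the source keys.
UPDATE w_orders_f t
SET    t.delete_flg = 'Y'
WHERE  NOT EXISTS (SELECT 1 FROM src_orders s
                   WHERE  s.order_id = t.integration_id);
```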

Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse, ETL | Leave a Comment »

Use Surrogate Key in Data Warehouse

Posted by Dylan Wan on August 31, 2017

Using surrogate keys is part of the dimensional modeling technique for populating a data warehouse in a relational database.

The original idea was to generate sequence-based IDs and use them to join the fact and dimension tables, so we can avoid joining on a concatenated string or a composite key.  Also, because the column is small, it effectively compresses the data and thus performs better.
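A minimal before-and-after illustration with hypothetical tables: the composite natural-key join versus the single-integer surrogate-key join the paragraph describes.

```sql
-- Without surrogate keys: fact and dimension join on a composite natural key.
SELECT d.customer_name, SUM(f.sale_amount)
FROM   sales_f    f
JOIN   customer_d d ON  f.customer_num  = d.customer_num
                    AND f.source_system = d.source_system
GROUP BY d.customer_name;

-- With surrogate keys: a single small integer column carries the join.
SELECT d.customer_name, SUM(f.sale_amount)
FROM   sales_f    f
JOIN   customer_d d ON f.customer_wid = d.row_wid
GROUP BY d.customer_name;
```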

However, maintaining the surrogate keys is itself a big portion of the ETL process:

Read the rest of this entry »

Posted in BI, Business Intelligence, Data Warehouse, ETL | Leave a Comment »

Using Load Plan for managing your ETL task in BI Apps 11.1.1.7.1 (1)

Posted by Dylan Wan on May 14, 2013

One of the major changes introduced in BI Apps 11.1.1.7.1 is the way we manage the ETL task sequence and trim unnecessary tasks.

This functionality was accomplished earlier using DAC.  The problem we frequently faced was that the DAC repository and the INFA repository were maintained as two separate repositories.  We had to sync up the task names exactly in order for DAC to manage the execution of the Informatica workflow tasks.

The Load Plan and the Load Plan Generator were designed to address this requirement.

Here is a good article that describes the story.

Load Plan Generator – An Inside Look

Posted in BI Application, Business Intelligence, DAC, Data Warehouse, ETL, Infomatica, OBIA, ODI, Oracle, Oracle Data Integrator | Tagged: , , , | Leave a Comment »

Incremental ETL in BI Apps 11.1.1.7.1 (1)

Posted by Dylan Wan on May 10, 2013

The data warehouse needs to be refreshed periodically to make sure that it has up-to-date information.  The Incremental Load is the regular ETL process that runs every day.  The frequency can vary, but it is commonly a nightly job.  To ensure that the incremental ETL job can finish within a limited time window, it typically identifies the changes that happened in the source system and deals only with the delta, avoiding a Full ETL Load.

In BI Apps 11.1.1.7.1, if you are not leveraging GoldenGate and the SDS (Source Dependent Data Store), incremental ETL is accomplished by directly comparing the last update date captured in the source table with the last refresh date.

The ODI-based ETL in BI Apps 11.1.1.7.1 does not use separate ETL tasks for incremental load and full load.  The logic is determined by whether the table has been loaded before; in other words, it is based on whether the data has been refreshed.  This was a productivity gain for BI Apps developers and also made the support of incremental ETL clearer.  In the past, you might see an Incremental Load Informatica workflow and a separate Full Load workflow; every time there was a logic change, both workflows needed to be modified, which was a source of code bugs.  With a better ETL architecture, we eliminate the need to maintain two separate tasks.

Here is a high-level summary of how incremental ETL is accomplished in BI Apps 11.1.1.7.1 (a sketch of how the pieces fit together follows the list):

1. Use the W_ETL_LOAD_DATES table to track all ETL executions

2. Use the IS_INCREMENTAL variable to hold and pass the state

3. Refresh the variable in a package step: if there is an entry in W_ETL_LOAD_DATES, perform an incremental ETL

4. Use RUN_INCREMENTAL to turn the incremental filter on and off

5. Optionally, use the ODI variable Evaluate step to branch the full and incremental ETL logic within a package

6. Table Maintenance: use the ETL load date information to determine the operations, including dropping indexes and truncating the table before a full load, and rebuilding indexes and analyzing the table after the load, etc.
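As a rough sketch of how steps 1 through 4 fit together (a simplification with illustrative column names, not the shipped code): the variable’s refresh query checks W_ETL_LOAD_DATES, and the extract applies the incremental filter only when the variable says so.

```sql
-- Refresh query behind IS_INCREMENTAL: has this target been loaded before?
SELECT CASE WHEN COUNT(*) > 0 THEN 'Y' ELSE 'N' END
FROM   w_etl_load_dates
WHERE  target_table_name = 'W_SALES_F';

-- Incremental filter in the extract; in ODI the condition would reference
-- the variable (e.g. #IS_INCREMENTAL) so a full load skips the filter.
SELECT *
FROM   src_sales s
WHERE  ('#IS_INCREMENTAL' = 'N'
        OR s.last_update_date > (SELECT MAX(etl_load_date)
                                 FROM   w_etl_load_dates
                                 WHERE  target_table_name = 'W_SALES_F'));
```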


Posted in BI, BI Application, ETL, ODI, Oracle, Oracle Data Integrator | Tagged: , , , | Leave a Comment »

ODI Topology

Posted by Dylan Wan on April 13, 2012

ODI has a very clever design that isolates deployment objects from design objects.

The designer interfaces with deployment information (topology) only via the logical schema.  The project objects get the dependency only indirectly, via the model.  The “context” determines the physical resources and allows you to switch among the development, testing, and production environments.

From the prebuilt packaged-apps perspective, the code is shipped under a set of logical schemas.  The logical schemas represent the assumptions about the various source and target systems.

The concept of “context” really enables the support of an unlimited number of source instances using a single set of code.

Posted in BI, Business Intelligence, ETL, ODI, Oracle Data Integrator | Tagged: , | Leave a Comment »

OTBI vs. OBIA

Posted by Dylan Wan on October 5, 2011

Several people are curious about what OTBI and OBIA are and what the differences between them are.  I will discuss these in this article.

OTBI stands for Oracle Transactional Business Intelligence.

OBIA stands for Oracle Business Intelligence Applications.

Let’s start with OBIA.  OBIA is the prepackaged BI Apps that Oracle has provided for several years.  It is a data warehouse based solution, built on a universal data warehouse design with different prebuilt adapters that connect to various source applications to bring the data into the data warehouse.  It allows you to consolidate the data from various sources and bring them together.  It provides a library of metrics that help you measure your business, as well as a set of predefined reports and dashboards.  OBIA works with multiple sources, including E-Business Suite, PeopleSoft, JDE, SAP, and Fusion Applications.

OTBI is different.  First of all, it is real-time BI: there is no data warehouse or ETL process for OTBI.  Second, it is for Fusion Apps only.  OTBI leverages advanced technologies from both the BI platform and ADF to enable online BI queries against the Fusion Applications database directly.  In addition, in some areas, such as Financials, you can also connect to the Essbase cubes.  Unlike OBIA, OTBI does not have a lot of prebuilt dashboards and reports.  The reason is that for some advanced analysis, the data needs to be prepared.  You cannot get everything in OTBI that you can get from the OBIA data warehouse.

Both OTBI and OBIA are available from the same metadata repository, and some of the repository objects are shared between them.  It was designed to allow you to have the following configurations:

  • OTBI Only
  • OBIA only
  • OTBI and OBIA coexist

If you implement Fusion Apps, you can enable OTBI.  You can use BI EE Answers to access the prebuilt metadata and metrics that are built against the Fusion Apps.  You may not get the full power of the prebuilt dashboards, reports, and navigation workflows, but you can start experiencing what the BI EE based reports look like.  You can start bringing the data out of your OLTP system, and you can train users to get familiar with the subject areas, some of which are shared with OBIA.

If you enjoy OTBI and later want a data warehouse based solution, you can implement OBIA.  Some of the OTBI reports may be switched to run against OBIA, while others can continue connecting to Fusion Apps directly.  They can coexist in a single BI Server and a single BI Answers client.

Both OTBI and OBIA access Fusion Apps via ADF.  This is a more advanced topic.


Posted in BI, BI Application, BI Links, Business Intelligence, Data Warehouse, DBI, essbase, ETL, Infomatica, OBIA, OBIEE, Oracle, Oracle BI Suite EE, OTBI | 1 Comment »

User Defined Function (UDF) in ODI

Posted by Dylan Wan on March 21, 2011

User Defined Function (UDF) is a very powerful feature of ODI.

One feature absent from other ETL tools is a way to address the need to support different database platforms.  I won’t blame those ETL tools, since they are not really designed for prepackaged BI Apps.

Who would need to switch database platforms like that?

If your data warehouse is deployed on Oracle, you can use Oracle SQL.  If you are using Teradata, you can use Teradata SQL.  If you know that your PeopleSoft is running on DB2, you can write DB2 SQL.  In a custom data warehouse ETL environment, switching database platforms is an uncommon, one-time-only task.  You do not need to switch among different database platforms within your code.

Prepackaged BI Apps ETL developers, however, face different challenges.  You do not know which database platform the source apps are running on.  Also, you want to give customers choices about which database platform to deploy the data warehouse on.

The ODI UDF comes in very handy.  You can create a UDF to use in your SQL and have multiple implementations of the UDF for different database platforms.  For example, you can create your own function such as TODAY() and use it in your SQL, with GetDate() as the MS SQL Server implementation and SYSDATE as the Oracle implementation.
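A sketch of the idea (the TODAY() name comes from the example above; the substitution shown is conceptual, not the exact code ODI generates):

```sql
-- You write the mapping SQL once, against the function name:
SELECT order_id, TODAY() AS load_date
FROM   orders;

-- At code generation time, ODI substitutes the implementation that
-- matches the target technology:
SELECT order_id, SYSDATE   AS load_date FROM orders;   -- Oracle implementation
SELECT order_id, GetDate() AS load_date FROM orders;   -- MS SQL implementation
```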

User Defined Function is not a new idea, and you may see something similar in other tools.  However, ODI is the only tool in which I have seen UDFs that can be used in SQL and carry multiple implementations under different technologies.

I won’t be surprised to see “me too” products in the near future.

More and more companies are moving to use prepackaged BI apps.

Posted in BI, BI Application, BI Work, Business Intelligence, Data Warehouse, ETL, Infomatica, OBIA, ODI, Oracle, Oracle Data Integrator | Tagged: , | 1 Comment »

Why I do not use PowerConnect to access PeopleSoft Tree

Posted by Dylan Wan on March 16, 2011

1. It does not allow you to pass parameters to the PeopleSoft connection. This may have changed later, but it was a big issue when we tried to address customer issues.

2. It requires EFFDT as an input.
It expects people to change the EFFDT using the Mapping Editor. How can a business user do that every month?

3. It asks for a Tree Name. A PeopleSoft tree structure typically supports multiple trees; a tree is just a header of the hierarchy. Whenever you add a new tree, you need to create a new mapping!

Given these customer demands, it does not make sense to use PowerConnect. All of the above requirements come from customers.

We had no choice but to stop using it.

It is a nice feature, but it was not designed for prepackaged apps.

Posted in BI, BI Application, Business Intelligence, Data Warehouse, ETL, Infomatica, OBIEE, PeopleSoft | Leave a Comment »

EDW and BI Apps (Part 2)

Posted by Dylan Wan on October 11, 2010

Last week I talked about EDW as a data model offering. I also mentioned that it is possible to use the enterprise data warehouse as the source for BI Apps, but it requires a lot of manual work. The topic discussed was evaluating whether you need the EDW data model, BI Apps, or both.

There is another commonly seen scenario: you may already have an enterprise data warehouse. If you do, with integrations from various in-house systems already built, what do you do?

First of all, I think that it is not wrong to co-exist. You have existing investments in your EDW and you should consider keeping it for the value it already provides. However, considering the benefits and the cost and time savings you may get from prepackaged BI Apps, you may still want to deploy them. There is really no conflict.

The other questions come from people who think one plus one should be not two, but more. I think it is possible to integrate the EDW and prepackaged BI Apps in several ways to gain additional value:

1. Dashboard and report level integration

A BI tool such as Oracle BI EE allows you to have multiple data sources for your BI. You can put reports or regions from different data warehouses into the same end-user business flow. You can even put them on the same page if it makes sense.

If you include a cross-reference table, or a cross reference from at least one side, you can actually drill from one into the other.

The integration between the two is similar to the integration between the BI Apps and an OLTP system, such as E-Business Suite. For example, you can navigate to an EBS page as long as the page is callable. You can use URL rewriting to pass the context, and the URL can encrypt the identifier so the data is still secured. You can expose the URL as a presentation-layer attribute, derived so that it includes the object instance ID from the record.

You can also define the page navigation from one BI page to the other.

I will call this the loosely coupled approach.

2. Logical Layer Integration via Data Federation

We can also use the data federation feature of a BI tool such as Oracle BI EE. The concept of data federation is very simple: as an end user of BI, you should not need to know where your data is physically located. Your BI design architect tells the BI system, as part of the metadata repository, where the data is physically located, what the semantic layer of the data model should be, and how the data are related. At run time, the BI tool can get the data from the various physical database systems, or even from text files and spreadsheets. It will merge the results and show the data to the users.

3. Data Warehouse and Database level integration

The data federation is done via the BI tool. There is not much impact on the ETL process as long as we can identify the shared natural key.

Data warehouse level integration means that you not only make the BI Apps co-exist with your enterprise data warehouse, but actually build some integration via the ETL process.

For example, you do not really need to duplicate the Date dimension. The calendar date is an important dimension in any data warehouse, but the definition of the Gregorian calendar is set outside your organization, and most data warehouses have a similar design for the Date dimension. If the various facts can agree on how to derive the Date dimension foreign key, you do not really need two Date dimension tables.
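For instance, one common convention (an assumption here, not something either product mandates) is to derive the Date key as the YYYYMMDD integer, so facts from both warehouses resolve to the same row of one shared Date dimension:

```sql
-- Any fact load, from either warehouse, derives the same key for a given day.
-- Table and column names are hypothetical.
SELECT order_id,
       TO_NUMBER(TO_CHAR(order_date, 'YYYYMMDD')) AS order_date_wid
FROM   sales_orders;
```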

You need to be careful going down this path, though. The benefit of prepackaged BI Apps is not just cutting your initial cost but also reducing your lifetime maintenance. When you upgrade your OLTP apps, or when you want to implement additional modules, it may be time to upgrade your BI Apps, and the prepackaged BI Apps can provide upgraded adapters for the latest version of the OLTP. If you change the out-of-the-box Date dimension from the BI Apps, you may need a strategy to keep the change isolated from the upgrade, so the change can be easily redone or protected.

(to be continued…)

Posted in BI, BI Application, BI Work, Business Intelligence, Data Warehouse, EBS, ETL, OBIA, OBIEE, Oracle BI Suite EE | Leave a Comment »

Informatica ETL Tuning Steps

Posted by Dylan Wan on March 14, 2009

First, we need to identify where the performance problem comes from.   The bottleneck can be:

  • Source Extract
  • Transformation
  • Lookup
  • Target Write

I will talk about how to isolate the problem and go over how to investigate it in each of these areas.
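One common isolation test, shown here as a hedged sketch rather than the full method covered in this post: time the source-qualifier query on its own in the database. If it is slow by itself, the bottleneck is the Source Extract; if it is fast but the session is slow, look at the transformations, lookups, or target writes.

```sql
-- Run the (hypothetical) source-qualifier SQL directly and time it.
-- Wrapping it in COUNT(*) forces a full fetch without client-side overhead.
SELECT COUNT(*)
FROM   (SELECT row_id, last_upd, amount
        FROM   s_order_item
        WHERE  last_upd > TO_DATE('2009-03-01', 'YYYY-MM-DD'));
```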

Read the rest of this entry »

Posted in Business Intelligence, ETL, Infomatica, OBIA | 1 Comment »

Use ANSI SQL for Outer Join

Posted by Dylan Wan on July 4, 2008

OLTP source applications like PeopleSoft and Siebel can run on many different databases, including Oracle, MS SQL Server, and DB2. The target data warehouse can also run on different database platforms, including the above databases plus Teradata.

Various technologies can be used to enable cross-database-platform support in the pre-packaged BI Apps. Oracle BI Enterprise Edition allows you to configure the database connections using the native drivers. ODBC can also be used to access different databases.

In order to deliver the pre-packaged ETL adapters, two technologies can be used: ANSI SQL and ETL-tool-specific SQL. It is preferable to avoid the dependency on the ETL platform and use ANSI SQL-92 syntax. When we move from ETL to ELT, Read the rest of this entry »
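A minimal example of the difference, with hypothetical tables: the first form only runs on Oracle, while the ANSI SQL-92 form runs on all of the platforms listed above.

```sql
-- Oracle-specific outer join syntax (not portable):
SELECT e.emp_name, d.dept_name
FROM   emp e, dept d
WHERE  e.dept_id = d.dept_id (+);

-- ANSI SQL-92 outer join (portable across Oracle, MS SQL Server, DB2, Teradata):
SELECT e.emp_name, d.dept_name
FROM   emp e
LEFT OUTER JOIN dept d ON e.dept_id = d.dept_id;
```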

Posted in BI, BI Application, BI Work, ETL, Infomatica, Oracle, Oracle Data Integrator | Leave a Comment »

Use SQL*Loader with Named Pipe

Posted by Dylan Wan on June 24, 2008

Typically, using SQL*Loader assumes that a flat file is the input: the file needs to be generated before SQL*Loader can take the data from it and load the data into Oracle. Performance can be improved and disk space saved if you use a named pipe with SQL*Loader.

Read the rest of this entry »

Posted in BI, BI Work, Business Intelligence, Data Warehouse, ETL, Infomatica, Oracle, Oracle Data Integrator, Sunopsis | Tagged: , , | 5 Comments »

Oracle BI Applications 7.9.5 is out

Posted by Dylan Wan on May 30, 2008

Oracle BI Applications 7.9.5 was released earlier this month. Here is a quick summary of the features introduced in this release and where you can get more information about it.

Read the rest of this entry »

Posted in BI, BI Application, BI Work, Business Intelligence, Data Warehouse, ETL, Infomatica, OBIA, OBIEE, Oracle, Oracle BI Suite EE, PeopleSoft, Siebel Analytics | Tagged: | 3 Comments »

Key Roles involved in a BI Data Warehouse Project

Posted by Dylan Wan on November 30, 2007

To develop or deploy a BI solution for your organization, you need to have the right people involved at the right time. Here are the typical roles involved in a BI data warehouse project.

  • Project Sponsor
  • Project Manager
  • Functional Analyst
  • SME
  • BI Architect
  • ETL Developers
  • DBA

The job description and responsibilities are listed in this table: Read the rest of this entry »

Posted in BI, BI Application, BI Links, BI Work, Business Intelligence, Data Warehouse, ETL, Infomatica, OBIEE, Oracle Data Integrator, OWB | 5 Comments »

Microsoft acquires MDM vendor Stratature

Posted by Dylan Wan on June 12, 2007

Microsoft recently purchased the Master Data Management vendor Stratature. After looking at their web site, I guess that Stratature’s offerings fall into BI-related MDM, like Oracle Hyperion’s products. Their webinar was co-presented with Joy Mundy from the Kimball Group.

Stratature provides “Dimension Manager” as its solution for Customer Data Integration. Based on the name of the product, we can tell that their solution is BI-centric.

Here are several requirements I guess they tried to target: Read the rest of this entry »

Posted in BI, Business Intelligence, CDI, Customer Data Integration, Data Warehouse, ETL, Master Data Management, MDM | Leave a Comment »

Aggregation for Data Warehouse, Part 3

Posted by Dylan Wan on May 1, 2007

This is the third post in my series on various techniques (see part 1) for providing aggregations in a data warehouse. I mentioned in part 2 that many ETL tools support creating, populating, and maintaining aggregation tables. Today, I am describing the second method: manually writing the summarization process in a programming language. Read the rest of this entry »
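As a minimal sketch of what hand-written summarization looks like when the programming language is SQL (table names hypothetical; the full entry covers the method in detail):

```sql
-- Populate a monthly aggregate from the daily-grain fact table.
INSERT INTO sales_month_agg (month_wid, product_wid, sale_amount)
SELECT d.month_wid,
       f.product_wid,
       SUM(f.sale_amount)
FROM   sales_f f
JOIN   day_d   d ON f.date_wid = d.row_wid
GROUP BY d.month_wid, f.product_wid;
```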

Posted in BI, Business Intelligence, Data Warehouse, ETL | Leave a Comment »

Aggregation for Data Warehouse, Part 2

Posted by Dylan Wan on April 9, 2007

In Aggregation for Data Warehouse, Part 1, I mentioned that there are multiple ways to do aggregation for a data warehouse. One of the methods is to use the ETL tool to maintain the aggregate tables. Read the rest of this entry »

Posted in BI, Data Warehouse, ETL, Infomatica, Oracle Data Integrator, OWB, Sunopsis | 1 Comment »