Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Posts Tagged ‘Data Warehouse’

Generate Date dimension in PL/SQL Table Function

Posted by Dylan Wan on October 30, 2015

Almost all data warehouse have a date dimension.  The purpose of the date dimension is to provide some pre-calculated grouping for dates.  It helps rolling up the data that entered against dates to a higher level, such as year, quarter, month, week, etc.

In some system, source files are used in generating the date dimension.  IMHO, it makes the modification to the logic difficult.  In some ETL programs, the task involves various table joins, try to generate the rows for the year range.

This post is for describing how to populate a table with rows for each date for a given year range. Read the rest of this entry »

Posted in Common Dimension, Data Warehouse | Tagged: | Leave a Comment »

Data Lake vs. Data Warehouse

Posted by Dylan Wan on October 4, 2015

These are different concepts.

Data Lake – Collect data from various sources in a central place.  The data are stored in the original form.  Big data technologies are used and thus the typical data storage is Hadoop HDFS.

Data Warehouse – “Traditional” way of collecting data from various sources for reporting.  The data are consolidated and are integrated.  A data warehouse design that follow the dimensional modeling technique may store data in star schema with fact tables and dimension tables.   Typically a relational database is used.

If we look at the Analytics platform at Ebay from this linkedin slideshare and this 2013 article: Read the rest of this entry »

Posted in Big Data, Data Warehouse, EDW | Tagged: , , | Leave a Comment »

Lessons Learned – Data Warehouse on Cloud – Amazon Redshift

Posted by Dylan Wan on September 22, 2015

I feel that these are the rules applicable for any cloud based data warehouse solution.  In general, I feel that the on-premise data warehouse deployment probably will remain for a while.

1. For a columnar database, “select *” is bad

I think that the projection needs to be done as early as possible and should be pushed down.

If a column is not needed in the downstream flow, it should not be selected in the first place.

If the application logic is defined in the metadata, the tool should read it and generate the extract logic. Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , , | Leave a Comment »

BI Apps has an Universal Data Warehouse

Posted by Dylan Wan on March 23, 2015

BI Apps data warehouse design is based on an assumption that the data warehouse schema design is independent from OLTP system.

The staging schema is an universal staging and the data warehouse is an universal data warehouse.

The assumption is that no matter what the source system you are using, the business questions the BI consumers have are similar.  You want to see revenue, cost, and profitability, regardless if you are using Oracle E-Business Suite, PeopleSoft, or J.D. Edwards, revenue are still revenue, cost are still cost.  The definitions of these metrics won’t vary too much from the OLTP systems you are using.

It is the BI Apps development team’s job to find where this numbers should be extracted from the source systems and to define the “map” between the source system and the target data warehouse system.

This universal data warehouse assumption does give BI Apps a unique advantage.  It allows the teams develop the source adaptor independently from the data warehouse design and thus independently from the BI metadata.

In another way to view this, the application logics built in the BI tool and the knowledge of designing the data warehouse schema can be reused over time and over many projects.

Adding the support of the 1st adaptor to E-Business Suite may take sometime to analyze the analytic requirements.  Adding the support of the PeopleSoft and JDE adaptors becomes a job of providing the source and target logical mapping and create the ETL jobs of loading the universal staging tables.

Posted in Business Intelligence | Tagged: , , | 1 Comment »

Extracting Data from Cloud Apps

Posted by Dylan Wan on March 5, 2015

I think that it would be easier if the cloud application can be aware of the data integration needs and publish the interfaces proactively.

Here are some basic requirements for the applications that can be considered as data integration friendly:

1. Publish the object data model

This is required for source analysis.

For example, here is the RightNow Object Model.

The documentation should describe the object, the attributes available, the identifiers, the validation rules, the relationships among the objects, including their cardinality, the domain or lookup used in validating the data, etc.

2. Support a scalable solution for the Full Extract

Initial load may require a full extract of the historical transaction data and reference data.  This is typically a one-time only event during implementation phase.

If the extract starts about the same time when the cloud apps is initially implemented, the volume of the initial extract may be similar to incremental extract.  However, if the Cloud BI is implemented much later, after the transational apps, the volume of initial load may be big.

3. Support incremental extract / Change Data Capture

The daily or near real-time extract requires a way to identify the delta, the data created or updated recently.

In many systems, this can be done via an auditing column, such as the last updated date column.  However, if the source object model allows a nested structure.  The last update date may need to be available in two forms, update the header only, or update any objects including the child and the descendant.

4. Delete Capture

Capturing the deleted data during the incremental extract is different from the creation and update cases. If the deleted rows are not extracted, the data will be kept in the data warehouse and be treated as a normal data.  The report could be considered wrong.

In some source systems, deletion and archiving are different events.  We need to have a way to differentiate these two events.

5. Metadata Extract

This is a more advanced feature.  It is required for supporting the custom objects.

In other normal objects, the expected data model is determined during the design time.

However, if the source system allows customization that can add or enable new attributes or even creating new objects, the extraction program needs to be able to detect the model change and react accordingly.

6. Extracting the Lookup or Domain data

Many systems support the multi-language using lookup table.  The idea is that the translated strings are kept in a separate table and allow the data shown on the application UI in different languages depending on the preferred language from the login user.  Not all source system can support this feature.  However, for the source systems that can support this feature, extracting the lookup data can allow the BI showing the data in the same translated as the source system.

The requirement for the extracting process is to extract the data in all languages that are expected in the BI / Data Warehouse side.

7. Extracting data for integrated data security

If the users that can access the cloud source apps and the users that can access BI are the same set of users, or a subset, the user and authentication service should be shared.

However, if the user or the internal person record identifiers are used in building the data security, extracting the users and the roles granted may be necessary.

8. Extracting the UI displayed Labels

In some cases, the cloud BI can be a seamless integrated or even embedded into the source apps.  The displayed lable for the object and for the attributes need to be consistent.  In such cases, extracting the display labels are also requried.

It may be similar to the process of extracting the lookup or domain data which have the data available in multiple languages.

Posted in Business Intelligence | Tagged: , , , , , | Leave a Comment »

BI Applications in Cloud

Posted by Dylan Wan on October 6, 2014

Prepackaged analytics applications are available as cloud services.

The idea is that the client company does not need to use their own hardware and does not need to install the software or apply patches by themselves.

What they need is just simply the browsers.

For the end users, there should not be much difference.   The BI apps built on the OBIEE platform is already a web based application.  Users today, even when they use the BI applications deployed on premise, access the applications via the browser.

The real difference is to the IT department and to the company from the investment perspectives.

The deployment time should be very short.

Where is the role of the ISV, VAR, or SI in this picture?

I think that their role is still very important.

First of all, the out of box reports may not work for everyone.  New reports may need to be created.

Secondly, the source apps may be extended.  For example, oracle cloud services, such as CX clouds, support extensible attributes.  Where those attribute need to be reported should be probably added or hidden.

When the data are not loaded from other Oracle cloud applications, loading the data from Oracle’s  Application Unlimited products such as PeopleSoft, E-Business Suite, Siebel, J.D. Edwards may require a lot of work.

Posted in Business Intelligence | Tagged: , , , | Leave a Comment »

BI Applications and Embedded BI, Part 2

Posted by Dylan Wan on December 5, 2007

This is a topic I wrote in six month ago. In the Part I of this series of articles, I mentioned that a warehouse like architecture is required in a heterogeneous environment. I want to elaborate more about this. In the future posts, I will also describe the integration technology I learned for supporting the embedded BI.

Read the rest of this entry »

Posted in BI, BI Work, Business Intelligence, Data Warehouse, DBI, hyperion, Oracle, PeopleSoft, Siebel Analytics | Tagged: , , | Leave a Comment »