Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Archive for the ‘EDW’ Category

Unified Data Model or Not

Posted by Dylan Wan on September 13, 2017

Do we need to store all the data together in the same place?

Do we need to use the same data model?

Do we need to put the data into the cloud?


Posted in CDH, EDW, Master Data Management

Data Lake and Data Warehouse

Posted by Dylan Wan on April 7, 2017

This is an old topic, but I have learned more and come up with more perspectives over time.

  • Raw Data vs Clean Data
  • Metadata
  • What kind of services are required?
  • Data as a Service
  • Analytics as a Service


Posted in BI, Big Data, Business Intelligence, Data Lake, Data Warehouse, EDW, Enrichment, Master Data Management

Schema-less or Schema On Demand

Posted by Dylan Wan on January 29, 2017

I am trying to define the criteria for schema-less, or schema-on-demand, data storage.

In a relational database, we use DDL (data definition language) to define the schema.

We have to create a table before we can insert data into it. When we update data, we refer to the columns by name. We use DDL before we use DML (data manipulation language).
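To make the DDL-before-DML order concrete, here is a minimal sketch using Python's built-in `sqlite3` module, with SQLite standing in for a relational database. The `customer` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DML before DDL: the table does not exist yet, so the INSERT fails.
try:
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
    dml_before_ddl_error = None
except sqlite3.Error as exc:
    dml_before_ddl_error = str(exc)

# DDL first: create the table and define its columns.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

# Then DML, referring to the columns by name.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("UPDATE customer SET name = 'Acme Corp' WHERE id = 1")

rows = conn.execute("SELECT id, name FROM customer").fetchall()
print(dml_before_ddl_error)
print(rows)
```

The first `print` shows SQLite's error for the missing table. The second shows the row after the named-column `INSERT` and `UPDATE`.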

We have to know the column names in order to construct a SQL statement with a specific SELECT clause. If a column does not exist, the system throws an error when we run the query. SELECT * FROM does not have this requirement. CREATE TABLE ... AS SELECT also gives us some tolerance, but the table it creates will be unstable, because its columns follow whatever the source returns at that moment. INSERT ... SELECT * is bad practice: when the source schema changes, the statement breaks.
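The behaviors described above can be reproduced with a small sketch. It assumes SQLite through Python's `sqlite3` module; other databases word their errors differently. The `src_orders`, `tgt_orders`, and `snap_orders` tables and the added `currency` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE tgt_orders (order_id INTEGER, amount REAL)")
conn.execute("INSERT INTO src_orders VALUES (1, 9.5)")


def try_sql(sql):
    """Run a statement and return the error text, or None if it succeeded."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


# Naming a column that does not exist is an error.
missing_column_error = try_sql("SELECT order_id, currency FROM src_orders")

# SELECT * needs no column names at all.
star_rows = conn.execute("SELECT * FROM src_orders").fetchall()

# INSERT ... SELECT * works while both column lists happen to line up.
first_insert_error = try_sql("INSERT INTO tgt_orders SELECT * FROM src_orders")

# The source schema changes: a new column is added.
conn.execute("ALTER TABLE src_orders ADD COLUMN currency TEXT")

# Now the same statement breaks (3 source values, 2 target columns).
second_insert_error = try_sql("INSERT INTO tgt_orders SELECT * FROM src_orders")

# CREATE TABLE ... AS SELECT * copies whatever shape the source has right now.
conn.execute("CREATE TABLE snap_orders AS SELECT * FROM src_orders")
snap_columns = [row[1] for row in conn.execute("PRAGMA table_info(snap_orders)")]

print(missing_column_error)
print(star_rows)
print(first_insert_error, second_insert_error)
print(snap_columns)
```

The snapshot table silently picks up `currency`. That is the instability: the table's shape depends on when the statement ran, not on a deliberate DDL decision.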

A schema describes the name of the table, the names and order of its columns, and the data type (or domain) of each column.
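One way to picture these three parts is as a small data structure. The sketch below assumes SQLite, whose `PRAGMA table_info` reports each column's position, name, and declared type. The `Column` and `TableSchema` classes and the `product` table are hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    position: int    # order of the column in the table (0-based in SQLite)
    name: str
    data_type: str   # declared type, as written in the DDL


@dataclass(frozen=True)
class TableSchema:
    table_name: str
    columns: tuple[Column, ...]


def read_schema(conn: sqlite3.Connection, table_name: str) -> TableSchema:
    # The table name is formatted into the PRAGMA text, so accept only
    # plain identifiers.
    if not table_name.isidentifier():
        raise ValueError(f"unexpected table name: {table_name!r}")
    # Each row is (cid, name, type, notnull, dflt_value, pk), ordered by cid.
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    columns = tuple(Column(position=r[0], name=r[1], data_type=r[2]) for r in rows)
    return TableSchema(table_name=table_name, columns=columns)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER, name TEXT, list_price REAL)")
schema = read_schema(conn, "product")
for col in schema.columns:
    print(col.position, col.name, col.data_type)
```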

Here is what I feel is possible and worth pursuing:

Posted in Business Intelligence, Data Warehouse, EDW, OBIEE

Data Lake vs. Data Warehouse

Posted by Dylan Wan on October 4, 2015

These are different concepts.

Data Lake – Collects data from various sources in a central place. The data are stored in their original form. Big data technologies are used, so the typical storage is Hadoop HDFS.

Data Warehouse – The “traditional” way of collecting data from various sources for reporting. The data are consolidated and integrated. A data warehouse design that follows the dimensional modeling technique may store data in a star schema with fact tables and dimension tables. Typically a relational database is used.
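The contrast can be sketched in a few lines. This is a toy built on stated assumptions. A Python list of JSON strings stands in for raw files on HDFS, and an in-memory SQLite database stands in for the relational warehouse. The two sources, their field names, and the `dim_customer` / `fact_sales` tables are all hypothetical.

```python
import json
import sqlite3

# Data lake stand-in: raw records from two hypothetical sources, kept as-is.
raw_lake = [
    '{"src": "web", "cust": "C1", "amt": "10.0"}',
    '{"src": "store", "customer_id": "C1", "amount": 5}',
    '{"src": "store", "customer_id": "C2", "amount": 7}',
]

# Data warehouse stand-in: a tiny star schema with one dimension and one fact.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER PRIMARY KEY, customer_id TEXT UNIQUE)"
)
dw.execute(
    "CREATE TABLE fact_sales ("
    "customer_key INTEGER REFERENCES dim_customer(customer_key), "
    "source TEXT, amount REAL)"
)


def customer_key(customer_id):
    """Look up (or create) the surrogate key for a customer."""
    dw.execute("INSERT OR IGNORE INTO dim_customer (customer_id) VALUES (?)", (customer_id,))
    row = dw.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0]


for line in raw_lake:
    rec = json.loads(line)
    # Integrate: map each source's own field names onto one consolidated shape.
    cust_id = rec["cust"] if "cust" in rec else rec["customer_id"]
    amount = float(rec["amt"] if "amt" in rec else rec["amount"])
    dw.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        (customer_key(cust_id), rec["src"], amount),
    )

totals = dw.execute(
    "SELECT d.customer_id, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_customer d ON d.customer_key = f.customer_key "
    "GROUP BY d.customer_id ORDER BY d.customer_id"
).fetchall()
print(totals)  # [('C1', 15.0), ('C2', 7.0)]
```

The lake keeps every record exactly as it arrived, including the two sources' different field names. The warehouse resolves those differences once, at load time, so reports can query one consolidated shape.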

If we look at the analytics platform at eBay, as described in this LinkedIn SlideShare deck and this 2013 article:

Posted in Big Data, Data Warehouse, EDW

EDW and BI Apps (Part 3)

Posted by Dylan Wan on November 12, 2010

EDW and BI Apps integration is a fun topic. I have heard that more and more organizations are facing this situation, because many of them buy the prepackaged BI Applications even though they already have an enterprise data warehouse.

What I find interesting is that their existing enterprise data warehouse covers many more subject areas specific to their business. Yet for the data from their ERP or CRM apps, they still want to use the prepackaged BI Apps, because doing so saves them a lot of effort.

Since BI Applications support the ERP and CRM apps, a BI Applications deployment typically covers the horizontal business functions. On the ERP side, it supports back-office operations in financials, procurement, order management, and human resources. On the CRM side, it supports marketing, sales, and service. However, the core business systems may not be prepackaged enterprise apps. The data sources for the enterprise data warehouse are industry-specific or even in-house systems.

This leads to the following scenario for integrating the EDW and BI Apps: the integration is really about integrating a vertical data warehouse with a horizontal data warehouse. Conformed dimensions are a key success factor for this integration.

There are multiple technical approaches to the integration, such as building a cross-reference table or directly sharing the logical or physical layers, as I mentioned in prior posts. Whichever technical approach is taken, I think it should follow a data warehouse conformance process.
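As an illustration of the cross-reference approach, here is a minimal sketch assuming SQLite via Python's `sqlite3`. All table and key names (`edw_party`, `bi_customer`, `xref_party`, `bi_revenue`) are hypothetical; they are not the actual BI Apps schema. The mapping rows are meant to be the output of a conformance process, not an automatic match on names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical party dimension in the vertical (industry-specific) EDW.
    CREATE TABLE edw_party (edw_party_id TEXT PRIMARY KEY, party_name TEXT);
    -- Hypothetical customer dimension loaded by the horizontal BI Apps.
    CREATE TABLE bi_customer (bi_customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    -- Cross-reference: one row per pair confirmed by the conformance process.
    CREATE TABLE xref_party (
        edw_party_id TEXT REFERENCES edw_party,
        bi_customer_key INTEGER REFERENCES bi_customer,
        PRIMARY KEY (edw_party_id, bi_customer_key)
    );
    CREATE TABLE bi_revenue (bi_customer_key INTEGER, revenue REAL);

    INSERT INTO edw_party VALUES ('P-100', 'Acme Corp'), ('P-200', 'Globex');
    INSERT INTO bi_customer VALUES (1, 'ACME Corporation'), (2, 'Initech');
    INSERT INTO xref_party VALUES ('P-100', 1);
    INSERT INTO bi_revenue VALUES (1, 500.0), (1, 250.0), (2, 90.0);
""")

# Report BI Apps revenue against the EDW's own party identifiers.
rows = conn.execute("""
    SELECT e.edw_party_id, e.party_name, SUM(r.revenue)
    FROM edw_party e
    JOIN xref_party x ON x.edw_party_id = e.edw_party_id
    JOIN bi_revenue r ON r.bi_customer_key = x.bi_customer_key
    GROUP BY e.edw_party_id, e.party_name
""").fetchall()
print(rows)  # [('P-100', 'Acme Corp', 750.0)]
```

The names "Acme Corp" and "ACME Corporation" differ, yet they are linked because someone confirmed the match. "Globex" and "Initech" stay unlinked until the process says otherwise.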

Some people jump directly into comparing the data warehouse schemas. Both data warehouses seem to have a party dimension; let’s merge them. Both seem to have a location dimension; let’s create a cross-reference.

I think it is dangerous to look at the problem this way. Just because both data warehouses have something named “Party” does not mean they are the same thing.
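One way to make this check explicit is to compare what each dimension means rather than what it is called. The sketch below is hypothetical. The attributes compared (grain, business key, member types) are a common starting point in dimensional modeling, not a complete conformance test, and both “Party” definitions are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionDef:
    name: str
    grain: str                 # what one row represents
    business_key: str          # natural key that identifies a member
    member_types: frozenset    # kinds of members the dimension contains


def conformance_gaps(a, b):
    """List differences in meaning, ignoring whether the names match."""
    gaps = []
    if a.grain != b.grain:
        gaps.append(f"grain: {a.grain!r} vs {b.grain!r}")
    if a.business_key != b.business_key:
        gaps.append(f"business key: {a.business_key!r} vs {b.business_key!r}")
    if a.member_types != b.member_types:
        gaps.append(
            f"member types: {sorted(a.member_types)} vs {sorted(b.member_types)}"
        )
    return gaps


# Hypothetical definitions: both are called "Party", but they differ in meaning.
edw_party = DimensionDef("Party", "one row per policy holder", "policy_holder_no",
                         frozenset({"person", "household"}))
bi_party = DimensionDef("Party", "one row per customer account", "customer_account_id",
                        frozenset({"person", "organization"}))

gaps = conformance_gaps(edw_party, bi_party)
for gap in gaps:
    print(gap)
```

An empty list does not prove the two dimensions conform. A non-empty list, however, is a clear signal not to merge them just because the names match.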

It is important to go through the following steps if you are involved in such a project:

1. What are the business questions you would like to answer via the conformance and integration?

2. What data are available in each of your systems?

3. Where is the required data stored?

4. Determine the technical approach to integrate.

You can get a lot of valuable information from the prepackaged horizontal BI Apps and leverage it as part of a conformance project:

1. BI Apps collects your people (employee/resource) information from your enterprise apps.
– It may also give you headcount and reporting structure information.
– The people / resources may have various roles, depending on which enterprise apps are deployed.

2. It collects your customer information from your enterprise apps
– It may also provide you the revenue information by major customer related attributes such as geography and industry.
– If the financial apps are used, you can get the payment and credit information as well.

3. It may have your supplier information if you are using the procurement or financial payables apps.

4. It has the GL account / Financial reporting structure information
– It already has the cost / expense information collected from various places for accounting

5. It has the internal organization structure information
Both the org structure defined for business processing and the org structure defined for reporting / management reporting are there.

6. It has the calendar / fiscal year and quarter definition
If you use the accounting apps, the fiscal calendar will be there.

7. It has the product / item information
– It could be the products the deploying organization is selling.
– It could be the items the deploying organization is building.
– It can also include the products that the deploying organization is buying.

These of course depend on the nature of the business.

Posted in BI, BI Application, BI Work, Business Intelligence, Data Warehouse, EDW