Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Archive for April, 2012

About Big Data (2): SQL and No SQL

Posted by Dylan Wan on April 14, 2012

This is a good article written by Alan Gates, Pig architect at Yahoo!

Comparing Pig Latin and SQL for Constructing Data Processing Pipelines

He compares Pig Latin, a query language for Hive with SQL, the query language for relational database. He gave a good example that helps those who know SQL to understand the differences.

The query language is not difficult to write.  The key point is that it lets the programmer to control the execution plan.  SQL on the other is a higher level abstraction that hides the execution plan from users.  It does not means that we do not really need to be worried about execution plan.  We do.  It is a key for performance tuning.

A good practice for SQL developers is to get the explain plan and to see how Oracle plans to execute the query and optionally use hint and index to control the execution.

Comparing to this, the approach of directly telling the system what to do may seem easier.

But it means that the database system to handle big data does less things, not smarter.


Posted in Big Data, Data Warehouse | Tagged: , , | 1 Comment »

Tutorial: Create ODI Repositories

Posted by Dylan Wan on April 14, 2012

This step is not required if you are working on an existing repository.  If you are doing a new development project, or just want to get some experiences on using ODI, you may need to create ODI repositories yourselves.

A typical ODI deployment includes two repositories: Master Repository and Work Repository.

ODI provides two different tools to create them.

Here is a good tutorial page available from oracle.com:


The tool does not create database users.  Here is my script for creating the database users:

create tablespace ODI_MREP
size 100M autoextend on Next 100M Maxsize unlimited;

create tablespace ODI_WREP
size 200M autoextend on Next 100M Maxsize unlimited;create user ODI_MREP
identified by ODI_MREP
default tablespace ODI_MREP
temporary tablespace TEMP
quota unlimited on ODI_MREP;

create user ODI_WREP
identified by ODI_WREP
default tablespace ODI_WREP
temporary tablespace TEMP
quota unlimited on ODI_WREP;

grant connect, resource, SELECT_CATALOG_ROLE  to ODI_MREP, ODI_WREP;

If you are using RCU, you will not be able to create Work Repository and Master Repository separately.  In my opinion, RCU is a poor tool and lacks many basic features. I will stay away from it.

Posted in Business Intelligence, ODI, Oracle Data Integrator | Leave a Comment »

ODI Topology

Posted by Dylan Wan on April 13, 2012

ODI has a very clever design to isolate the deployment objects from design objects.

The designer only interface to deployment information (topology) via logical schema.  The project objects only indirectly get the dependency via model.  The “context” can determine the physical resources and allow you switch among the development, testing and production env.

From prebuilt package apps perspective, the codes are shipped under a set of logical schema.    The logical schema represents the assumption about the various source and target systems.

The concept of “context” really enables the support of unlimited multiple source instances using a single set of codes.

Posted in BI, Business Intelligence, ETL, ODI, Oracle Data Integrator | Tagged: , | Leave a Comment »