Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Use REST API to get the data

Posted by Dylan Wan on November 26, 2016

We now see more and more data are available in REST API.  Here are some of the data sources I worked earlier: ServiceNow Table API, REST API for Oracle Service Cloud,

Here is a check list to enable a data source for BI:

1. How does the service perform the authentication?

In the case of ServiceNow, they provided a good example:

curl "https://instance.service-now.com/api/now/table/problem?sysparm_limit=1" \
--request GET \
--header "Accept:application/json" \
--user 'username':'password'

The username and password can be passed as part of the request.

In some other system, the authentication is done via a separate login service call, and the login call will return a session id and sometime a cookie.

From the deployment planning perspective, we also need to think of how this user used for accessing the REST service will be created and what permissions should be given.

2. How to get the schema definition?

2.1 A list of tables?

Different applications may organize their data in different ways.  We need to know the schema definitions.  It may have SCHEMA -> TABLE -> COLUMN structure like how the data are organized in a RDBMS, such as Oracle.  In the case of ServiceNow, they just provide a list of table.  For Oracle Service Cloud, previously known as RightNow!, they have a good documentation about a list of objects they support: RightNow Core Objects.

Are custom tables allowed?  How do we get the name of the custom table?  Do you have a separate data dictionary services?

2.2 A list of columns?

Getting the data dictionary for the data is important as you can define the mapping and define the query definitions.  You do not need to bring all the data.  Knowing which data available is important.  We need to define what we need and how to select them or how to filter out the unwanted data.  For example, ServiceNow provides the WSDL supports :

https://<hostname>.service-now.com/incident.do?WSDL

It provides the column and data type of each columns:

Screen Shot 2016-11-26 at 8.06.05 AM.png

2.3 Data type mapping

Some source systems have their own special data type.  A data type mapping may be required for handling the data from the new data source.

2.4 Nested Structure

JSON and XML format allows nested structure.  A transformation service may be required for breaking the data from one “document” to multiple “tables”.  If this type of requirements exist, the data will be staged first before loading.

2.5 Lookup

For example, in Oracle RightNow, they have a special data type call “Menu Type”.  It basically provides the Name to ID conversion. If the service API allows to retrieve both, it may be easier. You can just retrieve both.  If the service API allow to retrieve ID only, you will need to either perform a JOIN when retrieving the data, or pull the lookup data separately.

Sometime the Lookup service is available as a different service call.

2.6 Key of the tables

  • Is there a surrogate key?
  • Can a composite key exist?
  • Do we need to concatenate the key of parent object for uniquely identify a record of the child object?

2.7 Joins

3. Getting the data

Here are some of the considerations:

  • Can we specify the list of attributes?
  • Can we define the filter criteria?
  • Do you have data size limitation?
  • Doe we need to handle pagination?
  • Can we perform incremental query?

 

 

 

 

Posted in Business Intelligence | Leave a Comment »

BI without a data warehouse

Posted by Dylan Wan on September 18, 2016

When we think of a data warehouse, we are typically thinking of having a database that store the data and need to design the schema and the ETL program to populate the database.  If we take out the “database” from the above description, will it still be called a data warehouse?

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

The future of data warehousing

Posted by Dylan Wan on March 17, 2016

Data warehousing is really about preparation of the data for reporting.  The assumption are:

  • You can predicate what typical queries look like to some extent.
  • The data need to be prepared to make the query easier or faster, or make more sense from the data .
  • You know where the data come from and you can Extract from the source
  • You know what the target look like so you can Transform the data
  • You Load the data somewhere so you do not need to query the source directly.

The future of data warehousing is related to whether the above assumptions are still true.  Other factors are relating to technologies and the source data available. Read the rest of this entry »

Posted in Business Intelligence | 1 Comment »

How to support Team based security

Posted by Dylan Wan on December 16, 2015

Team-based security is referring to a specific data security requirement scenario.

Team-based security means that the object being secured has a “team” associated with it.  The team members can access the objects. Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Multiple Tenant Support

Posted by Dylan Wan on November 30, 2015

The definition of multi-tenancy varies.  Some people think that the tenants are data stripping and adding a tenant ID to every table is to support multiple tenants.

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

How to implement Union in ODI and OBIEE?

Posted by Dylan Wan on November 16, 2015

When do you need to use UNION in BI?

You need to use UNION when you would like to combine the data sets as a single data set.  However, when do you exactly need to do this type of thing in BI or in data warehouse ETL?  Here are some real business cases:

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Logical and Physical Schema in ODI

Posted by Dylan Wan on November 10, 2015

ODI Topology allows you to isolate the physical connection and the logical data source by defining the physical schema and logical schema.

This object may be seen as redundant during development.  However, it is a very useful feature for supporting the Test to Production (T2P) process.   Read the rest of this entry »

Posted in BI Application | Tagged: , | Leave a Comment »

Generate Date dimension in PL/SQL Table Function

Posted by Dylan Wan on October 30, 2015

Almost all data warehouse have a date dimension.  The purpose of the date dimension is to provide some pre-calculated grouping for dates.  It helps rolling up the data that entered against dates to a higher level, such as year, quarter, month, week, etc.

In some system, source files are used in generating the date dimension.  IMHO, it makes the modification to the logic difficult.  In some ETL programs, the task involves various table joins, try to generate the rows for the year range.

This post is for describing how to populate a table with rows for each date for a given year range. Read the rest of this entry »

Posted in Common Dimension, Data Warehouse | Tagged: | Leave a Comment »

Use initialization block in OBIEE

Posted by Dylan Wan on October 21, 2015

I just add a post about using initialization block in OBIEE.

It is a feature I felt excited when I first see the tool. Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Cloud BI Features – Amazon QuickSight

Posted by Dylan Wan on October 17, 2015

Here is a list of features available from Amazon QuickSight:

Data Source Connect to supported AWS data sources
Data Source Upload flat files
Data Source Access third-party data sources
Data Preparation Data Preparation Tools
Visualization Build Visualizations
Visualization Access all chart types
Visualization Filter Data
Data Access Capture and Share, Collaborate
Data Access API/ODBC connection to SPICE
Security Encryption at Rest
Security Active Directory Integration
Security Fine-grained User Access Control
Security Enable Audit Logs with AWS CloudTrail
Performance In-memory calculation with SPICE
Performance Scale to thousands of users
Performance Support up to petabytes of data

.

I categorize the features into these groups:

  1. Data Source
  2. Data Preparation
  3. Visualization
  4. Data Access (or Alternate Access)
  5. Security
  6. Performance

They are almost same features available from other BI tools, such OBIEE, except the in-memory engine, and perhaps the scalability.  Here are some questions I have. Read the rest of this entry »

Posted in BI, Business Intelligence | Tagged: , , , , | Leave a Comment »

Data Mashup in OBIEE 12c

Posted by Dylan Wan on October 14, 2015

Data Mashup is a new feature from OBIEE 12c.

It is one of the two main features that OBIEE 12c.  The other one is the visual analyzer.

When I tested the data mashup features, it supports these two scenarios.  Read the rest of this entry »

Posted in BI, Business Intelligence | Tagged: , | Leave a Comment »

Amazon Quick Sight – BI on Cloud?

Posted by Dylan Wan on October 8, 2015

In my post Data Warehouses on Cloud – Amazon Redshift, I mentioned that what would be really useful is providing BI on Cloud, not just Data Warehouse on Cloud.

I felt that BICS makes more sense comparing to Amazon Redshfit.

I discussed with a couple of people last night in a meetup.  Some of them are using Amazon Redshift.  Here are what I heard: Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Do we really need semantic layer from OBIEE?

Posted by Dylan Wan on October 5, 2015

Not all BI tools have the semantic layer.  For example, Oracle Discoverer seems not having a strong semantic layer.

This page summarizes what OBIEE semantic layer can do for you…

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Data Lake vs. Data Warehouse

Posted by Dylan Wan on October 4, 2015

These are different concepts.

Data Lake – Collect data from various sources in a central place.  The data are stored in the original form.  Big data technologies are used and thus the typical data storage is Hadoop HDFS.

Data Warehouse – “Traditional” way of collecting data from various sources for reporting.  The data are consolidated and are integrated.  A data warehouse design that follow the dimensional modeling technique may store data in star schema with fact tables and dimension tables.   Typically a relational database is used.

If we look at the Analytics platform at Ebay from this linkedin slideshare and this 2013 article: Read the rest of this entry »

Posted in Big Data, Data Warehouse, EDW | Tagged: , , | Leave a Comment »

Use DMZ to access BI from outside firewall

Posted by Dylan Wan on October 2, 2015

DMZ is a technology that allows you to configure your network to be accessible outside firewall.

Some of users may want to access some of corporate reports from mobile or from their personal computers.

While VPN and Citrix may be useful for these cases, DMZ can provide another option.

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Lessons Learned – Data Warehouse on Cloud – Amazon Redshift

Posted by Dylan Wan on September 22, 2015

I feel that these are the rules applicable for any cloud based data warehouse solution.  In general, I feel that the on-premise data warehouse deployment probably will remain for a while.

1. For a columnar database, “select *” is bad

I think that the projection needs to be done as early as possible and should be pushed down.

If a column is not needed in the downstream flow, it should not be selected in the first place.

If the application logic is defined in the metadata, the tool should read it and generate the extract logic. Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , , | Leave a Comment »

Data Warehouse on Clould – Amazon Redshift

Posted by Dylan Wan on September 17, 2015

Here is a brief summary of what I learned by reading these materials.

1. The data warehouse is stored in clusters

It can support scale out, not scale up.

“Extend the existing data warehouse rather than adding hardware”

2. Use SQL to access the data warehouse

3. Load data from Amazon S3 (Storage Service) using MPP process

4. Partition / Distribute the data by time

The BI team wanted to calculate some expensive analytics on a few years of data, so we just restored a snapshot and added a bunch of nodes for a few days”
Read the rest of this entry »

Posted in Business Intelligence | 1 Comment »

OTBI Enterprise

Posted by Dylan Wan on September 15, 2015

OTBI Enterprise is the BI cloud service, a SaaS deployment of OBIA.  It is using a data warehouse based architecture.  The ETL processes are handled within the cloud.  The data are first loaded from either on premise or cloud sources using various means in the original formats.   The data are first loaded into the SDS schema, then the regular ETL processes that were designed for on premise deployment can be used to populate the data warehouse.  All these processes are supposed to be transparent to the deploying companies.

As of September 2015, OTBI-E for HCM is the only module that is available.  Read the rest of this entry »

Posted in Business Intelligence | Tagged: , , , , , , | Leave a Comment »

Is Apache Spark becoming a DBMS?

Posted by Dylan Wan on September 9, 2015

I attended a great meetup and this is the question I have after the meeting.

Perhaps the intent is to make it like a DBMS, like Oracle, or even a BI platform, like OBIEE?

The task flow it actually very similar to a typical database profiling and data analysis job.

1. Define your question

2. Understand and identify your data

3. Find the approach / model that can be used

Read the rest of this entry »

Posted in Big Data, Business Intelligence | Tagged: , | Leave a Comment »

Schema On Demand and Extensibility

Posted by Dylan Wan on September 3, 2015

Today I see a quite impressive demo in the Global Big Data Conference.

AtScale provides a BI metadata tool for data stored in Hadoop.

At first, I thought that this is just another BI tool that access Hadoop via Hive like what we have in OBIEE.  I heard that that the SQL performance for BI query over Hive could be very slow.   The typical issue is that when the query involves joins, the SQL join may be translated to map /reduce codes by Hive.  Doing the Join in this way may not be as effective as the RDMBS.

However, the concept is actually very different here.  Traditionally ROLAP is built on relational database and we use relational join between the fact table and the dimension table.  When we see the Oracle-acquired tool like Endeca, we already see the data modeling principle changes.  Endeca does not model data in star schema.   It simply denormalizes dimension data into fact table. It can thus run query fast.  AtScale seems doing exactly the same thing.   When the data is stored in the Hadoop cluster, the data is not normalized by separating data into fact and dimension.   It just stores as the data as the source and duplicating the dimension into fact.  There is really no join here.  The closest design technique in OBIEE I can think of is to use degenerated approach.  However, will it work for using Hadoop as a source?

What really impressed me is the concept of Schema on Demand.   I feel that this is actually the major challenge of ROLAP and relational database technology.  When we model the potential additional attributes, we have to add placeholder columns to the relational table.  However, in the data storage / database technology that store attributes as Key Value pairs or as Map, the data do not have to be stored as columns.  This is actually nothing new.  Oracle database has the VARRAY support since Oracle 8.  However, there is no BI tool I am aware of can support this Oracle object type.  While Oracle database has moved to not just supporting relational tables, the BI tool still make the assumption of supporting relational tables only.

It seems that AtScale solved this challenge by generating the metadata that can perform the attribute map to column transformation.  I guess that we will be getting to see these big data technologies start getting into the traditional BI tool space.  It is not due to the 3 Vs nature of the big data, it is due to the flexibility.

Posted in Business Intelligence | Leave a Comment »