Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Archive for January, 2017

Schema-less or Schema On Demand

Posted by Dylan Wan on January 29, 2017

I am trying to define what are the criteria for a schema-less, or a schema on demand data storage.

In relational database, we use DDL, data definition language, to define schema.

We have to create table first, before we can insert data into a table.  When we update data, we update the data by mentioning the name of columns.  We use DDL before we use DML.

We have to know the name of columns in order to construct the SQL statement with specific select clause.  If a column does not exist, the system throws an error when we try to query. SELECT * FROM does not have the requirement.  CREATE SELECT also kind of giving us some tolerance, but the table it creates will be unstable.  INSERT.. SELECT * will be a bad practice as when the source schema changes, this statement becomes broken.

Schema describes the name of table, the names and orders of the columns, and the data type (or domain) of the columns.

Here are what I feel possible and something we can pursue: Read the rest of this entry »

Posted in Data Warehouse, Business Intelligence, OBIEE, EDW | Leave a Comment »

Uncommonly Common

Posted by Dylan Wan on January 26, 2017

An interesting concept.

Significant Terms Aggregation – Elastic Search

Posted in Business Intelligence | Leave a Comment »

Elastic Stack

Posted by Dylan Wan on January 24, 2017

This post comes from notes I took while learning Elastic Stack.

ELK is a combination of three tools:

  • Elastic Search
  • Logstash
  • Kibana

When ELK is changed to Elastic Stack, there is a forth tool:

  • Beat

There are a lot of information on the net.  I do not want to repeat anything.  I will just write my impressions.

In this Elastic world, data are represented, passed, and stored in JSON format.  In our relational world, data are represented, passed, and stored in tabular format.  Data in the tabular format can also be easily represented in the JSON format, but not the other way.  Data stored in XML or JSON can be nested and to store the normalized format in the relational database, the information about the relationships are lost or only exists in the form of metadata and the keys, PK and FK.

Elastic Search

Elastic Search is the engine in the picture.  However, the real engine is Apache Lucene.  Elastic Search was built on the top of Lucene by adding two things:  distributed processing and RESTful APIs.  Elastic Search / Lucene is the database (index) engine, which organizes the data and process the data like the CRUD (Create, Read, Update, Delete) operations in relational database.

We use SQL for CRUD in relational database. We will use the query language in RESTful API for similar purposes.  To use Elastic Search, you just need to put the data in and you can get the data out by searching.  Knowing how it works internally should help for optimizing and provide a scalable solution, but may not be required.

Logstash

Logstash was introduced to me as an ETL tool.  However, my impression is that it is a specific, not a generic, transformation tool for handling the log information. It provides the specific capability to parse the log files.  They call this “grok”.  The technology is built on the top of  regular expression.  Other impressive features (plugins) are “geoip”, which is an enrichment service, and “date” processor, which can parse many if not any date formats.

The beauty of this tool is extensible.  Like ODI, which can be extended by adding Knowledge Module, Logstash can be extended by adding Plugins. R is also benefited from the R packages contributed from the community.

Kibana

Kibana is the BI/visualization tool in the picture. It issues the queries to the engine (elastic search), get the data and display the data in charts or tables.

It seems that it emphasizes on time series analysis as it is commonly used in log analysis.  Otherwise, most of charts and tables are just like other similar tools.

Beats

These are source side agents for collecting information.  It extracts the data on the source and deliver the data to the target.

 

 

 

 

Posted in Business Intelligence | Leave a Comment »

HR Reporting

Posted by Dylan Wan on January 16, 2017

HR is an important business function in any company.  It supports business growth.  It also satisfies the compliance requirements.  Reports are required for supporting the HR functions.

Typical HR Reporting Requirements

HR reports are surrounding the HR function:  Human Resource Planning, Recruitment and Selection, Orientation and Training, Performance Appraisal, Compensation and Benefit.

They serves many different roles and thus having different ways to slice and dice the data.

  • Personal Information: Demographics, Education, Job, Location, Manager, Shift, Salaries, Payroll, Skills, Benefit
  • Personal Information Changes – Before and After for audit and review
  • New Hire – Who are they?  Where we are in the hiring status?  When they will start?  Temporary or Employment?  Full Time or Part Time, New Hire or Rehire? Have they attended the orientation?   Are they getting sign-on Bonus?  Are they getting stock options?
  • Allowance – Who got the allowance?  How much?  How it was calculated?
  • Transfer – internal transfers, what is the releasing org? which org the person is moving to?
  • Assignment – Global, special assignments, start date, end date, purpose, etc.
  • Training – Who take training?  What training? Compliance Status
  • Termination – Who have left?  Who is leaving?  When?  Last Payroll date?

Challenges

I found that HR reporting is somewhat challenging:

Read the rest of this entry »

Posted in BI Application, Business Intelligence, Human Resource Intelligence, Incorta | Leave a Comment »