Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Elastic Stack

Posted by Dylan Wan on January 24, 2017

This post comes from notes I took while learning Elastic Stack.

ELK is a combination of three tools:

  • Elasticsearch
  • Logstash
  • Kibana

When ELK became the Elastic Stack, a fourth tool was included:

  • Beats

There is a lot of information on the net, so I will not repeat it here.  I will just write down my impressions.

In the Elastic world, data are represented, passed, and stored in JSON format.  In our relational world, data are represented, passed, and stored in tabular format.  Data in tabular format can easily be represented in JSON, but not the other way around.  Data stored in XML or JSON can be nested.  When such data is normalized and stored in a relational database, the information about the relationships is lost, or survives only as metadata and as keys (primary keys and foreign keys).
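To make the point concrete, here is a minimal Python sketch using a hypothetical order document with its line items. In the nested JSON form, the order/line relationship is carried by the nesting itself. Once the document is normalized into two flat tables, that relationship survives only as the order_id foreign key, and rebuilding the nested form depends on knowing which column is the key.

```python
import json

# A hypothetical nested document: the order "contains" its lines.
order_doc = {
    "order_id": 1001,
    "customer": "ACME",
    "lines": [
        {"line_no": 1, "sku": "A-1", "qty": 2},
        {"line_no": 2, "sku": "B-7", "qty": 1},
    ],
}


def normalize(doc):
    """Split one nested order into two flat 'tables' (lists of row dicts).

    The nesting is replaced by a foreign key: every line row carries order_id.
    """
    orders = [{"order_id": doc["order_id"], "customer": doc["customer"]}]
    lines = [{"order_id": doc["order_id"], **line} for line in doc["lines"]]
    return orders, lines


def denormalize(orders, lines):
    """Rebuild nested documents by joining lines to orders on order_id.

    This only works because we know (from metadata) that order_id is the key.
    """
    docs = []
    for order in orders:
        children = [
            {k: v for k, v in line.items() if k != "order_id"}
            for line in lines
            if line["order_id"] == order["order_id"]
        ]
        docs.append({**order, "lines": children})
    return docs


orders, lines = normalize(order_doc)
print(json.dumps(orders))  # tabular rows are trivially valid JSON
print(json.dumps(lines))
print(denormalize(orders, lines) == [order_doc])
```

Going from tables to JSON is just serialization, while going from JSON to tables requires choosing keys and splitting the nesting, which is the asymmetry described above.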

Elasticsearch

Elasticsearch is the engine in the picture.  However, the real engine is Apache Lucene.  Elasticsearch was built on top of Lucene by adding two things: distributed processing and RESTful APIs.  Elasticsearch / Lucene is the database (index) engine, which organizes the data and handles operations such as CRUD (Create, Read, Update, Delete), as a relational database does.
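The CRUD analogy can be sketched as plain HTTP requests. This is a minimal sketch assuming an Elasticsearch 7.x or later node at http://localhost:9200 and a hypothetical index named "notes"; the _doc and _update paths follow the 7.x+ document API, and older releases (such as the 5.x versions current when this post was written) put a type name in the path instead. The request-building function is pure, so it can be inspected without a running cluster; send() uses only the Python standard library.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local single-node cluster
INDEX = "notes"                    # hypothetical index name


def crud_request(op, doc_id, body=None):
    """Map a CRUD operation to (HTTP method, path, JSON body) for the document API."""
    if op == "create":
        # PUT with an explicit id indexes the document (inserting or replacing it).
        return "PUT", f"/{INDEX}/_doc/{doc_id}", body
    if op == "read":
        return "GET", f"/{INDEX}/_doc/{doc_id}", None
    if op == "update":
        # Partial update: only the fields under "doc" are changed.
        return "POST", f"/{INDEX}/_update/{doc_id}", {"doc": body}
    if op == "delete":
        return "DELETE", f"/{INDEX}/_doc/{doc_id}", None
    raise ValueError(f"unknown operation: {op}")


def send(method, path, body=None):
    """Issue one request against ES_URL and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Only prints the requests; call send(*crud_request(...)) to hit a real cluster.
    for step in [
        ("create", 1, {"title": "Elastic Stack", "tags": ["elk"]}),
        ("read", 1, None),
        ("update", 1, {"tags": ["elk", "beats"]}),
        ("delete", 1, None),
    ]:
        print(crud_request(*step))
```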

In a relational database, we use SQL for CRUD.  With Elasticsearch, we use the query language of its RESTful API (the Query DSL) for similar purposes.  To use Elasticsearch, you just need to put the data in, and you can get the data out by searching.  Knowing how it works internally helps with optimization and with building a scalable solution, but it may not be required.
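To illustrate the SQL analogy, here is a minimal sketch that turns a simple SQL-style WHERE clause into a Query DSL body. The index and field names (logs, status, @timestamp) are hypothetical; the bool, filter, term, and range clauses are standard Query DSL. The resulting body would be sent with POST /logs/_search.

```python
import json


def where_to_query_dsl(equals=None, ranges=None):
    """Translate an AND of simple conditions into a Query DSL bool filter.

    equals: {field: value}            roughly SQL "field = value"
    ranges: {field: {"gte": x, ...}}  roughly SQL "field >= x", etc.
    """
    clauses = []
    for field, value in (equals or {}).items():
        clauses.append({"term": {field: value}})
    for field, bounds in (ranges or {}).items():
        clauses.append({"range": {field: bounds}})
    # Filter context: documents either match or not, with no relevance scoring.
    return {"query": {"bool": {"filter": clauses}}}


# Roughly: SELECT * FROM logs WHERE status = 404 AND "@timestamp" >= now - 1 day
body = where_to_query_dsl(
    equals={"status": 404},
    ranges={"@timestamp": {"gte": "now-1d"}},
)
print(json.dumps(body, indent=2))
```

Using filter rather than must skips relevance scoring, which is the part of a search engine that has no counterpart in a SQL WHERE clause.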

Logstash

Logstash was introduced to me as an ETL tool.  However, my impression is that it is a specific, rather than generic, transformation tool for handling log information.  It provides a specific capability for parsing log files, which they call “grok”.  Grok is built on top of regular expressions.  Other impressive features (plugins) are “geoip”, an enrichment filter that adds geographic information for IP addresses, and the “date” filter, which can parse many, if not all, date formats.
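Since grok is built on regular expressions, its core idea can be sketched in a few lines of Python: named references such as %{IP:client} expand into regex named groups. This is a simplified illustration, not Logstash's implementation. The small pattern table is a hypothetical stand-in for the much larger, stricter pattern library Logstash ships with. The parse_date helper mimics the idea of the date filter by trying a list of formats in order; the Logstash date filter uses its own format syntax, so the strptime formats here are assumptions.

```python
import re
from datetime import datetime

# Hypothetical, simplified pattern library; real grok patterns are far stricter.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "URIPATHPARAM": r"/\S*",
}

# Matches a grok reference such as %{IP:client}.
GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def grok_compile(grok_pattern):
    """Expand each %{NAME:field} into a (?P<field>...) named group."""
    def expand(m):
        name, field = m.group(1), m.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.compile(GROK_REF.sub(expand, grok_pattern))


def grok_match(grok_pattern, line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = grok_compile(grok_pattern).fullmatch(line)
    return m.groupdict() if m else None


# Hypothetical list of accepted timestamp layouts, tried in order.
DATE_FORMATS = ["%d/%b/%Y:%H:%M:%S %z", "%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S"]


def parse_date(text):
    """Try each known format until one parses (similar in spirit to the date filter)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


fields = grok_match(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}",
    "55.3.244.1 GET /index.html 15824 0.043",
)
print(fields)
print(parse_date("24/Jan/2017:10:15:00 +0000"))
```

Note that grok extracts strings; converting bytes or duration to numbers would be a separate step.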

The beauty of this tool is that it is extensible.  Just as ODI can be extended by adding Knowledge Modules, Logstash can be extended by adding plugins.  R similarly benefits from the packages contributed by its community.
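The plugin idea can be sketched as a pipeline whose stages are looked up by name in a registry, so adding a capability means registering a new function rather than changing the pipeline code. This is a hypothetical Python illustration of the pattern: the register decorator, the uppercase_level and add_tag filters, and the config list are invented for the example and are not Logstash's actual plugin API.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
FilterFn = Callable[[Event, dict], Event]

# Hypothetical registry of filter plugins, keyed by the name used in the config.
FILTERS: Dict[str, FilterFn] = {}


def register(name: str):
    """Decorator that makes a filter function available under `name`."""
    def wrap(fn: FilterFn) -> FilterFn:
        FILTERS[name] = fn
        return fn
    return wrap


@register("uppercase_level")
def uppercase_level(event: Event, options: dict) -> Event:
    """Upper-case the configured field (default: "level") if it holds a string."""
    field = options.get("field", "level")
    value = event.get(field)
    if isinstance(value, str):
        event[field] = value.upper()
    return event


@register("add_tag")
def add_tag(event: Event, options: dict) -> Event:
    """Append the configured tag to the event's "tags" list."""
    event.setdefault("tags", []).append(options["tag"])
    return event


def run_pipeline(events: List[Event], config: List[Tuple[str, dict]]) -> List[Event]:
    """Apply the configured filters, in order, to a copy of each event."""
    out = []
    for event in events:
        event = copy.deepcopy(event)  # callers' events are left untouched
        for name, options in config:
            event = FILTERS[name](event, options)
        out.append(event)
    return out


config = [("uppercase_level", {"field": "level"}), ("add_tag", {"tag": "parsed"})]
result = run_pipeline([{"level": "warn", "msg": "disk 91% full"}], config)
print(result)  # [{'level': 'WARN', 'msg': 'disk 91% full', 'tags': ['parsed']}]
```

A third party could add a new filter by importing register and decorating a function; neither run_pipeline nor the existing filters would change.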

Kibana

Kibana is the BI/visualization tool in the picture.  It issues queries to the engine (Elasticsearch), gets the data back, and displays it in charts or tables.

It seems to emphasize time series analysis, since it is commonly used for log analysis.  Otherwise, most of its charts and tables are just like those in other similar tools.
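A time series chart is typically backed by a date histogram aggregation. Below is a minimal sketch of such a request body, plus a local Python equivalent of the bucketing, assuming a hypothetical @timestamp field and hourly buckets. The date_histogram aggregation and its fixed_interval parameter exist in Elasticsearch 7.x and later (older versions used interval); the queries Kibana actually generates are more elaborate.

```python
from collections import Counter
from datetime import datetime, timezone

# Request body for POST /logs/_search (hypothetical index and field names).
hourly_histogram = {
    "size": 0,  # only the aggregation is needed, not the matching documents
    "aggs": {
        "events_per_hour": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}
        }
    },
}


def bucket_by_hour(timestamps):
    """Local illustration of what the aggregation computes: counts per hour.

    Each datetime is truncated to the start of its hour. Returns
    (bucket_start, count) pairs sorted by time; empty hours are simply absent.
    """
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    return sorted(counts.items())


events = [
    datetime(2017, 1, 24, 9, 5, tzinfo=timezone.utc),
    datetime(2017, 1, 24, 9, 40, tzinfo=timezone.utc),
    datetime(2017, 1, 24, 11, 2, tzinfo=timezone.utc),
]
for start, count in bucket_by_hour(events):
    print(start.isoformat(), count)
```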

Beats

These are source-side agents for collecting information.  They extract data at the source and deliver it to the target, such as Logstash or Elasticsearch.
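The agent idea can be sketched as a function that reads the lines appended to a log file since the last saved offset and packages each one as a JSON event for delivery. This is a hypothetical Python illustration, loosely in the spirit of Filebeat; the harvest function, the event fields, and the deliver callback are assumptions, and a real Beat also handles file rotation, retries, and back-pressure.

```python
import json
import os
import socket
import tempfile
from datetime import datetime, timezone


def harvest(path, offset, deliver):
    """Ship complete lines appended to `path` after byte `offset`.

    `deliver` is any callable taking a list of JSON strings (a hypothetical
    stand-in for sending to Logstash or Elasticsearch). Returns the new offset,
    which the agent would persist so that the next run resumes where it stopped.
    """
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; leave it for the next run
            offset += len(raw)
            events.append(json.dumps({
                "@timestamp": datetime.now(timezone.utc).isoformat(),
                "host": socket.gethostname(),
                "source": os.path.abspath(path),
                "message": raw.decode("utf-8").rstrip("\n"),
            }))
    if events:
        deliver(events)  # if this raises, no new offset is returned and the lines are re-read
    return offset


with tempfile.NamedTemporaryFile(suffix=".log", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\npartial")
    log_path = tmp.name

shipped = []
pos = harvest(log_path, 0, shipped.extend)    # ships the two complete lines
with open(log_path, "ab") as f:
    f.write(b"\n")                            # the writer finishes the last line
pos = harvest(log_path, pos, shipped.extend)  # ships only "partial"
print(pos, [json.loads(e)["message"] for e in shipped])
os.remove(log_path)
```

Returning the offset instead of storing it inside harvest keeps the sketch easy to test; a real agent would write it to durable storage after each successful delivery.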

