Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Is Apache Spark becoming a DBMS?

Posted by Dylan Wan on September 9, 2015

I attended a great meetup and this is the question I have after the meeting.

Perhaps the intent is to make it like a DBMS, like Oracle, or even a BI platform, like OBIEE?

The task flow is actually very similar to a typical database profiling and data analysis job.

1. Define your question

2. Understand and identify your data

3. Find the approach / model that can be used

4. Load data into the database server

Spark can talk with multiple sources.

In OBIEE, we may create connection pools and import tables.

In ODI, we may change the topology and use RKM.

Spark allows us to load the data from JSON.
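To make the JSON loading step concrete, here is a minimal sketch in plain Python (standard library only, not Spark itself). It assumes the input holds one JSON object per line, which is the layout Spark's JSON reader expects by default. The sample records and the helper names `load_json_lines` and `infer_columns` are hypothetical and were not part of the demo.

```python
import json

# Hypothetical sample input: one JSON object per line (JSON Lines).
raw = """\
{"customer": "A", "region": "west", "amount": 120.0}
{"customer": "B", "region": "east", "amount": 75.5}
{"customer": "C", "region": "west"}
"""


def load_json_lines(text):
    """Parse one record per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def infer_columns(records):
    """Collect the union of keys, a rough stand-in for schema inference."""
    columns = set()
    for record in records:
        columns.update(record)
    return sorted(columns)


records = load_json_lines(raw)
columns = infer_columns(records)

# Fill missing fields with None so every row has the same shape.
table = [{c: r.get(c) for c in columns} for r in records]
```

The third record has no amount, so its row carries None in that column. This is the kind of gap the cleaning step would have to deal with.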

Clean your data (not in the demo)

5. Use the tool and …

Perform the tasks using the functions defined in relational algebra:

Project, Join, Select (filter), Union, Minus, Intersect, etc.

Yes, those seem to be the main jobs.
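The operators listed above can be sketched in a few lines of plain Python. This is only an illustration of the relational algebra, not Spark code; the two relations and the helper names (`project`, `select`, `join`) are hypothetical. Spark's DataFrame API has its own methods for these operations, such as `select`, `filter`, and `join`.

```python
# Two hypothetical relations, each a list of rows (dicts).
orders = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": 75.5},
    {"order_id": 3, "customer": "A", "amount": 30.0},
]
customers = [
    {"customer": "A", "region": "west"},
    {"customer": "B", "region": "east"},
]


def project(rows, cols):
    """Project: keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]


def select(rows, predicate):
    """Select (filter): keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def join(left, right, key):
    """Join: nested-loop equi-join on a shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Filter, then join, then project.
big_orders = select(orders, lambda r: r["amount"] > 50)
result = project(join(big_orders, customers, "customer"), ["order_id", "region"])

# Union, Minus, and Intersect on single-column relations, using Python sets.
buyers = {r["customer"] for r in orders}
west = {r["customer"] for r in customers if r["region"] == "west"}
union = buyers | west
minus = buyers - west
intersect = buyers & west
```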

6. Performance Tuning

You need to do several iterative tasks.

You do not focus only on the business question and how to slice and dice the data.

You need to think of how to partition the data.

This is Spark’s partition, not database partition, but the concept is the same.  You need to think about how the data are stored.
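The partitioning idea can be sketched without Spark: hash each row's key and use the result to pick a partition, so that all rows sharing a key land together. This is a minimal illustration in plain Python, assuming a simple hash-partitioning scheme. The function names and sample rows are hypothetical, and Spark's own partitioner is not reproduced here.

```python
import zlib
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Assign each row to a partition based on a stable hash of its key."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return dict(partitions)


def totals_per_partition(partitions, key, value):
    """Aggregate inside each partition independently of the others."""
    result = {}
    for index, rows in partitions.items():
        local = {}
        for row in rows:
            local[row[key]] = local.get(row[key], 0) + row[value]
        result[index] = local
    return result


# Hypothetical sample rows.
rows = [
    {"region": "west", "amount": 120.0},
    {"region": "east", "amount": 75.5},
    {"region": "west", "amount": 30.0},
    {"region": "north", "amount": 10.0},
]

partitions = partition_by_key(rows, "region", num_partitions=2)
totals = totals_per_partition(partitions, "region", "amount")
```

Because every row for a given region sits in exactly one partition, the per-partition totals are already final and need no merge across partitions. That is the reason the choice of partition key matters for tuning.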

You need to look at the execution plan (path) it generates.

Yes, again, similar to what we have been doing for years.

In OBIEE, we look at the generated SQL in the nqquery log.

In ODI, we check the session log and look at the generated SQL.

At the Oracle database layer, we generate the explain plan.

Spark has similar logging tools.  Not surprisingly, it has logical plan and physical plan concepts.

One of the techniques is to consider creating Parquet files.

For me, this sounds like the work we do when we create an aggregated table, a materialized view, or an index.
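The tuning loop in this step (read the plan, add a supporting structure, read the plan again) can be shown on a small scale with SQLite from the Python standard library. This is neither Spark nor Oracle, and the table, index name, and `plan` helper are hypothetical. It is only a sketch of the habit of checking the plan before and after a change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("west", 120.0), ("east", 75.5), ("west", 30.0)],
)

query = "SELECT amount FROM sales WHERE region = 'west'"


def plan(connection, sql):
    """Return the plan text; the last column of each plan row is the detail."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Plan without any supporting structure: a full table scan.
before = plan(conn, query)

# Add an index on the filter column, then look at the plan again.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
after = plan(conn, query)

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should describe a scan of the table and the second should mention the new index.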

7. Visualization

This part was not included in the demo.
