May 2024
S	M	T	W	T	F	S
	1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Archive for the ‘Business Intelligence’ Category

Predictive Analytics and AI/ML

Posted by Dylan Wan on February 25, 2024

We have heard about advanced analytics, which was described by dividing analytics into three main types:

Descriptive
Predictive
Prescriptive

While many articles explore these concepts,, such as Prescriptive vs. Predictive Analytics: Examples & Use Cases

I aim to approach these categories from the standpoint of a software vendor, describe how these categories relate to machine learning practices.

Read the rest of this entry »

Posted in Business Intelligence, Incorta | Tagged: Advanced Analytics, ai, artificial-intelligence, Data Science, data-analytics | Leave a Comment »

Getting Data into Cloud

Posted by Dylan Wan on August 27, 2021

When I worked on the data warehousing technologies, we extract the data from the source. The “Extract” is the first step in ETL (or ELT). The extraction was typically done by using SQL connection to the database that holds the transactional data.

When we start introducing cloud based storage, or the Data Lake, many of the process is done via “Data Ingestion”.

The real difference between “Data Extraction” and “Data Ingestion” is that when we extract the data using SQL, we typically know what the data look like. The structure of the data is known and thus we can design the target placeholder, which was called Staging tables, Tables in the data warehouse, or the operational data store (ODS) before we write the code to extract the data. When we perform data ingestion, we should not need to know what the source structure looks like and we definitely do not have to design the target for holding the data until later when we need to use the data. The design and development phase can be shorten or deferred.

When a cloud service cannot handle supporting the schema inference and support getting insights from the data with unknown structure or the structure that may change over time, the intelligence will have to put into the extraction or ingestion layer.

The intelligent ingestion means that even though the source structure is complex, the complex data model has been already handled and the structured data are delivered to the destination.

—

How is this different from the prebuilt BI Apps?

Prebuilt Apps provides the end to end, not just the extraction or transformation, or the target schema design, but prebuilt libraries of metrics, analytics workflow, prebuilt dashboards, etc.

By providing the intelligent ingestion, the data is there and you can still building the analytics application yoursleves.

Posted in Business Intelligence | Leave a Comment »

Time Series

Posted by Dylan Wan on March 12, 2021

Times Series is defined as a series of data, typically values of a variables, the value of which may change over time.

A set of statistical methods were developed for analyzing such data.

Those methods help to understand and interpret the data, and once the data can be understood, then the model can be used for forecasting.

The assumption of time series are including these components: (https://medium.com/high-data-stories/time-series-patterns-600b03d13903)

Trend
Pattern – sometime this is called seasonality
Cycle
Random Error

On the business site, many data are time series, but we don’t typically use those time series methods to analyze those data. Here are typical ways that analytics application dashboard and data model designs about time series:

Read the rest of this entry »

Posted in Business Intelligence | Tagged: Incorta, pandas, pyspark, times series | Leave a Comment »

Oracle App Cloud and Incorta

Posted by Dylan Wan on September 30, 2020

OTBI is great. But when people are migrating from Oracle EBS to Oracle Cloud App, they would like to view the data from both EBS and Oracle Cloud, Incorta becomes a cost saving and a quick implementation solution without implementing a data warehouse.

Incorta is not a data warehouse although it does has the data storage, which is optimized for extraction, optimized for internal process reading and transformation, and optimized for analytical queries.

The scalability is accomplished by a distributed architecture, and thus no issue with resource contention.

The support of Tableau and PowerBI provides an alternatives for those companies that already standardized their BI front-end.

Very interested to see how Incorta can help upsell Oracle Cloud App, not the other way around.

Posted in Business Intelligence, EBS, Human Resource Intelligence, Incorta, Oracle | Leave a Comment »

Scalable Distributed BI Architecture

Posted by Dylan Wan on November 26, 2019

I have not written anything for almost a year. I think that one of the major change is that I am dealing with a distributed BI architecture.

There is not much extra work for a solution architect to do in a distributed BI architecture. Design of schema and dashboards are the same. The metadata are the same. The challenge is all on the system, but supporting it including debugging the issues is new. Understanding how it works itself is interesting.

High Availability – The BI system needs to be always available. If a machine is down, can the site be still up? For the front-end UI, we need to have a load balancer that is the site accessed by users. We can have multiple nodes that serve requests from the users, but which node is being used is random, or can be configured based on rules. Having HA means that the nodes involved are providing the same services so they can replace each other. Even one node failed, the other node can still take care of new requests.

Disaster Recovery – A stable BI system needs to support recovery. DR is about resuming the operation from a disaster. It is different from HA in that the nodes in HA are serving requests. DR site is a backup and may be passive, not active. DR site typically has some distance from the main site so it won’t experience the same disaster.

Horizontal scaling – Achieving horizontal scaling may also be involved in multiple distributed nodes but it is a bit different. This is about reacting to the growth of your organization and thus the growth of the data. You can add more nodes to serve new request is horizontal scaling, but having nodes that are providing the same servicing without distinguishing the existing and new can be more desirable for achieving horizontal scaling.

Separation of Data Preparation/Integration and Data Request – Many conventional BI and ETL systems are separate by nature. But breaking a system that has been originally designed to be one JVM and one process to be separate but integrated systems is equally challenge. Data consistency requires that the data that are updated not available unless other integrated parts are also available. Latency is OK but inconsistent is not OK. While the data is being updated, the users who rely on the data to make decision should not be affected, until the last moment that the new consistent data becomes available.

Technologies

Zookeeper – I don’t want to use the term database to describe it but it is indeed a database that store a specific type of data – the messages between nodes.

Apache Helix – A cluster management A good slideshare.

Shared Metadata Database

Shared File System

An enterprise level BI architecture is not just provide instant queries and not just about providing the minimal latency. It is about highly available, supporting disaster recovery, and being scalable over time. Different types of challenges indicate how the deploying company is relying on the system and reflects the stage of the product and company in its lifecycle.

Posted in BI, Business Intelligence, Incorta | Leave a Comment »

Is ETL still necessary?

Posted by Dylan Wan on January 22, 2019

ETL stands for Extract, Transform, and Load.

Extract and Load, their existence itself implies that the source data and target data are stored separately, so you need to extract from source and load the data into the target data store.

Extract and Load won’t go away if the data used for reporting is not stored in the same place as the data captured in the source system for transactional processing.

However, sometime, we use the term, Replicate, which refers to simply copying the data from the source to the target without any transformation.

In a way, the ETL was not necessary when we said that we can have a reporting solution directly from the transactional system. In the old day, Oracle DBI running off Oracle E-Business Suite does not require ETL. It does has batch programs for Summarization, but it does not extract and load data.

ETL went away at once when a replication or backup based approach is being used. A solution such as Oracle Golden Gate, some how can eliminate the need of Extract and Load since it does Extract and Load in one step. SOmetime people call this ELT, Extract and Load first, then Transform.

The real question is how easy these tranditional ETL programs can be developed and maintained.

The real question is whether an ETL tool and ETL programs running outside the BI tool is necessary. The processes, at least the Transform process, won’t go away as long as the data need to prepared for reporting.

Are we able to get rid of the heavy ETL development time? Are we able to get rid of the separate ETL tool? Are we able to stop maintaining those ETL programs? Definitely.

Posted in BI, Business Intelligence, ETL | Leave a Comment »

Is Star Schema necessary?

Posted by Dylan Wan on January 18, 2019

A star schema describes the data by fact and dimension.

From one angle, it is a data modeling technique for designing the data warehouse based on relational database technology. In the old OLAP world, even though a cube is also links to the dimensions that describe the measure, we typically won’t call them Star Schema.

A star schema design based on an assumption that joins can be expensive so the dimension tables can be created by denormalizing the data from parents. It is also based on the assumption that most of business questions are like “How much sales we did this period, by sales, by product, by region, by customer categories, by sales channels”, slice the data by those elements that we can track the performance and apply actions for. On the technical side, it also assumes that without duplicating the dimensions to each measure, via the normalized relationship between the fact and dimension, the performance can be optimized by such as bitmap indexes, by prebuilt aggregates, and also the application of common dimensions that can summarize the data across subject areas are possible,

So, is star schema design necessary?

If data storage and query generation changed and we do not use relational database, I feel that the way of following Star Schema design to prepare the data is definitely questionable.

On one extreme, if you don’t have to aggregate data since you just want to view the data in a list report and you don’t have the requirement of viewing different subject areas side by side. A fully denormalized structure may be ideal.

Will the way of describing business questions by using the term “dimension” and “fact” goes away?

For the time being, when the professional and functional people who live in the traditional BI world are still in place, I think that the separation of data into “dimension”, “fact” will be alive for a while, until better terms are invented to describe the same concept. For example, I feel that I start using more “grouping attribute” than “dimension”.

Posted in Business Intelligence | Leave a Comment »

Incremental ETL – The last refresh date

Posted by Dylan Wan on October 11, 2017

There are multiple ways to model the last refresh date.

In OBIA, DAC and Informatica based ETL, the last refresh date is maintained within DAC. It is maintained at the level of the source tables that populates the data.

Oracle BI DAC User Guide > About Refresh Dates and DAC’s Incremental Load Strategy

In OBIA ODI based ETL, the “Last Extract Date” is maintained in an ETL_LOAD_DATE table at the interface package level.

Oracle BI Applications ETL Guide > ETL Overview > About Changed Data Capture

There are other simpler ways: Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse | Leave a Comment »

Use Bit to represent groups

Posted by Dylan Wan on October 11, 2017

Here I am providing an alternate approach of supporting group membership in MySQL.

It is a common seen requirement that a group may have multiple members and a person may be added to multiple groups. This many to many relationship is typically modeled in an intersection table.

When the group membership is being used as a filter, for example, to show all the students within a group, it becomes a subquery.

Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse, ETL | Leave a Comment »

How to – Incremental ETL

Posted by Dylan Wan on September 6, 2017

This is a very basic topic. An ETL 101 question come up a lot in interview.

Even we are moving to a different storage and different processing framework, the concepts are still important.

The idea is simple – you do not need to keep extracting and updating all data in the data store that are used for analytics and reporting. Only the newly created or changed data are needed. When the “Delta” arrive, we need to identify what are the impacts to the existing data, which can be either Adding or Modifying.

Deletion is a different topic. We need to identify what are deleted from the source and we need to reflect the change, which can be either marking them as deleted or removing it from the data store so when the query is issued, the deleted data won’t appear

Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse, ETL | Leave a Comment »

Use Surrogate Key in Data Warehouse

Posted by Dylan Wan on August 31, 2017

Using surrogate key is part of dimensional modeling technique for populating a data warehouse using a relational database.

The original idea was to generate the sequence generated IDs and use them in between the fact and dimension table, so we can avoid using the concatenated string or using composite key to join. Also, due to the size of the column, it becomes a compression of the data and thus performs better.

However, maintaining the surrogate keys itself has actually a big portion of the ETL process:

Read the rest of this entry »

Posted in BI, Business Intelligence, Data Warehouse, ETL | Leave a Comment »

Prebuilt BI Contents should replace BI Tools

Posted by Dylan Wan on August 13, 2017

Most school districts need the same kind of reports and dashboard for measuring the performance of students, teachers, and schools. They do not really need to have IT to build reports for them if the vendors can provide the reports OOTB.

There is really hardly a need to have a custom reporting tool for building or customizing the reports if the OOTB reports do the jobs. Read the rest of this entry »

Posted in BI, BI Application, Business Intelligence | Tagged: Prebuilt content | Leave a Comment »

Users of Analytics Applications

Posted by Dylan Wan on May 21, 2017

Business User who are consuming the data and the report. They see the information pushed to them. They can see alerts in their phone. They see emails. They add the page to a bookmark in their browser and periodically look at them. They are executives, managers, busy users who have other duties. They don’t spend much time. They may not come to see the data every day. In a school system, they are teachers, principals, vice principals, school counselors. In an corporation, they are the business partners of IT. They are controllers, account payables, contractor admin, buyers, billing clerks. Many of them know what information they would like to have and information is essential for their jobs. They do not like complex navigation. They would like to see dashboards or data integrated into their portal or the applications they have to use. Read the rest of this entry »

Posted in BI, Business Intelligence, Incorta, OBIEE | Leave a Comment »

Data Lake and Data Warehouse

Posted by Dylan Wan on April 7, 2017

This is an old topic but I learned more and come up more perspectives over time.

Raw Data vs Clean Data
Metadata
What kind of services are required?
Data as a Service
Analytics as a Service

Read the rest of this entry »

Posted in BI, Big Data, Business Intelligence, Data Lake, Data Warehouse, EDW, Enrichment, Master Data Management | Tagged: Data Lake, Data Warehouse | Leave a Comment »

RTI and Actionable BI

Posted by Dylan Wan on April 1, 2017

RTI stands for Response to Intervention. It is one the Data Driven Instruction methods. It is a way to monitor students’ progress and to determine what interventions students may need, assign the appropriate intervention to them, and monitor the progress closely.

This is a good demonstration of the need of actionable BI. The system collects the information about students and identify the students that need the interventions. The information may be learning related or behavior related problems. Read the rest of this entry »

Posted in BI, Business Intelligence | Tagged: actionable | Leave a Comment »

Analytics apps or embedded analytics

Posted by Dylan Wan on February 16, 2017

An analytics apps should be an application. We should not see it as a report solution or just a way to view or download data.

A best analytics apps is an apps, and lets people forget about it is an “analytics” apps.

If you keep the end user in mind and drive the design of the apps using user persona and knowing really what users do, the line between analytics apps and other apps should not be very clear.

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Dashboard vs Report

Posted by Dylan Wan on February 16, 2017

A dashboard is a data driven application. It is important to think about:

Read the rest of this entry »

Posted in Business Intelligence | Leave a Comment »

Schema-less or Schema On Demand

Posted by Dylan Wan on January 29, 2017

I am trying to define what are the criteria for a schema-less, or a schema on demand data storage.

In relational database, we use DDL, data definition language, to define schema.

We have to create table first, before we can insert data into a table. When we update data, we update the data by mentioning the name of columns. We use DDL before we use DML.

We have to know the name of columns in order to construct the SQL statement with specific select clause. If a column does not exist, the system throws an error when we try to query. SELECT * FROM does not have the requirement. CREATE SELECT also kind of giving us some tolerance, but the table it creates will be unstable. INSERT.. SELECT * will be a bad practice as when the source schema changes, this statement becomes broken.

Schema describes the name of table, the names and orders of the columns, and the data type (or domain) of the columns.

Here are what I feel possible and something we can pursue: Read the rest of this entry »

Posted in Business Intelligence, Data Warehouse, EDW, OBIEE | Leave a Comment »

Uncommonly Common

Posted by Dylan Wan on January 26, 2017

An interesting concept.

Significant Terms Aggregation – Elastic Search

Posted in Business Intelligence | Leave a Comment »

« Previous Entries

Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Disclaimer:

Social

Calendar

Categories

Blog Directory

Feedback

My other sites

My Stats

Meta

Subscribe

Archive for the ‘Business Intelligence’ Category

Predictive Analytics and AI/ML

Getting Data into Cloud

Time Series

Oracle App Cloud and Incorta

Scalable Distributed BI Architecture

Technologies

Is ETL still necessary?

Is Star Schema necessary?

Incremental ETL – The last refresh date

Use Bit to represent groups

How to – Incremental ETL

Use Surrogate Key in Data Warehouse

Prebuilt BI Contents should replace BI Tools

Users of Analytics Applications

Data Lake and Data Warehouse

RTI and Actionable BI

Analytics apps or embedded analytics

Dashboard vs Report

Schema-less or Schema On Demand

Uncommonly Common

Significant Terms Aggregation – Elastic Search

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Disclaimer:

Social

Calendar

Categories

Blog Directory

Feedback

My other sites

My Stats

Meta

Subscribe

Archive for the ‘Business Intelligence’ Category

Share this:

Share this:

Share this:

Share this:

Share this:

Technologies

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: