Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Getting Data into Cloud

Posted by Dylan Wan on August 27, 2021

When I worked on data warehousing technologies, we extracted the data from the source. "Extract" is the first step in ETL (or ELT). The extraction was typically done over a SQL connection to the database that holds the transactional data.

When we started introducing cloud-based storage, or the Data Lake, much of this process came to be done via "Data Ingestion".

The real difference between "Data Extraction" and "Data Ingestion" is that when we extract the data using SQL, we typically know what the data looks like. The structure of the data is known, so we can design the target placeholder, whether staging tables, tables in the data warehouse, or an operational data store (ODS), before we write the code to extract the data. When we perform data ingestion, we should not need to know what the source structure looks like, and we definitely do not have to design the target for holding the data until later, when we actually need to use the data. The design and development phase can be shortened or deferred.
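The deferred-design idea above is often called schema-on-read: instead of designing the target tables up front, the structure is inferred from the data when it is used. A minimal sketch of that inference, with a hypothetical `infer_schema` helper (not part of any specific product):

```python
import json

def infer_schema(records):
    """Infer a simple column -> type-name mapping from a list of records.

    A toy illustration of schema-on-read: the target structure is
    derived from the data itself rather than designed up front.
    Conflicting types across records are widened to 'str'.
    """
    schema = {}
    for rec in records:
        for key, value in rec.items():
            t = type(value).__name__
            if key in schema and schema[key] != t:
                schema[key] = "str"  # widen on type conflict
            else:
                schema.setdefault(key, t)
    return schema

# Two records with overlapping but not identical structure
raw = '[{"id": 1, "name": "a"}, {"id": 2, "amount": 9.5}]'
print(infer_schema(json.loads(raw)))
# {'id': 'int', 'name': 'str', 'amount': 'float'}
```

Note that no target table had to exist before the data arrived; the schema is a by-product of reading it.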

When a cloud service cannot support schema inference, and cannot derive insights from data with an unknown structure or a structure that may change over time, the intelligence has to be put into the extraction or ingestion layer.

Intelligent ingestion means that even though the source structure is complex, the complex data model has already been handled and structured data is delivered to the destination.


How is this different from the prebuilt BI Apps?

Prebuilt apps provide the end-to-end solution: not just the extraction, transformation, or target schema design, but also prebuilt libraries of metrics, analytics workflows, prebuilt dashboards, etc.

With intelligent ingestion, the data is there and you can still build the analytics application yourselves.
