The future of data warehousing
Posted by Dylan Wan on March 17, 2016
Data warehousing is really about preparing data for reporting. The assumptions are:
- You can predict, to some extent, what typical queries look like.
- The data need to be prepared to make queries easier or faster, or to help us make more sense of the data.
- You know where the data come from, so you can Extract them from the source.
- You know what the target looks like, so you can Transform the data.
- You Load the data somewhere so you do not need to query the source directly.
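To make the Extract, Transform, Load assumptions above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the source rows, the `sales_by_region` target table, and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an operational system we can Extract from.
SOURCE_ORDERS = [
    {"order_id": 1, "region": "west", "amount_cents": 1250},
    {"order_id": 2, "region": "east", "amount_cents": 800},
    {"order_id": 3, "region": "west", "amount_cents": 450},
]


def extract():
    """Extract: we know where the data come from, so read them from the source."""
    return list(SOURCE_ORDERS)


def transform(rows):
    """Transform: we know what the target looks like, so reshape the rows to fit it.

    The hypothetical target wants one total per region, in dollars.
    """
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount_cents"]
    return [(region, cents / 100) for region, cents in sorted(totals.items())]


def load(conn, rows):
    """Load: store the prepared data so reports never query the source directly."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract()))
    # A predictable report query now runs against the prepared table only.
    for region, total in warehouse.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ):
        print(region, total)
```

Running it prints one total per region. Keeping the three steps as separate functions reflects the separation of source and target: the report query only ever touches the loaded table.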
The future of data warehousing depends on whether the above assumptions still hold. Other factors include the available technologies and the source data that are available.
Based on the above assumptions, here are my thoughts:
Many business questions remain the same as before, and new questions keep arising, so the job of preparing the data for reporting should still exist. Domain knowledge is much more important than technical knowledge. The key will be understanding which business questions to ask, and being able to answer them by knowing where the source data are.
Technology will change how we prepare the data. In the past, relational databases, optimizations for generating and executing queries, and indexing techniques such as bitmap indexes were important. They may not be the platform used in the new world. Memory and storage devices have become much cheaper, and processing data across a cluster has become possible. The tools built to leverage these advances in hardware and processing algorithms will win, or at least survive.
New types of data sources will create short-term opportunities for those who know how to handle them. Images, video, geographical data, logging data from various devices, and so on may change how data are extracted and processed. How quickly a system can adapt to these changes will also be essential.
We cannot always know exactly what people are looking for. We need to give tools to the people who may need to transform the data themselves. It may also be possible to turn data transformations into products, if we can create a marketplace where people can share them.
The separation of the source and target is still useful. We should think of these as data products.