Data Lake vs. Data Warehouse
Posted by Dylan Wan on October 4, 2015
These are different concepts.
Data Lake – Collects data from various sources in a central place. The data are stored in their original form. Big data technologies are used, so the typical storage layer is Hadoop HDFS.
Data Warehouse – The “traditional” way of collecting data from various sources for reporting. The data are consolidated and integrated. A data warehouse design that follows the dimensional modeling technique may store data in a star schema with fact tables and dimension tables. Typically a relational database is used.
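To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the event records, the table names (dim_customer, fact_sales), and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the relational warehouse. The raw JSON lines play the role of data kept in its original form, as a data lake would keep it. The loader consolidates them into a small star schema.

```python
import json
import sqlite3

# Hypothetical raw events, kept untouched in their original form
# (the "data lake" side of the comparison).
raw_events = [
    '{"order_id": 1, "customer": "alice", "country": "US", "amount": 30.0}',
    '{"order_id": 2, "customer": "bob", "country": "DE", "amount": 12.5}',
    '{"order_id": 3, "customer": "alice", "country": "US", "amount": 7.5}',
]


def load_star_schema(events):
    """Consolidate raw events into one dimension table and one fact table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a relational warehouse
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            name         TEXT UNIQUE,
            country      TEXT
        );
        CREATE TABLE fact_sales (
            order_id     INTEGER PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            amount       REAL
        );
        """
    )
    for line in events:
        event = json.loads(line)
        # Each customer is stored once in the dimension table.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (name, country) VALUES (?, ?)",
            (event["customer"], event["country"]),
        )
        key = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE name = ?",
            (event["customer"],),
        ).fetchone()[0]
        # The fact row holds the measure plus a key into the dimension.
        conn.execute(
            "INSERT INTO fact_sales (order_id, customer_key, amount) VALUES (?, ?, ?)",
            (event["order_id"], key, event["amount"]),
        )
    conn.commit()
    return conn


def sales_by_country(conn):
    """A typical reporting query: join the fact table to its dimension."""
    return conn.execute(
        """
        SELECT d.country, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.country
        ORDER BY d.country
        """
    ).fetchall()
```

Calling sales_by_country(load_star_schema(raw_events)) returns one total per country. The raw lines stay available unchanged for any other use, which is the point of keeping both stores.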
If we look at the analytics platform at eBay, as described in this LinkedIn SlideShare and this 2013 article:
eBay takes a coexistence approach, running a Hadoop cluster alongside the traditional EDW (enterprise data warehouse). (See page 9)
According to Makoto SHIROTA in his book Impacts of Big Data, eBay takes this approach because “There is No technology silver bullet”.
Although the materials are somewhat old (three years), they may still be a good reference.
I think that the Data Lake and the Data Warehouse are not mutually exclusive and will coexist for a while.