Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Data Profiling

Data Profiling is a systematic process of examining the existing data you have.  It is an important step for designing and building ETL and for data quality managment.  It is a good pratice to continuously perform data profiling to your data.

Data Profiling function can be performed at various levels:

Table Level:

  • Row Count
  • Identifying the candidate keys (the unique combinations of a table)
  • Evaluate the redundency

Column Level:

  • Apply the aggregate function (Total, Average, Medium, Minimum, Maximum, Missing Values)
  • Distinct Value Count – Draw Histogram for the data
  • Domain Validation: Test if all the data falling into the pre-determined list of values
  • Domain Inference: Identify the list of possible values

Inter-Column or Intra-Table:

  • Identify the inter-dependency
  • Identify the denormalized columns
  • Identify transitive dependencies

Inter-Table

  • Foreign Key discovery
  • Outer-Join analysis
Advertisements