Dylan's BI Study Notes

My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management

Incremental ETL – The last refresh date

Posted by Dylan Wan on October 11, 2017

There are multiple ways to model the last refresh date.

In OBIA, DAC and Informatica based ETL, the last refresh date is maintained within DAC. It is maintained at the level of the source tables that populates the data.

Oracle BI DAC User Guide > About Refresh Dates and DAC’s Incremental Load Strategy

In OBIA ODI based ETL, the “Last Extract Date” is maintained in an ETL_LOAD_DATE table at the interface package level.

Oracle BI Applications ETL Guide > ETL Overview > About Changed Data Capture

There are other simpler ways:

Actually, if all data can be refreshed at the same time, the date can be maintained at the entire ETL run cycle level.

SELECT start_time 
INTO last_refresh_date 
FROM ETL_run_history 
WHERE etl_run_id IN (
SELECT MAX(etl_run_id) 
FROM ETL_master 
WHERE is_success = 'Y'
);

Just need to insert a record in the beginning of the ETL load:

INSERT INTO ETL_run_history(
  is_success
, is_completed
, etl_status
, start_time)
VALUE('N', 'Not Yet', "Running...", @Now);

Start calling the store procedures.  Each stored procedure will task the last_refresh_date as an input and evaluate the last update date from the source tables for getting the delta.

After all loads finish successfully, remember to set the status.

SET @max_run_id= (SELECT MAX(etl_run_id) FROM ETL_run_history) ;

UPDATE ETL_run_history 
SET is_success='Y'
,  etl_status="Success"
,  end_time=NOW()
,  err_code=@Returncode 
WHERE etl_run_id = @max_run_id;

After all tasks are finished, remember resetting the ETL run.

UPDATE ETL_run_history SET is_completed="Completed";

The downside of the approach is about restartability.  If a task failed, the entire ETL run will have to be restarted as it always starts by checking the last success run.

All the tasks (stored procedures) that are executed will have to be rerunnable.  To avoid rerunning all tasks, the last refresh date needs to be maintained at a lower level, like Oracle BI Applications does.

Using the stored last refresh date is not always necessary.  I have seen the system that gets the date using the max date from the stored data.

It is not a right or wrong answer, querying the data can get the most update to date data, but the data need to be reliable for tasking this approach.

 

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s