Learning ETL – Architecture
Posted by Dylan Wan on January 15, 2007
The central component of any ETL tool is the repository. It typically stores the metadata for all applications, including both source and target systems. It also contains the business logic for mapping and transforming data from one to the other.
In many cases, the repository is stored in a relational database. In some cases, it is stored in the file system instead.
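To make the idea of a relational repository more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (datastore, column_meta, mapping) and the sample data are assumptions made up for illustration; real ETL tools use their own, far richer schemas.

```python
import sqlite3


def create_repository(conn):
    """Create a tiny, hypothetical repository schema."""
    conn.executescript("""
        CREATE TABLE datastore (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            role TEXT NOT NULL CHECK (role IN ('source', 'target'))
        );
        CREATE TABLE column_meta (
            id           INTEGER PRIMARY KEY,
            datastore_id INTEGER NOT NULL REFERENCES datastore(id),
            table_name   TEXT NOT NULL,
            column_name  TEXT NOT NULL,
            data_type    TEXT NOT NULL
        );
        CREATE TABLE mapping (
            id               INTEGER PRIMARY KEY,
            source_column_id INTEGER NOT NULL REFERENCES column_meta(id),
            target_column_id INTEGER NOT NULL REFERENCES column_meta(id),
            expression       TEXT NOT NULL  -- the transformation rule
        );
    """)


def add_datastore(conn, name, role):
    """Register a source or target system and return its id."""
    cur = conn.execute(
        "INSERT INTO datastore (name, role) VALUES (?, ?)", (name, role)
    )
    return cur.lastrowid


def add_column(conn, datastore_id, table_name, column_name, data_type):
    """Record metadata for one column and return its id."""
    cur = conn.execute(
        "INSERT INTO column_meta (datastore_id, table_name, column_name, data_type)"
        " VALUES (?, ?, ?, ?)",
        (datastore_id, table_name, column_name, data_type),
    )
    return cur.lastrowid


def describe_mappings(conn):
    """Return (source column, target column, expression) for every mapping."""
    return conn.execute("""
        SELECT s.table_name || '.' || s.column_name,
               t.table_name || '.' || t.column_name,
               m.expression
        FROM mapping m
        JOIN column_meta s ON s.id = m.source_column_id
        JOIN column_meta t ON t.id = m.target_column_id
        ORDER BY m.id
    """).fetchall()


# Example data: one source system, one target system, one mapping rule.
conn = sqlite3.connect(":memory:")
create_repository(conn)
src = add_datastore(conn, "crm", "source")
tgt = add_datastore(conn, "warehouse", "target")
src_col = add_column(conn, src, "customers", "cust_name", "VARCHAR")
tgt_col = add_column(conn, tgt, "dim_customer", "customer_name", "VARCHAR")
conn.execute(
    "INSERT INTO mapping (source_column_id, target_column_id, expression)"
    " VALUES (?, ?, ?)",
    (src_col, tgt_col, "UPPER(cust_name)"),
)
print(describe_mappings(conn))
```

Keeping the metadata and the mapping rules in the same store is what lets a single query answer "where does this target column come from?".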
Different users may use the repository in different ways and thus require different user interfaces.
- Designers use a tool to reverse-engineer metadata, define the business rules for transformation and data quality, and develop the interface programs.
- Operators use a tool to schedule and monitor run-time execution. The repository may store the status and resulting messages from the executed programs. It may also store execution logs and statistics, such as the number of records processed and the elapsed time.
- Administrators use a tool to define users and their privileges.
These user interfaces can be built as client/server tools in pure Java, delivered as web applications, or offered as a mix of both.
At run time, a scheduler agent orchestrates the execution. A process may be launched manually from the user interface or automatically by the scheduler engine. The transformation program may return its status and resulting messages to the execution engine, which then updates the logs in the repository so that operators can review them.
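The run-time flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the execution_log table, the run_process function, and the uppercase_names transformation are hypothetical names invented here, and an in-memory SQLite database stands in for the repository.

```python
import sqlite3
import time


def open_repository():
    """Create an in-memory stand-in for the repository's log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE execution_log (
            id                INTEGER PRIMARY KEY,
            process_name      TEXT NOT NULL,
            launched_by       TEXT NOT NULL,  -- 'scheduler' or 'operator'
            status            TEXT NOT NULL,
            message           TEXT,
            records_processed INTEGER,
            elapsed_seconds   REAL
        )
    """)
    return conn


def uppercase_names(rows):
    """Hypothetical transformation program.

    Returns (status, message, transformed rows) to the execution engine.
    """
    try:
        out = [{**row, "name": row["name"].upper()} for row in rows]
    except (KeyError, AttributeError) as exc:
        return "FAILED", f"bad input row: {exc!r}", []
    return "SUCCESS", f"{len(out)} rows transformed", out


def run_process(conn, process_name, program, rows, launched_by):
    """Execution engine: run the program, then record the outcome."""
    started = time.perf_counter()
    status, message, out = program(rows)
    elapsed = time.perf_counter() - started
    conn.execute(
        "INSERT INTO execution_log (process_name, launched_by, status,"
        " message, records_processed, elapsed_seconds)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (process_name, launched_by, status, message, len(out), elapsed),
    )
    conn.commit()
    return status


conn = open_repository()
# One run launched by the scheduler, one launched manually by an operator.
run_process(conn, "load_customers", uppercase_names,
            [{"name": "ada"}, {"name": "grace"}], launched_by="scheduler")
run_process(conn, "load_customers", uppercase_names,
            [{"name": None}], launched_by="operator")

# What an operator would review afterwards.
log = conn.execute(
    "SELECT process_name, launched_by, status, records_processed"
    " FROM execution_log ORDER BY id"
).fetchall()
for entry in log:
    print(entry)
```

The transformation program itself never touches the repository here; it only reports back to the engine, which owns the logging. That separation is one reasonable reading of the description above, not the only possible design.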