ETL Design Activities
The activities for ETL design do not need to be performed linearly. Figure 9.7
indicates which activities can be performed concurrently. The list below briefly
describes the activities associated with Step 9, ETL Design.
Create the source-to-target mapping document.
Use the source data analysis results and the business rules from the previous
steps and incorporate them into the transformation specifications. Document
the transformation specifications in a source-to-target mapping matrix or
spreadsheet.
1.
Test the ETL tool functions.
It is very important to test the ETL tool functions before designing the ETL
process flow and before deciding how to set up the staging area. For example,
it would be worthless to install a currently popular ETL tool that cannot read flat
files on a mainframe if 90 percent of your source data is in flat files on your
mainframe. Therefore, test the ETL tool functions and determine whether
supplemental code must be written to perform some complicated and lengthy
transformations that the tool cannot handle.
2.
Design the ETL process flow.
The most challenging aspect of ETL design is creating an efficient ETL process
flow. Because most data staging windows are very small—only a few hours per
night—the ETL process must be streamlined as much as possible. That means
breaking the ETL process into small program components so that as many as
possible can be run in parallel.
3.
Design the ETL programs.
Since most organizations require several years of historical data to be loaded
with the first BI application release, there are three sets of ETL programs to
consider: the initial load, the historical load, and the incremental load. The
incremental load will probably be a delta load and will therefore be the most
complicated to design. Modularize the ETL programs as much as possible, and
create programming specifications for each ETL program module.
4.
Set up the ETL staging area.
Determine whether you need a centralized staging area on a dedicated server
or whether it would make more sense to implement a decentralized staging
area in your environment. Deciding factors are the type and location of source
files and source databases, as well as the functions, capabilities, and licensing
terms of the ETL tool.
5.
Figure 9.7. ETL Design Activities