postal lookup table
- Calculating a customer's age based on his or her date of birth and the
current year
Aggregation: All the data about a business object is brought together. For
example, data elements for a customer may be aggregated from multiple
source files and source databases, such as a Customer Master file, a Prospect
file, a Sales file, and demographic data purchased from a vendor. (In
multidimensional database design jargon, the term aggregation also refers to
the roll-up of data values.)
Integration: Data integration based on normalization rules forces the need to
reconcile different data names and different data values for the same data
element. The desired result is to have each unique data element known by one
standard name, with one standard definition and an approved domain. Each
data element should also be associated with its sources files and source
databases as well as its BI target databases. Standardizing the data should be
a business objective.
Underestimating Data Transformation Efforts
Source data transformation is similar to opening a Russian doll—you open one and
there is another inside. It could be an endless process. That is why the time required
for the ETL process is chronically underestimated. The original estimates are usually
based on the amount of technical data conversions required to transform data types
and lengths, and they often do not take into account the overwhelming amount of
transformations required to enforce business data domain rules and business data
integrity rules.
The transformation specifications given to the ETL developer should never be limited
to just technical data conversion rules. For some large organizations with many old
file structures, the ratio of a particular data transformation effort could be as high as
80 percent effort toward enforcing business data domain rules and business data
integrity rules and only 20 percent effort toward enforcing technical data conversion
rules. Therefore, expect to multiply your original estimates for your ETL data
transformation effort by four. Even if you think you have a very realistic timetable for
the ETL process, do not be surprised if you still miss deadlines due to dirty data. If
you do not miss deadlines, do not be surprised to discover you have not cleansed
enough of the data sufficiently.
Insist on full-time involvement from the business representative, and insist on
getting the right business representative—someone who is knowledgeable about the
business and who has authority to make decisions about the business rules. These
stipulations are essential for speeding up the ETL process. Furthermore, urge the
business sponsor and the business representative to launch a data quality initiative
in the organization, or at least in the departments under their control or influence.
When business people drive a data quality initiative, they are more likely to assist
with the ETL transformation process. Remind them that while IT technicians may
know the process semantics, the business people know the data contents and
business semantics. They understand what the data really means.