Query and reporting tools give business analysts the ability to investigate
company performance and customer behavior.
Statistical tools enable statisticians to perform sophisticated studies of the
behavior of a business.
New multidimensional online analytical processing (OLAP) tools deliver the
ability to perform "what if" analysis and to look at a large number of
interdependent factors involved in a business problem.
Many of these tools work with BI applications and can sift through vast amounts of
data. Given this abundance of tools, what is so different about discovery-driven data
mining? The big difference is that traditional analysis techniques, even sophisticated
ones, rely on the analyst to know what to look for in the data. The analyst creates
and runs queries based on some hypotheses and guesses about possible
relationships, trends, and correlations thought to be present in the data. Similarly,
the executive relies on the business views built into the EIS tool, which can examine
only the factors the tool is programmed to review. As problems grow more complex
and involve more variables, these hypothesis-driven techniques can fall short,
because the analyst cannot anticipate every relationship worth testing. In contrast,
discovery-driven data mining searches the data for patterns without a predefined
hypothesis, which makes it suited to subtle and complex investigations.
Data Sources for Data Mining
BI target databases are popular sources for data mining applications. They contain a
wealth of internal data that was gathered and consolidated across business
boundaries, validated, and cleansed in the extract/transform/load (ETL) process. BI
target databases may also contain valuable external data, such as regulations,
demographics, or geographic information. Combining external data with internal
organizational data offers a splendid foundation for data mining.
The drawback of multidimensional BI target databases is that, because the data has
been summarized, hidden data patterns, data relationships, and data associations
are often no longer discernible from that data pool. For example, the data mining
tool may not be able to perform the common data mining task of market basket
analysis (also called associations discovery, described in the next section) on
summarized sales data, because summarization discards the detail of which items
were purchased together in each individual sale. Therefore, operational files and databases are
also popular sources for data mining applications, especially because they contain
transaction-level detailed data with a myriad of hidden data patterns, data
relationships, and data associations.
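The following minimal Python sketch illustrates why summarization defeats market basket analysis. It assumes two small, hypothetical sets of sales transactions (all item names are invented) that produce exactly the same per-product totals, which is what a summarized BI table would retain. Only the transaction-level data reveals how often two items were bought together, measured here as pair support (the fraction of baskets containing both items).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction-level (operational) data: one set of items per sale.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]

# A different hypothetical set of sales with identical per-product totals.
alternative = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "beer", "butter"},
    {"bread", "chips"},
    {"butter", "beer", "chips"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each item pair."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so ("a", "b") == ("b", "a")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}

def summarize(baskets):
    """Mimic a summarized table: units sold per product, no basket detail."""
    totals = Counter()
    for basket in baskets:
        totals.update(basket)
    return dict(totals)

print(summarize(transactions) == summarize(alternative))    # True
print(pair_support(transactions)[("bread", "butter")])      # 0.4
print(pair_support(alternative)[("bread", "butter")])       # 0.2
```

Because both data sets summarize to the same totals, no analysis of the summary alone can recover the difference in how often bread and butter were bought together.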