Exploratory Analysis of Geochemical Anomalies
(EDA), which was then considered an unconventional and informal approach to
analysing and interpreting univariate data that do not follow a normal distribution model.
Since the early 1980s, the EDA approach has gained attention in analysis and modeling
of uni-element geochemical anomalies (e.g., Campbell, 1982; Smith et al., 1982;
Howarth, 1983a, 1984; Garrett, 1988; Kürzl, 1988; Rock, 1988b; Chork and
Mazzucchelli, 1989; Cook and Fletcher, 1993; Yusta et al., 1998; Bounessah and Atkin,
2003; Reimann et al., 2005; Reimann and Garrett, 2005; Grunsky, 2006). This chapter
(a) reviews the concept and methods of EDA that are relevant in modeling of uni-
element geochemical anomalies and (b) demonstrates a GIS-based case study application
of EDA in modeling of significant geochemical anomalies.
EXPLORATORY DATA ANALYSIS
EDA is not a method but a philosophy of, or an approach to, robust data analysis
(Tukey, 1977). It consists of a collection of descriptive statistical and, mostly, graphical
tools intended to (a) gain maximum insight into a data set, (b) discover data structure, (c)
define significant variables in the data, (d) determine outliers and anomalies, (e) suggest
and test hypotheses, (f) develop prudent models, and (g) identify best possible treatment
and interpretation of data. Whereas the sequence of classical statistical data analysis is
problem→data→model→analysis→conclusions and the sequence of probabilistic data
analysis is problem→data→model→prior data distribution→analysis→conclusions, the
sequence of EDA is problem→data→analysis→model→conclusions. Thus, classical
statistical data analysis and probabilistic data analysis are confirmatory approaches to
data analysis (being based on prior assumptions of data distribution models), whilst
EDA, as its name indicates, is an exploratory approach to data analysis.
The goal of EDA is to recognise ‘potentially explicable’ data patterns (Good, 1983)
through application of resistant and robust descriptive statistical and graphical tools that
are qualitatively distinct from the classical statistical tools. From a statistical point of
view, a statistic is resistant and robust (Huber, 1981; Hampel et al., 1986) if (a) it is only
slightly affected either by a small number of gross errors or by a large number of small
errors (resistance) and (b) it is only slightly affected by data outliers (robustness). The
descriptive statistical and graphical tools employed in EDA are based on the data themselves
rather than on a data distribution model (e.g., normal distribution), yet they provide resistant
definitions of univariate data statistics and outliers.
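As a minimal illustrative sketch of resistance (not taken from the text), the Python snippet below compares two classical statistics, the mean and standard deviation, with two resistant ones, the median and the median absolute deviation (MAD), before and after one value in a small data set is replaced by a gross error. The Cu concentrations are hypothetical, the helper names `mad` and `summarise` are invented here, and the choice of the MAD as the resistant measure of spread is an assumption made for illustration only.

```python
import statistics


def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    centre = statistics.median(values)
    return statistics.median([abs(v - centre) for v in values])


def summarise(values):
    """Return classical and resistant summaries of location and spread."""
    return {
        "mean": statistics.mean(values),      # classical location
        "stdev": statistics.stdev(values),    # classical spread
        "median": statistics.median(values),  # resistant location
        "mad": mad(values),                   # resistant spread
    }


# Hypothetical Cu concentrations (ppm); NOT data from the text.
background = [12, 15, 14, 13, 16, 15, 14, 13, 15, 14]

# Same data with the last value replaced by one gross error (or strong anomaly).
with_outlier = background[:-1] + [1400]

clean = summarise(background)
contaminated = summarise(with_outlier)

# Print each statistic side by side to compare its sensitivity to one value.
for name in clean:
    print(f"{name:>6}: clean = {clean[name]:8.2f}   "
          f"with outlier = {contaminated[name]:8.2f}")
```

With these hypothetical values, the single extreme value shifts the mean by an order of magnitude and inflates the standard deviation, whereas the median moves by half a unit and the MAD is unchanged.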
Graphical tools in EDA
The emphasis in EDA is on interaction between human cognition and computation in
the form of statistical graphics that allow a user to perceive the behaviour and structure
of the data. Among the several types of EDA graphical tools (Tukey, 1977; Velleman
and Hoaglin, 1981; Chambers et al., 1983), the density trace, jittered one-dimensional
scatterplot and boxplot are most commonly used in uni-element geochemical data
analysis (Howarth and Turner, 1987; Kürzl, 1988; Reimann et al., 2005; Grunsky, 2006).
These three EDA graphics, which can be readily stacked on one another (Fig. 3-3), are