294 Chapter 8
predictor variables used to generate the discriminant function. The values of S_DL can then
be used for mapping geo-objects (e.g., prospective areas) of interest.
In any method of DA, there are five basic assumptions about the data of the predictor
variables. Firstly, the total number of cases must be at least five times the number of
predictor variables (Tabachnick and Fidell, 2007). The number of cases (locations) for
each group D can be equal or unequal, but if they are unequal, the number of cases in the
smallest (or smaller) group must be greater than the number of predictor variables.
Secondly, the data of the predictor variables for the cases of each group represent
samples from a multivariate normal distribution. This assumption is difficult to justify in
mineral prospectivity mapping, especially because the ‘deposit-type’ cases, and thus the
data of the predictor variables for most of these cases, are unlikely to represent
samples from a multivariate normal distribution (see Fig. 8-6). Fortunately, DA
is not seriously affected by violations of the normality assumption as long as non-
normality is not due to outliers (Davis, 2002; Tabachnick and Fidell, 2007). Thirdly, the
variance-covariance matrices of the groups should be equal, although inequality of
variances is, like violation of normality, not ‘fatal’ to DA (Davis, 2002; Tabachnick and
Fidell, 2007). Fourthly, the predictor variables are neither completely redundant nor
conditionally dependent (i.e., highly correlated) because, if they are, the variance-covariance
matrix is said to be ill-conditioned and thus cannot be reliably inverted. Like the normality assumption, the
assumption of conditional independence is difficult to justify for data of the predictor
variables at deposit-type locations. Finally, none of the cases used to derive the
discriminant function are misclassified (i.e., none of the cases from one group belongs to
another group).
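
The first and fourth assumptions lend themselves to simple numerical screening before a discriminant function is derived. The following is a minimal sketch in Python, not a procedure taken from the case study. The function names, the example data layout (a dictionary mapping predictor names to lists of values) and the correlation threshold of 0.95 are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def check_sample_size(group_sizes, n_predictors):
    """Check the two sample-size rules stated in the text.

    Returns (total_ok, smallest_ok):
      total_ok    - total cases are at least five times the number of predictors
      smallest_ok - the smallest group has more cases than there are predictors
    """
    total_ok = sum(group_sizes) >= 5 * n_predictors
    smallest_ok = min(group_sizes) > n_predictors
    return total_ok, smallest_ok


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return pairs of predictor names whose |r| meets the threshold.

    The threshold is an illustrative assumption, not a value from the text.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        if abs(pearson(a, b)) >= threshold:
            flagged.append((name_a, name_b))
    return flagged
```

For example, check_sample_size([30, 30], 10) passes both rules, whereas check_sample_size([40, 8], 10) fails both. Pairs flagged by highly_correlated_pairs are candidates for removal or combination before the variance-covariance matrix is inverted.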
In the case study (see below), all the aforementioned five basic assumptions of DA,
except the third basic assumption, are addressed as follows. With respect to the first
basic assumption of DA, the deposit-type locations, which are very few compared to the
number of predictor variables (see below), are not used for training but for testing.
Instead, proxy deposit-type locations are used for training. Two training sets, each
consisting of equal numbers of proxy deposit-type locations and non-deposit locations,
are used in LDA. With respect to the second basic assumption of DA, one training set
consisting of coherent proxy deposit-type locations (Fig. 8-8) is used in order to address
the problem of non-normality due to outliers. In order to illustrate the utility of coherent
proxy deposit-type locations in data-driven modeling of mineral prospectivity, another
training set consisting of randomly-selected proxy deposit-type locations is used. With
respect to the fourth basic assumption of DA, it is considered that the predictor variables
at the coherent proxy deposit-type locations are not completely redundant because they
are not completely coherent (see Fig. 8-7). With respect to the fifth basic assumption of
DA, non-deposit locations that are highly dissimilar from the coherent proxy deposit-
type locations (see Fig. 8-7) are used in the two training sets described above.
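
The construction of the two training sets can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used in the case study: the coherent proxy deposit-type locations are assumed to have been identified beforehand, each non-deposit location is assumed to carry a hypothetical dissimilarity score (higher meaning less similar to the coherent proxy deposit-type locations), and all function and parameter names are invented for this example.

```python
import random


def most_dissimilar(non_deposit_scores, n):
    """Return the n non-deposit location IDs with the highest dissimilarity score."""
    ranked = sorted(non_deposit_scores, key=non_deposit_scores.get, reverse=True)
    return ranked[:n]


def make_training_sets(coherent_proxies, all_proxies, non_deposit_scores, seed=0):
    """Build two balanced training sets of (location_id, label) pairs.

    Label 1 marks a proxy deposit-type location, label 0 a non-deposit location.
    Set A uses the coherent proxy deposit-type locations.
    Set B uses an equal number of randomly selected proxy deposit-type locations.
    Both sets share the same highly dissimilar non-deposit locations.
    """
    n = len(coherent_proxies)
    non_deposits = most_dissimilar(non_deposit_scores, n)
    rng = random.Random(seed)  # fixed seed so the random set is reproducible
    random_proxies = rng.sample(all_proxies, n)

    def labelled(deposits):
        return [(loc, 1) for loc in deposits] + [(loc, 0) for loc in non_deposits]

    return labelled(coherent_proxies), labelled(random_proxies)
```

Each returned set contains equal numbers of proxy deposit-type and non-deposit locations, which keeps the group sizes balanced as described above.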
There are two statistical tests of significance in DA (Tabachnick and Fidell, 2007).
First, an F-test (Wilks’ lambda) is applied to test the null hypothesis that two groups
under examination have identical multivariate means (i.e., if the discriminant model as a