$e^{c(x \cdot v)}$ is logconcave. Also, the uniform distribution in a convex body is logconcave. The following concentration inequality [LV07] holds for any logconcave density.
Lemma 2.7. Let $X$ be a random point from a logconcave density in $\mathbb{R}^n$ with $\mu = E(X)$ and $R^2 = E(\|X - \mu\|^2)$. Then,
$$\Pr(\|X - \mu\| \ge tR) \le e^{-t+1}.$$
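As a quick sanity check of the lemma, the following sketch (an illustration only; the standard Gaussian is just one example of a logconcave density, and the dimension, sample size and thresholds are arbitrary assumptions) estimates $R$ empirically and compares the tail probability $\Pr(\|X - \mu\| \ge tR)$ with the bound $e^{-t+1}$.

```python
# Numerical sanity check of Lemma 2.7 for one particular logconcave density,
# the standard Gaussian in R^n.  Dimension, sample size and the thresholds t
# are arbitrary choices made for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 200_000
X = rng.standard_normal((m, n))          # samples from a logconcave density
mu = X.mean(axis=0)
dists = np.linalg.norm(X - mu, axis=1)
R = np.sqrt(np.mean(dists ** 2))         # R^2 = E ||X - mu||^2

for t in (1.0, 1.5, 2.0, 3.0):
    empirical = np.mean(dists >= t * R)  # Pr(||X - mu|| >= t R)
    bound = np.exp(-t + 1)               # e^{-t+1}
    print(f"t = {t:.1f}: empirical tail {empirical:.4f} <= bound {bound:.4f}")
```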
Putting this all together, we conclude that Algorithm Classify-Mixture, which
projects samples to the SVD subspace and then clusters, works well for mixtures
of well-separated distributions with logconcave densities, where the separation
required between every pair of means is proportional to the largest standard
deviation.
Theorem 2.8. Algorithm Classify-Mixture correctly classifies a sample of $m$ points from a mixture of $k$ arbitrary logconcave densities $F_1, \ldots, F_k$, with probability at least $1 - \delta$, provided for each pair $i, j$ we have
$$\|\mu_i - \mu_j\| \ge C k^{c} \log(m/\delta)\, \max\{\sigma_i, \sigma_j\},$$
where $\mu_i$ is the mean of component $F_i$, $\sigma_i^2$ is its largest variance and $c, C$ are fixed constants.
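To make the projection step concrete, here is a schematic Python sketch of the project-then-cluster idea. It is not the Classify-Mixture procedure as stated above: the synthetic spherical Gaussian data, the use of a few Lloyd (k-means) iterations as the clustering rule, and all constants are assumptions chosen only for illustration.

```python
# Schematic sketch of the project-then-cluster idea (not the exact
# Classify-Mixture procedure): project samples onto the top-k SVD subspace,
# then cluster the projected points.  The spherical Gaussian data and the
# use of a few Lloyd (k-means) iterations are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
n, k, m_per = 50, 3, 1000
means = rng.normal(scale=20.0, size=(k, n))    # well-separated component means
X = np.vstack([mu + rng.standard_normal((m_per, n)) for mu in means])

# Project the centered samples onto the span of the top-k right singular vectors.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:k].T                              # projected points, shape (k*m_per, k)

# Cluster in the projected space with a few Lloyd iterations.
centers = Y[rng.choice(len(Y), size=k, replace=False)]
for _ in range(20):
    labels = np.argmin(np.linalg.norm(Y[:, None] - centers[None], axis=2), axis=1)
    centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
print(np.bincount(labels))                     # sizes of the recovered clusters
```

The point of the projection is that the SVD subspace approximately preserves the distances between the component means while discarding much of the noise in the orthogonal directions, which is why a simple distance-based clustering of the projected points succeeds under the separation required by Theorem 2.8.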
This is essentially the best possible guarantee for the algorithm. However,
it is a bit unsatisfactory since an affine transformation, which does not affect
probabilistic separation, could easily turn a well-separated mixture into one that
is not well-separated.
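To illustrate with a hypothetical example (not from the text): take two spherical Gaussians in $\mathbb{R}^n$ with unit variance and means differing only in the first coordinate, at a distance $d$ large enough that the separation condition of Theorem 2.8 holds. Now apply the affine map that multiplies the second coordinate by a large factor $M$. The means, and hence $\|\mu_1 - \mu_2\| = d$, are unchanged, but the largest standard deviation of each component becomes $M$, so for $M$ large enough the condition of Theorem 2.8 fails, even though the affine map has not changed the overlap between the two components at all.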
2.7 An affine-invariant algorithm
The algorithm described here is an application of isotropic PCA, an algorithm
discussed in Chapter 8. Unlike the methods we have seen so far, the algorithm is
affine-invariant. For k = 2 components it has nearly the best possible guarantees
for clustering Gaussian mixtures. For k > 2, it requires that there be a (k −1)-
dimensional subspace where the overlap of the components is small in every
direction. This condition can be stated in terms of the Fisher discriminant, a
quantity commonly used in the field of Pattern Recognition with labeled data.
The affine invariance makes it possible to unravel a much larger set of Gaussian
mixtures than had been possible previously. Here we only describe the case of
two components in detail, which contains the key ideas.
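To fix intuition before the formal treatment (the following rendering is only a sketch; the precise definition used by the algorithm is the one in Section 2.7.2), for a mixture $F = \sum_i w_i F_i$ with component means $\mu_i$ and overall mean $\mu$, the overlap along a single unit direction $v$ can be measured by the fraction of the variance of the mixture in that direction contributed by the intra-component term,
$$J(v) \;=\; \frac{\sum_i w_i\, E_{F_i}\big[(v \cdot (x - \mu_i))^2\big]}{E_{F}\big[(v \cdot (x - \mu))^2\big]},$$
and the Fisher subspace is, roughly, the $(k-1)$-dimensional subspace in which this fraction is smallest.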
The first step of the algorithm is to place the mixture in isotropic position via
an affine transformation. This has the effect of making the (k − 1)-dimensional
Fisher subspace, i.e., the one that minimizes the Fisher discriminant (the frac-
tion of the variance of the mixture taken up by the intra-component term; see
Section 2.7.2 for a formal definition), the same as the subspace spanned by the
means of the components (they only coincide in general in isotropic position),