
500 KOLMOGOROV COMPLEXITY
FIGURE 14.6. Mona Lisa.
the circle and its average gray level and then to describe the index of
the circle among all the circles with the same gray level. Assuming
an n-pixel image (of size
√
n by
√
n), there are about n + 1 possible
gray levels, and there are about (
√
n)
3
distinguishable circles. Hence,
k
∗
≈
5
2
log n in this case.
14.13 MINIMUM DESCRIPTION LENGTH PRINCIPLE
A natural extension of Occam’s razor occurs when we need to describe
data drawn from an unknown distribution. Let X
1
,X
2
,...,X
n
be drawn
i.i.d. according to probability mass function p(x). We assume that we
do not know p(x), but know that p(x) ∈
P, a class of probability mass
functions. Given the data, we can estimate the probability mass function in
P that best fits the data. For simple classes P (e.g., if P has only finitely
many distributions), the problem is straightforward, and the maximum
likelihood procedure [i.e., find ˆp ∈
P that maximizes ˆp(X
1
,X
2
,...,X
n
)]
works well. However, if the class
P is rich enough, there is a problem
of overfitting the data. For example, if X
1
,X
2
,...,X
n
are continuous
random variables, and if
P is the set of all probability distributions, the
maximum likelihood estimator given X
1
,X
2
,...,X
n
is a distribution that
places a single mass point of weight
1
n
at each observed value. Clearly, this
estimate is too closely tied to actual observed data and does not capture
any of the structure of the underlying distribution.
To get around this problem, various methods have been applied. In the
simplest case, the data are assumed to come from some parametric distri-
bution (e.g., the normal distribution), and the parameters of the distribution
are estimated from the data. To validate this method, the data should be
tested to check whether the distribution “looks” normal, and if the data
pass the test, we could use this description of the data. A more general
procedure is to take the maximum likelihood estimate and smooth it out
to obtain a smooth density. With enough data, and appropriate smoothness