30 CH 2 REVIEW ON STATISTICAL ANALYSIS AND PROBABILITY THEORY
then we make steps of 1/n between them. However, when we want a distribution model
for the entire population, we need to perform two additional steps:
• Interpolate: We need to know what happens between $x_2$ and $x_3$, or $x_3$ and $x_4$, and so on.
• Extrapolate: We need to know what happens for observations larger than $x_6$ and smaller than $x_1$. Indeed, the data are only a limited sample of the entire population. In the entire population it can be expected that some values are larger than $x_6$ and some are smaller than $x_1$.
Therefore, we “complete” the empirical distribution by introducing so-called interpolation
and extrapolation models, which can be chosen by the modeler, for example a linear
or parabolic/hyperbolic type function. Extrapolating below $x_1$ poses no problem,
but what about values higher than $x_6$? There is apparently no room left, because with
steps of $1/n$ the empirical distribution gives $P(X > x_6) = 0$. The way to solve this
problem is to go in steps of $1/(n + 1)$ instead of $1/n$. In this case, steps of $1/7$
instead of $1/6$ are used, as shown in Figure 2.11.
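As a small worked check of these step sizes, assume that the $i$-th smallest observation is assigned the cumulative probability $i/(n+1)$ (this indexing is our reading of the step rule, written out for the six-observation example):

```latex
F_X(x_i) = \frac{i}{n+1}, \quad i = 1, \dots, n
\qquad\Longrightarrow\qquad
F_X(x_1) = \tfrac{1}{7},\; F_X(x_2) = \tfrac{2}{7},\; \dots,\; F_X(x_6) = \tfrac{6}{7}
```

A probability of $1/7$ then remains for values below $x_1$ and another $1/7$ for values above $x_6$, whereas steps of $1/6$ would give $F_X(x_6) = 6/6 = 1$ and leave nothing for the upper tail.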
In essence, we “patch” together a distribution model $F_X(x)$ by piecing together
interpolation and extrapolation models. The advantage of constructing a distribution function
this way is that all one needs to know is essentially a series of numbers to represent a
distribution function. In the current computer era, we have enough memory space to store
a series of, for example, 100 000 values. Together with suitable interpolation and
extrapolation models, that series of values represents a distribution function.
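The construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name build_cdf is hypothetical, both interpolation and extrapolation are taken to be linear, and the tails are extrapolated to bounds lower and upper that the modeler must supply.

```python
from bisect import bisect_right


def build_cdf(sample, lower, upper):
    """Return F(x) built from a sample using steps of 1/(n + 1).

    Assumptions (modeling choices, not prescribed by the text):
    linear interpolation between observations, and linear extrapolation
    down to `lower` (where F = 0) and up to `upper` (where F = 1).
    """
    xs = sorted(float(v) for v in sample)
    n = len(xs)
    if not (lower < xs[0] and xs[-1] < upper):
        raise ValueError("lower and upper must strictly enclose the sample")

    # Knots: assumed lower bound, the ordered data, assumed upper bound.
    knots_x = [float(lower)] + xs + [float(upper)]
    # Cumulative probabilities 0, 1/(n+1), ..., n/(n+1), 1 at those knots.
    knots_p = [i / (n + 1) for i in range(n + 2)]

    def cdf(x):
        if x <= knots_x[0]:
            return 0.0
        if x >= knots_x[-1]:
            return 1.0
        # Find j such that knots_x[j - 1] <= x < knots_x[j].
        j = bisect_right(knots_x, x)
        x0, x1 = knots_x[j - 1], knots_x[j]
        p0, p1 = knots_p[j - 1], knots_p[j]
        # Linear interpolation within the segment.
        return p0 + (p1 - p0) * (x - x0) / (x1 - x0)

    return cdf


# Made-up illustration values; the bounds 0 and 10 are arbitrary choices.
F = build_cdf([2.0, 3.0, 4.0, 5.0, 6.0, 8.0], lower=0.0, upper=10.0)
print(F(2.0), F(8.0))  # cumulative probability at the smallest and largest observation
```

Only the sorted values and the two bounds are stored; everything else follows from the chosen interpolation and extrapolation rule, which is the point made above.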
2.5.7 Monte Carlo Simulation
Monte Carlo is a statistical technique that aims at “mimicking” the process of sampling an
actual phenomenon. Therefore, Monte Carlo simulation is often referred to as “sampling”
or “drawing” from a distribution function. When one is actually sampling (not Monte
Carlo sampling), samples are obtained from the field to figure out what, for example, the
population density distribution $f_X(x)$ is. In Monte Carlo simulation, one assumes that the
distribution $f_X(x)$ is known, and uses a computer program to sample from it. To construct
a sample experiment, we somehow need to have access to a “random entity,” since we
want our sampling to be fair, that is, no particular value should occur more often than
described by the distribution function. For example: how would we simulate flipping a coin
on a computer, such that the outcomes are close to 50/50 heads and tails when a large number
of trials are performed? Unfortunately, no random machine exists (a computer is a machine
and still deterministic) that can render a fully random entity. What is available is a
so-called pseudo-random number generator. A pseudo-random number generator is a piece
of software that renders as output a random number upon demand. Such a random number
or value, in statistical terms, is simply the outcome of a uniform [0,1] random variable.
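The coin-flip question can be tried out with a short Python sketch. It is an illustration only: the function name flip_coins is hypothetical, the rule "heads if the uniform number is below 0.5" is one natural choice rather than something prescribed here, and Python's random.random() returns values in [0, 1), so the upper end point is excluded.

```python
import random


def flip_coins(n_trials, seed):
    """Simulate n_trials fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # pseudo-random number generator with a chosen start value
    heads = 0
    for _ in range(n_trials):
        u = rng.random()  # pseudo-random number, uniform on [0, 1)
        if u < 0.5:  # assumed rule: the lower half of the interval means heads
            heads += 1
    return heads / n_trials


# The start value 12345 is arbitrary.
print(flip_coins(100_000, seed=12345))
```

With a large number of trials the printed fraction should be close to 0.5, which is the 50/50 behavior asked for above.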
Hence, such a number always lies between zero and one. These numbers are pseudo-random
numbers, because a pseudo-random number generator always has to be started
with what is called a “random seed.” For a given random seed, one will always obtain the