Caers J. Modeling Uncertainty in the Earth Sciences


P1: OTA/XYZ P2: ABC

JWST061-02 JWST061-Caers March 30, 2011 18:55 Printer Name: Yet to Come


2.5 Random Variables

At this point, we have studied two separate issues: (1) how to make numerical summaries

of data, and (2) the study of “probability” in general without considering any data. In

this section we will establish a link between the two – to try to quantify probabilities

or other interesting properties from the data set. A key link will be the concept of the

random variable. A random variable is a variable whose value is a numerical outcome

of a random experiment. A random variable is not a numerical value itself. It can take

various outcomes/values, but we do not know, in advance, exactly which value it will take.

Examples are rolling a die, drawing a card from a deck, and sampling a diamond from a diamond deposit. The outcome of each of these experiments can be described by a random variable. We will use a capital letter such as X or Y to denote a random variable. The capital letter is important, because we use it to indicate that the value is unknown. The outcome of a random variable is then denoted by a lowercase letter such as x or y.

P(X ≤ x) denotes the probability that the random variable X is smaller than or equal to a given outcome x. Recall that “X ≤ x” is termed an event.

2.5.1 Discrete Random Variables

A random variable that can take only a limited set of outcomes or values is termed a discrete random variable. An example is rolling a die: there are only six possible outcomes. The frequency at which the outcomes occur, or the way the random variable is distributed, can be described by a probability mass function using the following notation:

$$p_X(a) = P(X = a)$$

For a die: $P(X = 1) = p_X(1) = 1/6$; $P(X = 2) = p_X(2) = 1/6$, and so on. Note the notation $p_X(a)$, which means that we evaluate the probability for random variable X to take the value a.
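A minimal Python sketch of the probability mass function of a fair die may help to fix the notation. The names `die_pmf` and `pmf` are illustrative choices of ours, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

# p_X(a) = P(X = a) for a fair six-sided die, stored as outcome -> probability.
die_pmf = {a: Fraction(1, 6) for a in range(1, 7)}

def pmf(a):
    """Return p_X(a); outcomes that cannot occur get probability zero."""
    return die_pmf.get(a, Fraction(0))

print(pmf(2))                  # 1/6
print(sum(die_pmf.values()))   # 1
```

The probabilities of all possible outcomes add up to one, as they must for any probability mass function.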

2.5.2 Continuous Random Variables

While in the discrete case the possible outcomes can be counted, they cannot be counted for a continuous random variable: there are infinitely many possibilities. Because of this, P(diamond size = 1 ct) = 0: there are (at least theoretically) infinitely many possible sizes, and any finite number divided by infinity is zero. There are two equivalent ways to describe the possible variations of a continuous random variable: (1) the probability density function (pdf) and (2) the cumulative distribution function (cdf).

2.5.2.1 Probability Density Function (pdf)

The probability density function, which we denote as $f_X(x)$, is a non-negative function defined through an integral: the integral of $f_X$ over an interval (the area under the curve) denotes a probability:

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$

This seems a very contorted way to define a probability, but we need to do it in this mathematical way for continuous variables because of the 1/infinity reason mentioned above. The notation $f_X(x)$ now also becomes a bit clearer: the function $f_X$ describing the probabilistic variation of random variable X is evaluated at the point x.

Some important properties are

$\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$ “some outcome will occur for sure”

$f_X(x) \ge 0$ “probabilities cannot be negative”

$P(X = x) = 0$ as discussed above

Does the function value $f_X(x)$ have any meaning? It certainly does not have the meaning of a probability. It really only has meaning when comparing two outcomes, $x_1$ and $x_2$. Then the ratio

$$\frac{f_X(x_1)}{f_X(x_2)}$$

denotes how many times more (or less) likely the outcome $x_1$ is to occur compared to $x_2$. Note that the term “likely” (also used as likelihood) is not the same as “probability”. For example, if the ratio is four, as is the case in Figure 2.5, then $x_1$ is four times more likely to occur than $x_2$. Note that this is not the same as saying “four times more probable.”
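The distinction between a density value and a probability can be illustrated numerically. The sketch below assumes a standard Gaussian density purely as an example (any valid pdf would do), and the helper names `gaussian_pdf` and `interval_probability` are ours. It compares a density ratio with an interval probability obtained by a simple midpoint rule.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian random variable (assumed example pdf)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def interval_probability(pdf, a, b, n=10_000):
    """Approximate P(a <= X <= b) as the area under pdf (midpoint rule)."""
    h = (b - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))

x1, x2 = 0.0, 1.0
ratio = gaussian_pdf(x1) / gaussian_pdf(x2)            # how much more likely x1 is than x2
prob = interval_probability(gaussian_pdf, -1.0, 1.0)   # a genuine probability
point = interval_probability(gaussian_pdf, 1.0, 1.0)   # zero-width interval

print(ratio, prob, point)
```

The ratio is a statement about relative likelihood only; the probability of a single exact value, computed here as the area over a zero-width interval, is zero.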

2.5.2.2 Cumulative Distribution Function

A completely equivalent way of describing a random variable is the cumulative distribution function (Figure 2.6):

$$F_X(x) = P(X \le x)$$

Figure 2.5 Example of a probability density function $f_X(x)$. The shaded area represents a probability.

Figure 2.6 Definition of a cumulative distribution function $F_X(x)$, which is always between 0 and 1 and never decreases.

The relationship between $F_X(x)$ and $f_X(x)$ is:

$$F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy \;\Rightarrow\; f_X(x) = \frac{dF_X(x)}{dx}$$
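This relationship can be checked numerically. As an assumption made only for illustration, the sketch below uses the exponential density $f_X(x) = e^{-x}$ for $x \ge 0$, because its cdf $F_X(x) = 1 - e^{-x}$ is known in closed form; the function names are ours.

```python
import math

def pdf(x):
    """Assumed example density: exponential with unit rate."""
    return math.exp(-x) if x >= 0 else 0.0

def cdf(x):
    """Closed-form cdf of the same exponential variable."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

x = 1.3
area = integrate(pdf, 0.0, x)                     # pdf is zero below 0, so start there
slope = (cdf(x + 1e-5) - cdf(x - 1e-5)) / 2e-5    # central-difference derivative of the cdf

print(area, cdf(x))    # integrating the pdf recovers the cdf
print(slope, pdf(x))   # differentiating the cdf recovers the pdf
```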

2.5.3 Expectation and Variance

2.5.3.1 Expectation

First, consider a discrete random variable X with probability mass function $p_X(x)$. X has K possible outcomes, say $x_1, x_2, \ldots, x_K$:

$x_1$: $P(X = x_1) = p_X(x_1)$

$x_2$: $P(X = x_2) = p_X(x_2)$

$x_3$: $P(X = x_3) = p_X(x_3)$, etc.

The expected value, using the notation E[X], is defined as:

$$E[X] = \sum_{k=1}^{K} x_k\, P(X = x_k)$$

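For example, in rolling a die we have:

Possible outcomes: $x_1 = 1,\ x_2 = 2,\ x_3 = 3,\ x_4 = 4,\ x_5 = 5,\ x_6 = 6$

$$E[X] = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = 3.5$$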

Apparently, the expected value of X need not be a value that X could assume. So E[X]

is not the value that one “expects” X to have, but rather E[X] is the average value of X in

a large number of repetitions of the experiment.
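The interpretation of E[X] as a long-run average can be sketched in Python. The seed value and the number of rolls below are arbitrary choices of ours, and Python's built-in pseudo-random generator stands in for the physical die.

```python
import random
from fractions import Fraction

# Exact expected value: sum over outcomes of x_k * P(X = x_k), with P = 1/6 each.
expected = sum(Fraction(x, 6) for x in range(1, 7))

# Long-run average of many simulated rolls (seed chosen arbitrarily).
rng = random.Random(2011)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
average = sum(rolls) / len(rolls)

print(expected)   # 7/2
print(average)    # close to 3.5
```

No single roll ever gives 3.5, yet the average over many rolls settles near that value.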


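In a very similar manner as for discrete variables, we can define the expected value for continuous variables:

$$E[X] = \int_{-\infty}^{+\infty} x\, f_X(x)\,dx$$

Instead of a sum, we now use an integral. If we know $f_X(x)$, then we can calculate this integral and find E[X]. For example, if

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$

then with some calculus this will result in:

$$E[X] = \int_{-\infty}^{+\infty} x\,\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) dx = \mu.$$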

2.5.3.2 Population Variance

In a sense, the expected value of a random variable is one way to summarize the distribution function of that variable. So how do we summarize the spread of that population? Analogously to what we did for the data (where we used the empirical standard deviation), we also use a measure, the population variance, written here with $\mu = E[X]$:

$$\mathrm{Var}[X] = E\left[(X - E[X])^2\right] = \int_{-\infty}^{+\infty} (x-\mu)^2 f_X(x)\,dx$$

If we use the same function as above, then:

$$\mathrm{Var}[X] = \int_{-\infty}^{+\infty} (x-\mu)^2\,\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) dx = \sigma^2$$
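Both integrals can be verified numerically rather than by calculus. The following sketch assumes illustrative parameter values (`MU = 2.0`, `SIGMA = 1.5`) and replaces the infinite integration range by twelve standard deviations on either side of the mean, where the Gaussian tails are negligible; the helper names are ours.

```python
import math

MU, SIGMA = 2.0, 1.5   # illustrative parameter values (assumed)

def gaussian_pdf(x):
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * SIGMA)

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# A finite range stands in for (-infinity, +infinity).
lo, hi = MU - 12 * SIGMA, MU + 12 * SIGMA

mean = integrate(lambda x: x * gaussian_pdf(x), lo, hi)
variance = integrate(lambda x: (x - mean) ** 2 * gaussian_pdf(x), lo, hi)

print(mean)       # close to MU
print(variance)   # close to SIGMA squared
```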

2.5.4 Examples of Distribution Functions

2.5.4.1 The Gaussian (Normal) Random Variable and Distribution

The Gaussian or normal distribution is a very specific distribution that has the following mathematical expression:

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$

with μ = population mean or expected value; σ = population standard deviation. Figure 2.7 shows some examples with various values for μ and σ.

Figure 2.7 Gaussian distribution for various values of μ and σ (curves shown for μ=0, σ=1; μ=0, σ=2; μ=1, σ=1; horizontal axis from -5 to 5).

The Gaussian distribution has two parameters, μ and σ, that can be freely chosen (remembering that σ > 0!). The parameter μ “regulates” the center of the distribution. It is the mean you would obtain from an infinite number of samples of a random variable X with this distribution. The parameter σ “regulates” the width of the bell-shaped curve.

Other properties of this distribution function are:

population mean = population median

population mode = population mean

There is no closed-form mathematical expression for $F_X(x)$. You need to use a computer or a table from a book.
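On a computer, the Gaussian cdf is usually evaluated through the error function, using the identity $F_X(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$. A minimal sketch with Python's standard `math.erf` follows; the function name `gaussian_cdf` is our own.

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(gaussian_cdf(0.0))              # 0.5
print(round(gaussian_cdf(1.96), 3))   # 0.975
```

The value at x = μ is exactly one half, which reflects the property that the population mean equals the population median.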

2.5.4.2 Bernoulli Random Variable

The simplest case of a discrete random variable is one that has two possible outcomes.

For simplicity, we will call these categories 0/1.

X = 0 if a trial results in “failure”

X = 1 if a trial results in “success”

A trial should be interpreted in the broadest sense possible.

Examples:

Finding a diamond larger than 2 ct means “success.”

Rolling ones twice in a row means “success.”

Finding a diamond smaller than 4 ct means “success.”


In statistics, we call such a random variable a Bernoulli random variable. In geostatistics, we often call it an indicator random variable. The probability distribution is completely quantified by knowing the probability p of success:

$p = P(X = 1)$ and $P(X = 0) = 1 - p$

$E[X] = 1 \times p + 0 \times (1 - p) = p$

$\mathrm{Var}[X] = (1 - p)^2\, p + (0 - p)^2 (1 - p) = p(1 - p)$
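The two formulas can be evaluated directly and compared with a simulated indicator variable. In this sketch the success probability `p = 0.3`, the seed, and the number of trials are arbitrary assumptions of ours.

```python
import random

def bernoulli_moments(p):
    """Mean and variance computed term by term from the two outcomes 1 and 0."""
    mean = 1 * p + 0 * (1 - p)
    variance = (1 - mean) ** 2 * p + (0 - mean) ** 2 * (1 - p)
    return mean, variance

p = 0.3
mean, variance = bernoulli_moments(p)

# Simulate indicator outcomes: 1 ("success") with probability p, otherwise 0.
rng = random.Random(7)
draws = [1 if rng.random() < p else 0 for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)

print(mean, variance, sample_mean)
```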

2.5.4.3 Uniform Random Variable

A random variable X is termed uniform when each of its outcomes between two values a and b, with a < b, is equally likely to occur (Figure 2.8). The probability density equals:

$$f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$$

The uniform random variable is important in the context of generating random numbers on a computer.
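The uniform pdf and the cdf that follows from integrating it, $F_X(x) = (x-a)/(b-a)$ for $a \le x \le b$, are simple enough to write down directly. The function names below are illustrative.

```python
def uniform_pdf(x, a, b):
    """Density of a uniform random variable on [a, b], with a < b."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Cdf obtained by integrating the constant density from a up to x."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(uniform_pdf(3.0, 2.0, 6.0))   # 0.25
print(uniform_cdf(4.0, 2.0, 6.0))   # 0.5
```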

2.5.4.4 A Poisson Random Variable

Examples for which a Poisson random variable is appropriate:

The number of misprints on a page of a book

The number of people in a community that are 100 years old

The number of transistors that fail in their first day of use.

Poisson random variables typically have the following characteristics:

p = probability of the event occurring is small

n = number of trials is large

Figure 2.8 Uniform pdf $f_X(x)$ and cdf $F_X(x)$ between a and b.

Figure 2.9 Random distribution of points over an area.

In the Earth sciences, the Poisson distribution is important because it has a spatial connection. Take an area with certain objects (diamonds, trees, plants, earthquakes). Take a small box and put it inside this area, as done in Figure 2.9. The random variable describing the number of points that you will find in the box is a Poisson random variable, with the following probability mass function:

$$p_X(i) = P(X = i) = e^{-\lambda}\,\frac{\lambda^{i}}{i!}$$

λ = average number of points in the box

In Chapter 5, we will discuss Boolean or object models, where we will simulate objects in space. To do so we will make use of the Poisson process, that is, the process of spreading objects randomly in space as done in Figure 2.9. Note that the coordinates X and Y of each point are also random variables, namely uniform random variables.
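The spatial connection can be sketched with a small simulation. As assumptions for illustration only, we scatter a fixed number of points with uniform coordinates over a unit square and count the points falling in each cell of a regular grid of small boxes. With a fixed total the counts are only approximately Poisson, but for many points and small boxes the approximation is close. The constants and names below are ours.

```python
import math
import random

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

N_POINTS, GRID = 2000, 20          # 2000 points, 20 x 20 boxes (assumed values)
rng = random.Random(29)

# Scatter points with uniform X and Y coordinates and count them per box.
counts = [[0] * GRID for _ in range(GRID)]
for _ in range(N_POINTS):
    x, y = rng.random(), rng.random()
    counts[int(x * GRID)][int(y * GRID)] += 1

flat = [c for row in counts for c in row]
lam = N_POINTS / GRID ** 2          # average number of points per box
mean = sum(flat) / len(flat)
var = sum((c - mean) ** 2 for c in flat) / len(flat)

# Compare the observed fraction of boxes holding i points with the Poisson pmf.
for i in range(0, 11, 2):
    observed = flat.count(i) / len(flat)
    print(i, round(observed, 3), round(poisson_pmf(i, lam), 3))
```

For a Poisson random variable the mean and the variance are both equal to λ, so the variance of the box counts should come out close to their mean.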

2.5.4.5 The Lognormal Distribution

A variable X is lognormally distributed if and only if log X is normally (Gaussian) distributed. So, if we calculate the log of the data, then the histogram should look like a normal distribution if the variable is lognormally distributed. The lognormal distribution has, therefore, two parameters: mean and variance. The lognormal distribution can be extremely skewed. Hence, it is an ideal candidate for describing skewed data sets. The lognormal variable is also positive. This makes it a very useful distribution for the many Earth science data that are strictly positive; permeability (Darcy), magnitudes, and grain sizes (mm), for example, are often lognormal.
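The definition suggests a direct way to generate lognormal values: draw a Gaussian value and exponentiate it. The sketch below assumes, for illustration, that log X has mean 0 and standard deviation 1; the seed and sample size are arbitrary.

```python
import math
import random

MU_LOG, SIGMA_LOG = 0.0, 1.0   # assumed mean and standard deviation of log X
rng = random.Random(11)

# X = exp(Z) with Z Gaussian, so that log X is Gaussian by construction.
samples = [math.exp(rng.gauss(MU_LOG, SIGMA_LOG)) for _ in range(50_000)]

logs = [math.log(s) for s in samples]
mean_log = sum(logs) / len(logs)
mean_x = sum(samples) / len(samples)
median_x = sorted(samples)[len(samples) // 2]

print(min(samples) > 0)     # True: a lognormal variable is strictly positive
print(mean_log)             # close to MU_LOG
print(mean_x, median_x)     # mean well above the median: a skewed distribution
```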


2.5.5 The Empirical Distribution Function versus the Distribution Model

A random variable X describes the entire population of possible outcomes, and its distribution (pdf or cdf) describes in detail which of these outcomes are more likely to occur than others. $F_X(x)$ or $f_X(x)$ are also termed the distribution model of the population. Unfortunately, we do not know the entire population, and we certainly do not know $F_X(x)$ or $f_X(x)$. We only have data, that is, a set of values or outcomes of sampling. Using the data, we will have to guess what $f_X(x)$ and $F_X(x)$ are. To do this, we will use the empirical distribution function, which is essentially the “distribution model of the data” and not of the entire population of X. Just as for the population, we have an empirical pdf and cdf.

Empirical pdf = $\hat{f}_X(x)$ = density distribution obtained from the data. The histogram is, in fact, a graphical representation of the empirical pdf, so we will call $\hat{f}_X(x)$ the histogram.

Empirical cdf = $\hat{F}_X(x)$ = the cumulative distribution function based upon the data. It is constructed as shown in Figure 2.10.

Sort the data and plot them on the x-axis.

A cumulative probability specifies the probability of being below a threshold, so for the empirical cdf this becomes:

$$\hat{F}_X(x) = P(X \le \text{observed datum } x)$$

$P(X \le x_1) = 1/6$, so about 17% of the data are less than or equal to $x_1 = 3.2$.

$P(X \le x_2) = 2/6$, so about 33% of the data are less than or equal to $x_2 = 8.6$.

n = 6 data: 10.1 / 15.4 / 8.6 / 9.5 / 20.6 / 3.2

Figure 2.10 Empirical cdf $\hat{F}_X(x)$ of the six data, sorted as 3.2, 8.6, 9.5, 10.1, 15.4, 20.6 and rising in steps of 1/6.
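The construction of the empirical cdf amounts to counting. A minimal Python sketch using the six data values above follows; the function name `empirical_cdf` is our own.

```python
data = [10.1, 15.4, 8.6, 9.5, 20.6, 3.2]

def empirical_cdf(x, sample):
    """Fraction of the data that is less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

# Evaluate at each sorted datum: the value rises by 1/n at every data point.
for v in sorted(data):
    print(v, round(empirical_cdf(v, data), 3))
```

Below the smallest datum the function is zero, and at the largest datum it has already reached one, a point that matters when the empirical cdf is extended to a model for the whole population.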


2.5.6 Constructing a Distribution Function from Data

An important task in statistical analysis is to figure out which distribution is suitable to model the data: is it normal? Lognormal? Uniform? Unfortunately, no set of distribution models exists that is “flexible” enough to fit all the data sets that are observed in nature. Many of the theoretical distribution models (such as the normal and lognormal) stem from an era when computers were not available and modelers used functions that had only a few parameters they could easily estimate. In this book, a more computer-oriented method of interpolation/extrapolation that uses the data themselves to construct a distribution model is advocated. Our anchor point will be the empirical distribution.

Figure 2.11 provides an example, where we assume the data are bounded between 0 and 100. Recall that in the empirical cumulative distribution, we list the data $x_1, \ldots, x_n$ in increasing order, and then we make steps of 1/n between them.

Figure 2.11 Empirical cdf for building a distribution model directly from data. 6 data samples: 3.2 / 8.6 / 9.5 / 10.1 / 15.4 / 20.6; linear inter/extrapolation between the bounds 0 and 100; steps of 1/(n+1) = 1/7 to allow extrapolation.

However, when we want a distribution model for the entire population, we need to perform two additional steps:

Interpolate: We need to know what happens between $x_1$ and $x_2$, or $x_2$ and $x_3$, and so on.

Extrapolate: We need to know what happens for observations larger than $x_n$ and smaller than $x_1$. Indeed, the data are only a limited sample of the entire population. In the entire population it can be expected that some values are larger than $x_n$ and some are smaller than $x_1$.

Therefore, we “complete” the empirical distribution by introducing so-called interpolation and extrapolation models, which can be chosen by the modeler, for example a linear or parabolic/hyperbolic type function. There is no problem in extrapolating below $x_1$, but what about values higher than $x_n$? There is apparently no room left, because the empirical cdf already equals 1 at $x_n$, that is, $P(X > x_n) = 0$. The way to solve this problem is to go in steps of 1/(n + 1) instead of 1/n. In this case, steps of 1/7 instead of 1/6 are used, as shown in Figure 2.11.

In essence, we “patch” together a distribution model $F_X(x)$ by piecing together interpolation and extrapolation models. The advantage of constructing a distribution function this way is that all one needs to represent it is essentially a series of numbers. In the current computer era, we have enough memory space to store a series of, for example, 100 000 values. Together with suitable interpolation and extrapolation models, that series of values represents a distribution function.
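One possible realization of this construction, sketched under stated assumptions, uses linear interpolation and extrapolation throughout: the sorted data receive cumulative probabilities i/(n + 1), and the assumed bounds 0 and 100 receive probabilities 0 and 1. The function name `build_cdf_model` is hypothetical, and other interpolation or extrapolation models could be substituted.

```python
def build_cdf_model(sample, lower, upper):
    """Return a piecewise-linear cdf through the sorted data.

    Assumes distinct data values lying strictly between lower and upper.
    """
    xs = sorted(sample)
    n = len(xs)
    # Knots: the bounds get probabilities 0 and 1, datum i gets i / (n + 1).
    knots_x = [lower] + xs + [upper]
    knots_f = [0.0] + [i / (n + 1) for i in range(1, n + 1)] + [1.0]

    def cdf(x):
        if x <= lower:
            return 0.0
        if x >= upper:
            return 1.0
        # Find the segment containing x and interpolate linearly within it.
        for k in range(len(knots_x) - 1):
            x0, x1 = knots_x[k], knots_x[k + 1]
            if x0 <= x <= x1:
                f0, f1 = knots_f[k], knots_f[k + 1]
                return f0 + (f1 - f0) * (x - x0) / (x1 - x0)

    return cdf

data = [3.2, 8.6, 9.5, 10.1, 15.4, 20.6]
F = build_cdf_model(data, lower=0.0, upper=100.0)

print(F(3.2))          # 1/7: the first step
print(1.0 - F(20.6))   # about 1/7: room left above the largest datum
```

Storing the sorted values together with the chosen bounds is all that is needed to evaluate this model anywhere between 0 and 100.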

2.5.7 Monte Carlo Simulation

Monte Carlo is a statistical technique that aims at “mimicking” the process of sampling an actual phenomenon. Therefore, Monte Carlo simulation is often referred to as “sampling” or “drawing” from a distribution function. When one is actually sampling (not Monte Carlo sampling), samples are obtained from the field to figure out what, for example, the population density distribution $f_X(x)$ is. In Monte Carlo simulation, one assumes that the distribution $f_X(x)$ is known, and uses a computer program to sample from it. To construct a sampling experiment, we somehow need to have access to a “random entity,” since we want our sampling to be fair, that is, no particular value should occur more often than described by the distribution function. For example: how would we simulate flipping a coin on a computer, such that the outcomes are close to 50/50 heads and tails when a large number of trials are performed? Unfortunately, no random machine exists (a computer is a machine and still deterministic) that can render a fully random entity. What is available is a so-called pseudo-random number generator. A pseudo-random number generator is a piece of software that renders as output a random number upon demand. Such a random number or value, in statistical terms, is simply the outcome of a uniform [0,1] random variable. Hence, it is a number that is always between zero and one. These numbers are pseudo-random numbers, because a pseudo-random number generator always has to be started with what is called a “random seed.” For a given random seed, one will always obtain the