7.2 PROBABILITY-BASED APPROACHES
as expressed through these conditional probabilities into a single model probability based
on all sources, namely:
What is $P(A \mid B_1, B_2, \ldots)$?
One way is to perform a single calibration involving all the data sources at once to obtain
this combined conditional probability, but this is often too difficult, or it would require
a rich, high-quality calibration data set. A single calibration with many variables
requires a lot of data to be accurate, and such exhaustive data sets are often not
available. Also, the partial conditional probabilities may be provided by
experts from very different fields. In climate modeling, for example, one may have very
different data sources, such as tree ring growth, ice cores, pollen and sea floor sediment
cores, to predict climate change, each requiring a very different field of expertise. It
would be too difficult to put all the data in one basket and then hope to obtain a good
prediction of climate change directly.
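To see why a single calibration with many variables is so data hungry, the following minimal sketch counts the distinct data combinations that would have to be calibrated. It assumes, purely for illustration, that each data source is discretized into the same number of classes; the function names, the 10 classes and the 10,000 calibration samples are hypothetical choices, not values from the text.

```python
def cells_in_joint_calibration(n_sources: int, n_classes: int) -> int:
    """Number of distinct combinations (B_1, ..., B_n) when each of the
    n_sources data sources is discretized into n_classes classes
    (an illustrative assumption)."""
    return n_classes ** n_sources


def mean_samples_per_cell(n_samples: int, n_sources: int, n_classes: int) -> float:
    """Average number of calibration samples available per combination."""
    return n_samples / cells_in_joint_calibration(n_sources, n_classes)


if __name__ == "__main__":
    # Hypothetical calibration set of 10,000 samples, 10 classes per source.
    for n_sources in (1, 2, 4, 6):
        cells = cells_in_joint_calibration(n_sources, 10)
        per_cell = mean_samples_per_cell(10_000, n_sources, 10)
        print(f"{n_sources} sources: {cells} combinations, "
              f"{per_cell:g} samples per combination on average")
```

Under these assumptions the number of combinations grows exponentially with the number of sources, so even a large calibration set leaves most combinations with very few samples, or none.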
In other words, some way is needed to combine these individual conditional probabilities
into a joint conditional probability. A simple and quite general way of doing this
combination is provided here; other methods exist in the literature, but they typically
require similar assumptions. To better understand the issues involved, consider a very
simple binary problem: two sources of information (events $B_1$ and $B_2$) indicate that
the chance that it will rain tomorrow (event A) is significant. In fact:
• From the first source $B_1$ we have deduced (by calibration, for example) that there is a
  probability of 0.7 of event A occurring.
• From the second source $B_2$ we obtain a probability of 0.6.
• The “historical” probability of it raining on the date “tomorrow” is 0.25.
• We know that the two sources $B_1$ and $B_2$ do not encapsulate the same data (calibration
  data or expert opinion).
The question is simple: what probability would you give for it to rain tomorrow?
The answer to this question is not unique and depends on how much “overlap” there is in the
information that each source provides about event A. Clearly, if the two sources (e.g.,
experts, calibration data) used the same data to arrive at their respective conditional
probabilities, then there would be a conflict, since they report different values (0.7 and
0.6). This is often the case in practice, since no two procedures for modeling conditional
probabilities need yield exactly the same result, because of various modeling and
measurement errors. But we will assume here that, at least theoretically,
we do not have such a conflict.
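The non-uniqueness of the answer can be illustrated with a minimal sketch that evaluates the rain example under one possible assumption. The function names are hypothetical, and the rule used is one standard textbook assumption (the two sources are conditionally independent given A and given not-A), not necessarily the combination developed in the text. Under that assumption, Bayes' rule gives posterior odds equal to the product of the two single-source odds divided by the prior odds.

```python
def odds(p: float) -> float:
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


def combine_conditionally_independent(prior: float, p1: float, p2: float) -> float:
    """Combine P(A|B1) = p1 and P(A|B2) = p2 with prior P(A) = prior,
    ASSUMING B1 and B2 are conditionally independent given A and given not-A.
    Then: odds(A|B1,B2) = odds(A|B1) * odds(A|B2) / odds(A)."""
    combined_odds = odds(p1) * odds(p2) / odds(prior)
    return combined_odds / (1.0 + combined_odds)


if __name__ == "__main__":
    prior, p1, p2 = 0.25, 0.7, 0.6  # values from the rain example

    # No overlap at all (conditional independence): the sources reinforce each other.
    print(combine_conditionally_independent(prior, p1, p2))

    # A source that merely repeats the prior adds nothing: the result stays at p1.
    print(combine_conditionally_independent(prior, p1, prior))
```

With the values of the example, the conditional independence assumption gives a probability of about 0.91, well above either 0.7 or 0.6, because both sources independently raise the probability above the background value of 0.25. If the second source instead carried no information beyond the first, the answer would remain 0.7, so the result depends entirely on how much the two sources overlap.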
What is also relevant is that the “prior” or “background” conditional probability is
0.25. In fact, what is very relevant is that both sources predict a higher than usual
probability of it raining. To capture this amount of “overlap”, a new term is introduced,
namely that of data redundancy. Redundancy measures how much “overlap” there is in the sources of