
Part IV: Working with Probability
If the distribution you generate matches up well with the real data, does this
mean your model is “right”? Does it mean the process you guessed is the
process that produces the data?
Unfortunately, no. The logic doesn’t work that way. You can show that a
model is wrong, but you can’t prove that it’s right.
Plunging into the Poisson distribution
In this section, I go through an example of modeling with the Poisson
distribution. I introduced this distribution in Chapter 17, and I told you it seems to
characterize an array of processes in the real world. By “characterize a
process,” I mean that a distribution of real-world data looks a lot like a Poisson
distribution. When this happens, it’s possible that the kind of process that
produces a Poisson distribution is also responsible for producing the data.
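One way to check whether real-world data “look a lot like” a Poisson distribution is to set the observed relative frequencies beside the Poisson probabilities. The Python sketch below is a minimal illustration, not a procedure from this book: the function names poisson_pmf and compare_to_poisson are hypothetical, the sketch assumes the mean of the data is a reasonable estimate of the Poisson mean, and the sample counts are made up purely for illustration.

```python
import math
from collections import Counter


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x occurrences when the mean count is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


def compare_to_poisson(counts: list[int]) -> list[tuple[int, float, float]]:
    """For each count value from 0 to the largest observed, pair its relative
    frequency in the data with its Poisson probability (mu = sample mean)."""
    n = len(counts)
    mu = sum(counts) / n
    freq = Counter(counts)
    return [(x, freq[x] / n, poisson_pmf(x, mu)) for x in range(max(counts) + 1)]


# Hypothetical data: defective joints found in each of 10 samples of 1,000.
samples = [3, 2, 4, 3, 1, 3, 5, 2, 3, 4]
for x, observed, expected in compare_to_poisson(samples):
    print(f"x={x}: observed {observed:.2f}, Poisson {expected:.3f}")
```

Even a close match only shows that a Poisson process is a plausible explanation for the data; as noted earlier, it can't prove that this process is the one that actually produced them.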
What is that process? Start with a random variable x that tracks the number
of occurrences of a specific event in an interval. In Chapter 17, the “interval”
was a sample of 1,000 universal joints, and the specific event was “defective
joint.” Poisson distributions are also appropriate for events occurring in
intervals of time, and the event can be something like “arrival at a toll booth.”
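To make the random variable x concrete, here's a minimal Python sketch that turns a list of event times into a value of x for each of several nonoverlapping intervals. The function name counts_per_interval and the arrival times are hypothetical, chosen only for illustration; the sketch assumes the times and the interval length are measured in the same unit (hours here), and it ignores events that fall outside the covered span.

```python
from collections import Counter


def counts_per_interval(event_times: list[float], interval: float,
                        total_time: float) -> list[int]:
    """Return x, the number of events, for each nonoverlapping interval of
    length `interval` that fits inside [0, total_time)."""
    n_intervals = int(total_time // interval)
    end = n_intervals * interval
    # Map each event to the index of the interval it falls in.
    tally = Counter(int(t // interval) for t in event_times if 0 <= t < end)
    return [tally[i] for i in range(n_intervals)]


# Hypothetical toll booth arrival times, in hours after opening.
arrivals = [0.1, 0.4, 0.45, 1.2, 2.05, 2.3, 2.9, 2.95]
print(counts_per_interval(arrivals, 1.0, 3.0))  # [3, 1, 4]
```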
Next, I outline the conditions for a Poisson process and use both defective
joints and toll booth arrivals to illustrate each one:
✓ The numbers of occurrences of the event in two nonoverlapping
intervals are independent.
The number of defective joints in one sample is independent of the number
of defective joints in another. The number of arrivals at a toll booth during
one hour is independent of the number of arrivals during another.
✓ The probability of an occurrence of the event is proportional to the size
of the interval.
The chance that you’ll find a defective joint is larger in a sample of
10,000 than it is in a sample of 1,000. The chance of an arrival at a toll
booth is greater for one hour than it is for a half hour.
✓ The probability of more than one occurrence of the event in a small
interval is 0 or close to 0.
In a sample of 1,000 universal joints, you have an extremely low
probability of finding two defective ones right next to one another. At a
toll booth, two vehicles never arrive at exactly the same instant.
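The three conditions can be turned into a simple simulation. The Python sketch below is an illustration under stated assumptions, not a procedure from this book: it splits each interval (say, one hour at the toll booth) into many tiny slices, gives each slice the same small chance of holding one event, and draws every slice independently. The function simulate_counts and its parameters are hypothetical names: rate is the assumed average number of events per interval, and it must not exceed slices_per_interval.

```python
import random


def simulate_counts(rate: float, n_intervals: int,
                    slices_per_interval: int = 10_000,
                    seed: int = 1) -> list[int]:
    """Simulate x, the number of events in each of n_intervals intervals.

    Each interval is cut into many tiny slices. A slice holds at most one
    event (condition 3), with probability proportional to its size
    (condition 2), and every slice is drawn independently, so counts in
    different intervals are independent (condition 1)."""
    rng = random.Random(seed)
    p = rate / slices_per_interval  # chance of an event in one tiny slice
    return [sum(rng.random() < p for _ in range(slices_per_interval))
            for _ in range(n_intervals)]


# Hypothetical example: an average of 3 arrivals per hour, 200 hours.
counts = simulate_counts(rate=3.0, n_intervals=200)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"mean {mean:.2f}, variance {var:.2f}")
```

Strictly speaking, each simulated count follows a binomial distribution, which approaches a Poisson distribution with mean rate as the slices get smaller. One hallmark to look for in the output: in a Poisson distribution the variance equals the mean, so the two printed values should be close.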
As I show you in Chapter 17, the formula for the Poisson distribution is