
Box 7.4 A tale about regression to the mean
Daniel Kahneman is a psychologist who in 2002 received the Nobel prize in Economics
(actually, the Sveriges Riksbank Prize in memory of Alfred Nobel) for his work in behavioral
economics. In his speech of thanks he spoke about the following experience.
‘I had the most satisfying Eureka experience of my career while attempting to
teach flight instructors that praise is more effective than punishment for promoting
skill-learning. When I had finished my enthusiastic speech, one of the most seasoned
instructors in the audience raised his hand and made his own short speech, which began
by conceding that positive reinforcement might be good for the birds, but went on to
deny that it was optimal for flight cadets. He said,
“On many occasions I have praised flight cadets for clean execution of some
aerobatic maneuver, and in general when they try it again, they do worse. On
the other hand, I have often screamed at cadets for bad execution, and in general
they do better the next time. So please don’t tell us that reinforcement works and
punishment does not, because the opposite is the case.”
This was a joyous moment, in which I understood an important truth about the world:
because we tend to reward others when they do well and punish them when they do
badly, and because there is regression to the mean, it is part of the human condition that
we are statistically punished for rewarding others and rewarded for punishing them. I
immediately arranged a demonstration in which each participant tossed two coins at
a target behind his back, without any feedback. We measured the distances from the
target and could see that those who had done best the first time had mostly deteriorated
on their second try, and vice versa. But I knew that this demonstration would not undo
the effects of lifelong exposure to a perverse contingency.’
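Kahneman's coin-toss demonstration can be reproduced in a few lines of simulation. The sketch below is hypothetical (the group sizes and the Gaussian error model are assumptions, not details from his account), but it shows the same effect: when performance is pure chance, the best performers on the first try look worse on the second, and the worst look better.

```python
import random

random.seed(1)

# Each "participant" performs twice; performance is pure chance,
# modeled here as a signed Gaussian error (distance from the target).
n = 10_000
first = [random.gauss(0, 1) for _ in range(n)]   # round 1 errors
second = [random.gauss(0, 1) for _ in range(n)]  # round 2 errors

# Rank participants by accuracy (|error|) on the first round and take
# the best and worst deciles.
order = sorted(range(n), key=lambda i: abs(first[i]))
best, worst = order[: n // 10], order[-(n // 10):]

def mean_abs(idx, xs):
    """Average distance from the target for the given participants."""
    return sum(abs(xs[i]) for i in idx) / len(idx)

# The best group "deteriorates" and the worst group "improves" on the
# second try, although nothing but chance is at work.
print(mean_abs(best, first), "->", mean_abs(best, second))
print(mean_abs(worst, first), "->", mean_abs(worst, second))
```

No praise or punishment is applied between the rounds, yet the selected groups move toward the average: this is regression to the mean, not a causal effect.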
and, since −1 < ρ < 1, this means that the expected difference from the mean for the son
(i.e., E(Y|X = x) − m) is smaller (closer to zero) than that for the mid-parent (i.e., x − m).
This is what Galton’s summary amounts to. The immediate consequence is that if we select
only parents with an above average mid-parent height, and look at their children, we will find
that these are on average shorter than their parents. Still tall, but shorter. It works the other
way around also, in that if we look at tall children, their parents are expected to be, on average,
shorter than them, though still tall:
E(X|Y = y) − m = ρ(y − m).
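The conditional expectation E(Y|X = x) − m = ρ(x − m) is easy to check by simulation. The sketch below uses hypothetical values for ρ, m, and the common standard deviation σ (equal variances, as in the special case), and constructs correlated normals in the standard way:

```python
import random

random.seed(0)
rho, m, sigma = 0.5, 175.0, 7.0   # hypothetical values; equal variances

# Draw (X, Y) bivariate normal with correlation rho via the standard
# construction: Y's standardized value is rho*Zx + sqrt(1 - rho^2)*Zy.
n = 200_000
pairs = []
for _ in range(n):
    zx, zy = random.gauss(0, 1), random.gauss(0, 1)
    x = m + sigma * zx
    y = m + sigma * (rho * zx + (1 - rho**2) ** 0.5 * zy)
    pairs.append((x, y))

# Condition on "fathers" about one sd above the mean: X near m + sigma.
sel = [y for x, y in pairs if abs(x - (m + sigma)) < 0.5]
avg_y = sum(sel) / len(sel)

# Theory: E(Y | X = m + sigma) = m + rho*sigma, i.e. only half the
# father's excess height (for rho = 0.5) is expected in the son.
print(avg_y, "vs", m + rho * sigma)
```

The simulated conditional average agrees with m + ρσ rather than m + σ: the sons of tall fathers are still tall, but on average less so.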
Actually the son/father example is a better illustration of the special case than Galton's
mid-parent/son example. This is because in Galton's case the variances for X and Y are
not equal, and we therefore need to use the more general equation (7.5). The mid-parent
height is an average of two heights, so we are probably closer to having σ1 = σ2/√2 than
to having σ1 = σ2. This means that the regression curve in this case is E(Y|X = x) − m =
√2 ρ(x − m), and it is only when ρ < 1/√2 ≈ 0.71 that we have regression to the mean (the
estimate for ρ in Galton's data was 0.497).
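The arithmetic for Galton's case can be spelled out directly from the general equation (7.5), whose slope is ρσ2/σ1. The standard deviation value below is illustrative (only the ratio σ1 = σ2/√2 matters):

```python
import math

rho = 0.497                      # Galton's estimated correlation
sigma2 = 2.5                     # sd of son's height (illustrative value)
sigma1 = sigma2 / math.sqrt(2)   # mid-parent sd: an average of two heights

# Slope of the regression line E(Y|X=x) - m = rho*(sigma2/sigma1)*(x - m),
# which here equals sqrt(2)*rho.
slope = rho * sigma2 / sigma1
print(round(slope, 3))           # about 0.703, still below 1

# Regression to the mean requires slope < 1, i.e. rho < 1/sqrt(2) ~ 0.707.
print(rho < 1 / math.sqrt(2))
```

With ρ = 0.497 the slope √2 ρ ≈ 0.70 is below 1, so Galton's data do show regression to the mean, but only barely below the 1/√2 threshold.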
Regression to the mean explains why we should avoid drawing conclusions about an
intervention from a change that has been observed over time. To expand on this, note that we