
Q: I’ve seen other ways of calculating
r. Are they wrong?
A: There are several different forms of the
equation for finding r, but underneath, they’re
basically the same. We’ve used the simplest
form of the equation so that it’s easier to
see what you’ve already calculated through
finding b.
Q: Are the results accurate with such
a small sample?
A: A larger sample would definitely be
better, but we used a small sample just to
make the calculations easier to follow.
Q: You haven’t proved or derived why
you calculate the values of b and r in this
way. Why not?
A: Deriving the formulas for b and r is quite
complex and involved, so we’ve decided not
to go through this in the book. The key thing
is that you understand when and how to use
them.
Q: What’s the expected concert
attendance if the predicted hours of
sunshine is 0?
A: We can’t say for certain because this
is quite a way outside the range of data we
have. The line of best fit is a pretty good
estimate for the range of data that we have,
but we can’t say with any certainty what
the concert attendance will be like outside
this range. The data might follow a different
pattern outside this range, so any estimate
we gave would be unreliable.
Q: When we were looking at averages,
we saw that univariate data can have
outliers. What about bivariate data?
A: Yes, bivariate data can have outliers
too. Outliers are points that lie a long way
from your regression line. Outliers can mean
that you have anomalies in your data set or,
alternatively, that your regression line isn’t
a good fit for the data.
Q: I’ve heard of influential
observations. What are they?
A: Influential observations are points that
lie a long way horizontally from the rest of
the data. Because of this, they have the
effect of pulling the regression line towards
them.
Q: So is an influential observation the
same as an outlier?
A: No. Outliers lie a long way from the
line. Influential observations lie a long way
horizontally from the rest of the data.
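To see the pull of an influential observation in numbers, here is a minimal Python sketch. The data points are made up purely for illustration (they are not the concert data), and the helper name `slope` is hypothetical. It assumes the usual least-squares formula for the slope of a line of best fit, and compares the slope with and without one point that lies far to the right of the rest.

```python
def slope(xs, ys):
    """Least-squares slope: sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data: five points that lie exactly on a line with slope 2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
slope_without = slope(xs, ys)

# Add one hypothetical point a long way horizontally from the rest.
slope_with = slope(xs + [15], ys + [10])

print(round(slope_without, 2))  # 2.0
print(round(slope_with, 2))     # 0.46
```

With these made-up values, a single point far out along the x axis is enough to drag the slope from 2 down to about 0.46.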
Find r for the concert data, continued
$$r = \frac{b\,s_x}{s_y} = \frac{5.32 \times 1.81}{10.56} = 0.91 \text{ (to 2 decimal places)}$$
Because r is very close to 1, there’s a strong positive
correlation between open-air concert attendance and predicted
hours of sunshine. In other words, based on the data that
we have, we can expect the line of best fit, y = 15.80 + 5.32x,
to give a reasonably good estimate of the expected concert
attendance based on the predicted hours of sunshine.
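The arithmetic for r can be checked with a few lines of Python. This is only a sketch: the three values are the rounded figures quoted in the text, and the variable names are our own.

```python
# Rounded values quoted in the text.
b = 5.32      # slope of the line of best fit
s_x = 1.81    # standard deviation of sunshine (hours)
s_y = 10.56   # standard deviation of attendance (100's)

# Correlation coefficient: r = b * s_x / s_y
r = b * s_x / s_y
print(round(r, 2))  # 0.91
```

Because the inputs are already rounded to two decimal places, recalculating r from the raw data could give a slightly different value in the later decimal places.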
[Figure: attendance (100’s), y, plotted against sunshine (hours), x; r = 0.91]
Now that we’ve found that $b = 5.32$, $s_x = 1.81$, and $s_y = 10.56$,
we can put them together to find r.