
measuring variability and spread
Q: I get why mean, median, and mode are useful, but why
do I need to know how the data is spread out?
A: Averages offer you only a one-dimensional view of your
data. They tell you what the center of your data is, but that’s it.
While this can be useful, it’s often not enough. You need some
other way of summarizing your data in addition to the average.
Q: So is the median the same as the interquartile range?
A: No. The median is the middle value of the data, and the
interquartile range is the range of the middle 50% of the values.
Q: What’s the point of all this quartiles stuff? It seems like
a really tedious way to calculate ranges.
A: The problem with using the range to measure how your
data is dispersed is that it’s very sensitive to outliers. It gives you
the difference between the lower and upper bounds of your data,
but just one outlier can make a huge difference to the result.
We can get around this by focusing only on the central 50% of the
data. Outliers sit at the extremes, so they fall outside it. This means
finding quartiles and using the interquartile range. So even though finding quartiles
is trickier than finding the lower and upper bounds, there are
definite advantages.
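To see the difference in action, here is a minimal Python sketch. The scores are made up for illustration, and the helper names `value_range` and `interquartile_range` are our own. It leans on `statistics.quantiles` with `method="inclusive"`, which is just one of several conventions for finding quartiles, so other methods may give slightly different values.

```python
from statistics import quantiles

def value_range(data):
    # range = upper bound - lower bound
    return max(data) - min(data)

def interquartile_range(data):
    # quantiles(..., n=4) returns the lower quartile, median, and upper quartile.
    # Assumption: the "inclusive" method; other conventions differ slightly.
    lower_q, _median, upper_q = quantiles(data, n=4, method="inclusive")
    return upper_q - lower_q

scores = [3, 4, 4, 5, 5, 6, 6, 7, 7]   # hypothetical data
with_outlier = scores + [40]           # the same data plus one outlier

print(value_range(scores), value_range(with_outlier))                  # 4 37
print(interquartile_range(scores), interquartile_range(with_outlier))  # 2.0 2.5
```

One extra value pushes the range from 4 to 37, while the interquartile range barely moves.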
Q: Should I always use the interquartile range to measure
the spread of data?
A: In a lot of cases, the interquartile range is more meaningful
than the range, but it all depends on what information you
really need. There are other ways of measuring how values are
dispersed that you might want to consider too; we’ll come to
these later.
Q: Would I ever want to look at just one quartile of my
data instead of the range or the interquartile range?
A: It’s possible. For example, you might be interested in what
the high values look like, so you’d just look at what values are in
the upper quarter of your data set, using the upper quartile as a
cut-off point.
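As a rough sketch of that idea, the snippet below keeps only the values above the upper quartile. The data and the helper name `upper_quarter` are hypothetical, and the quartile comes from `statistics.quantiles` with `method="inclusive"`, which is one convention among several. Whether a value exactly equal to the cut-off counts as "in" is a choice; here it is left out.

```python
from statistics import quantiles

def upper_quarter(data):
    # The third cut point returned for n=4 is the upper quartile.
    _lower_q, _median, upper_q = quantiles(data, n=4, method="inclusive")
    # Keep values strictly above the cut-off (an assumption; >= is also defensible).
    return [x for x in sorted(data) if x > upper_q]

values = [12, 15, 11, 30, 22, 18, 25, 14]  # hypothetical data
print(upper_quarter(values))  # [25, 30]
```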
Q: Would I ever want to break my data into smaller pieces
than quarters? How about breaking my data into, say, 10
pieces instead of 4?
A: Yes, there are times when you might want to do this. Turn
the page, and we’ll show you more...
The upper and lower bounds of the data are
the highest and lowest values in the data set.

The range is a simple way of measuring how
values are dispersed. It’s given by:
range = upper bound - lower bound

The range is very sensitive to outliers.

The interquartile range is less sensitive to
outliers than the range.

Quartiles are values that split your data
into quarters. The highest quartile is called
the upper quartile, and the lowest quartile is
called the lower quartile. The middle quartile
is the median.

The interquartile range is the range of
the central 50% of the data. It’s given by:
interquartile range = upper quartile - lower quartile
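
The points above can be pulled together in one short Python sketch. The function name `spread_summary` and the data are made up for illustration, and the quartiles are found with `statistics.quantiles` using `method="inclusive"`; that is an assumption, since other quartile conventions can give slightly different results.

```python
from statistics import median, quantiles

def spread_summary(data):
    # Upper and lower bounds: the highest and lowest values in the data set.
    lower_bound, upper_bound = min(data), max(data)
    # Three cut points split the data into quarters.
    # Assumption: the "inclusive" convention for locating them.
    lower_q, middle_q, upper_q = quantiles(data, n=4, method="inclusive")
    return {
        "range": upper_bound - lower_bound,
        "lower quartile": lower_q,
        "median": middle_q,  # the middle quartile is the median
        "upper quartile": upper_q,
        "interquartile range": upper_q - lower_q,
    }

data = [2, 4, 4, 5, 7, 8, 9, 10, 13]  # hypothetical data
summary = spread_summary(data)
print(summary["range"])                # 11
print(summary["interquartile range"])  # 5.0
print(summary["median"] == median(data))  # True
```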