
192 Measurement and Data Analysis for Engineering and Science
Because his employer would not allow him to publish his findings, he published them under the pseudonym “Student” [2]. His distribution was named the Student’s t distribution. “Student” continued to publish significant works for over 30 years. Gosset did so well at Guinness that he was eventually put in charge of its entire Greater London operations [1].
The essence of what Gosset found is illustrated in Figure 6.3. The solid
curve indicates the normal probability density function values for various z.
It also represents the Student’s t probability density function for various t
and a large sample size (N > 100). The dashed curve shows the Student’s
t probability density function values for various t for a sample consisting
of 9 members (ν = 8), and the dotted curve for a sample of 3 members
(ν = 2). It is clear that as the sample size becomes smaller, the normal probability density function near its mean (where z = 0) overestimates the sample probabilities and, near its extremes (where z ≳ 2 and z ≲ −2), underestimates them. These differences can be quantified easily using the expressions for the probability density functions.
The probability density function of Student’s t distribution is
\[
p(t,\nu) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad (6.17)
\]
where ν denotes the degrees of freedom and Γ is the gamma function, which has these properties:
\[
\Gamma(n) = (n-1)! \quad \text{for } n \text{ a positive integer},
\]
\[
\Gamma(m) = (m-1)(m-2)\cdots(3/2)(1/2)\sqrt{\pi} \quad \text{for } m \text{ a half-integer},
\]
\[
\Gamma(1/2) = \sqrt{\pi}.
\]
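Equation 6.17 can be evaluated directly to quantify the differences seen in Figure 6.3. The following Python sketch (the function names are illustrative, not from the text) computes the Student’s t density via the gamma function and compares it with the normal density at the mean and in the tail:

```python
import math

def t_pdf(t, nu):
    """Student's t probability density function, Equation 6.17."""
    return (math.gamma((nu + 1) / 2)
            / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
            * (1 + t**2 / nu) ** (-(nu + 1) / 2))

def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)

# Compare at the mean (t = 0) and in a tail (t = 2) for several nu
for nu in (2, 8, 100):
    print(f"nu = {nu:3d}: p(0) = {t_pdf(0, nu):.4f}, p(2) = {t_pdf(2, nu):.4f}")
print(f"normal:   p(0) = {normal_pdf(0):.4f}, p(2) = {normal_pdf(2):.4f}")
```

Running this shows the behavior described above: for small ν the normal density exceeds the t density at the mean and falls below it in the tails, while for ν = 100 the two curves are nearly indistinguishable.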
Note in particular that p = p(t, ν) and, consequently, that there are infinitely many Student’s t probability density functions, one for each value of ν. This was already suggested in Figure 6.3, in which there was a different curve for each value of N.
The statistical concept of degrees of freedom was introduced by R.A.
Fisher in 1924 [1]. The number of degrees of freedom, ν, at any stage in
a statistical calculation equals the number of recorded data, N, minus the
number of different, independent restrictions (constraints), c, used for the
required calculations. That is, ν = N − c. For example, when computing
the sample mean, there are no constraints (c = 0). This is because only the
actual sample values are required (hence, no constraints) to determine the
sample mean. So for this case, ν = N. However, when either the sample standard deviation or the sample variance is computed, the value of the sample mean is required (one constraint). Hence, for this case, ν = N − 1. Because both the sample mean and sample variance are contained implicitly in t in Equation 6.17, ν = N − 1. Usually, whenever a probability density function expression is used, values of the mean and the variance are required. Thus, ν = N − 1 for these types of statistical calculations.
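The ν = N − c bookkeeping can be illustrated with a short Python sketch (the data values are hypothetical). The sample mean uses only the raw values (c = 0), while the sample variance consumes one constraint, the mean itself, so its divisor is ν = N − 1:

```python
import math
import statistics

data = [4.2, 5.1, 3.8, 4.9, 4.5]  # hypothetical sample, N = 5
N = len(data)

# Sample mean: no constraints (c = 0), so nu = N
mean = sum(data) / N

# Sample variance: the mean is one constraint (c = 1), so nu = N - 1
variance = sum((x - mean) ** 2 for x in data) / (N - 1)

# The standard library's statistics.variance uses the same N - 1 divisor
assert math.isclose(variance, statistics.variance(data))
print(f"mean = {mean}, sample variance = {variance}")
```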