
130 DATA COMPRESSION
Example 5.9.2 We now give another example for construction of the
Shannon–Fano–Elias code. In this case, since the distribution is not
dyadic, the representation of F(x) in binary may have an infinite number
of bits. We denote 0.01010101 ... by 0.
01. We construct the code in the
following table:
x p(x) F (x) F(x) F(x) in Binary l(x) =
log
1
p(x)
+ 1 Codeword
1 0.25 0.25 0.125 0.001 3 001
2 0.25 0.5 0.375 0.011 3 011
3 0.2 0.7 0.6 0.1
0011 4 1001
4 0.15 0.85 0.775 0.110
0011 4 1100
5 0.15 1.0 0.925 0.111
0110 4 1110
The above code is 1.2 bits longer on the average than the Huffman
code for this source (Example 5.6.1).
The Shannon–Fano–Elias coding procedure can also be applied to
sequences of random variables. The key idea is to use the cumulative
distribution function of the sequence, expressed to the appropriate accu-
racy, as a code for the sequence. Direct application of the method to blocks
of length n would require calculation of the probabilities and cumulative
distribution function for all sequences of length n, a calculation that would
grow exponentially with the block length. But a simple trick ensures that
we can calculate both the probability and the cumulative density func-
tion sequentially as we see each symbol in the block, ensuring that the
calculation grows only linearly with the block length. Direct application
of Shannon–Fano–Elias coding would also need arithmetic whose preci-
sion grows with the block size, which is not practical when we deal with
long blocks. In Chapter 13 we describe arithmetic coding, which is an
extension of the Shannon–Fano–Elias method to sequences of random
variables that encodes using fixed-precision arithmetic with a complexity
that is linear in the length of the sequence. This method is the basis of
many practical compression schemes such as those used in the JPEG and
FAX compression standards.
5.10 COMPETITIVE OPTIMALITY OF THE SHANNON CODE
We have shown that Huffman coding is optimal in that it has minimum
expected length. But what does that say about its performance on any
particular sequence? For example, is it always better than any other code
for all sequences? Obviously not, since there are codes that assign short