
25-4
REFERENCE DATA FOR ENGINEERS
of the received word do not always match those of the
channel codeword.
The function of the channel decoder is to use the
redundancy in a channel codeword
to
correct the errors
in the received word, and then
to
produce an estimate of
the source codeword from it. If all errors are corrected,
the estimated source codeword matches the original
source codeword. The source decoder performs the
inverse operation of the source encoder and delivers its
output
to
the user. The source encoder and decoder are
studied under the terms “data compaction” and “data
compression.
”
The channel encoder and decoder are
commonly split into two functions: implementing error
control to negate the effects of channel noise, and
preparing the sequence of transmitted symbols to be
compatible with channel constraints. These are studied
under the terms “error control codes” and “con-
strained channel codes.” The modulator and the de-
modulator are studied under the term ‘‘modulation
theory.
”
Although we use the terminology of communication
theory, the model is general and applies to a great
variety of situations. One can interpret many other
information-handling systems, such as mass storage
systems, in terms of this model. It only is necessary to
identify the boundaries between the boxes. This is
arbitrary and depends on the goals of a particular
analysis. Usually, the source, channel, and user are
identified with those parts of the system that are fixed,
and the encoder/decoder and modulator/demodulator
are identified with those parts of the system that are
subject to design. Therefore, in different circumstances,
the identification of these functions may be different.
The operation of the encoders and decoders is to map
strings of symbols from one alphabet into strings of
symbols from a second alphabet. The two alphabets are
often the same, but they need not be. A
block code
breaks the input data stream into blocks of fixed length
k
and encodes each block into a codeword of fixed
length
n;
these are concatenated to form the output data
stream. A
variable-to-fuced-length
block code
breaks the
input data stream into blocks of variable length and
encodes the blocks into codewords of fixed length
n
that
are concatenated
to
form the output data stream. A
fuced-to-variable-length
block code
breaks the input data
stream into blocks of fixed length
k
and encodes these
into codewords of variable length that are concatenated
to form the output data stream. A
tree code
breaks the
input data stream into frames of length
k,
that are
encoded into codeword frames of length
no
with the
encoding map depending
on
the previous
rn
input data
frames. The codeword frames are concatenated to form
the output data stream. A tree code is called a variable-
to-fixed-length tree code or a fixed-to-variable-length
tree code when the input or output frames, respectively,
are of variable length. A tree code with a finite
encoding memory
of
rn
frames is called a
sliding block
code
if the encoding operation is time invariant, and it
is called a
convolutional code
if the encoding operation
is
both linear and time invariant.
Introductory information-theory textbooks intended
for engineers are listed as references
1
through
4
at the
end
of
this chapter. Other books and papers devoted to
special topics will be cited
in
the appropriate section
only if the topic is not treated within the general
textbooks of the field.
CODING FOR NOISELESS
CHANNELS
A discrete channel is a system by means of which an
arbitrarily long sequence of symbols, each chosen from
a finite set of
I
symbols
{ao,
.
.
.
,
a!-,},
can be
transmitted from one point to another. The transmission
of symbol
ai
requires a certain time duration,
ti
seconds, which is not necessarily the same for all
i.
A
noiseless channel is one in which the output is com-
pletely determined by the input-errors do not occur. It
is not always true that all possible sequences of symbols
from the set
{ai}
can be transmitted through the
channel. Some channels, called constrained channels,
forbid certain sequences of symbols from being trans-
mitted.
Teletypewriters and telegraphy are two simple exam-
ples of discrete channels that are historically important.
In
the teletypewriter case, there are
32
symbols, each of
the same duration, and any sequence of the
32
symbols
is allowed. Each symbol can be used to represent five
bits of information. If the system transmits
r
symbols
per second, it is natural to say that the channel has a
capacity of
5r
bits per second. This does not mean that
the teletypewriter channel will always be transmitting
information at this rate. Whether or not the actual rate
reaches this maximum possible rate depends
on
how the
source of information is connected to the channel.
For the telegraphy channel, convention has fixed the
symbols as a dot, a dash, a letter space, and a word
space. We formalize these symbols as follows:
(1)
a dot,
consisting of line closed for one unit of time and then
line open for one unit
of
time;
(2)
a dash, consisting of
three time units of closure and one unit open;
(3)
a letter
space, consisting of three time units of line open;
(4)
a
word space, consisting of six time units of line open.
We also impose the restrictions on allowable sequences
that
no
space may directly follow another space. This
we take as the formal definition
of
the telegraphy
channel.
The Morse code is one system of encoding informa-
tion for this channel. However, one may properly
question the efficiency of the Morse code. Is there a
limit
on
the information that can be conveyed through
the telegraphy channel, and does the Morse code
achieve this limit? These questions are answered by
information theory.
Capacity
of
Discrete Noiseless
Channels
The capacity,
C
(in units of bits/second), of a
discrete noiseless channel is defined by