
< previous page page_106 next page >
Page 106
A subset of is said to be
ultimately periodic
if it is the union of a finite set and a finite number of
arithmetic progressions all having the same period. The above theorem can therefore be stated in the
following terms: the recognisable subsets of the natural numbers are precisely the ultimately periodic
ones.
5.3 Kleene’s theorem: algorithms
In this section, we shall describe two algorithms that together provide an algorithmic proof of Kleene’s
theorem: our first algorithm will show explicitly how to construct an
ε
-automaton from a regular
expression, and our second will show explicitly how to construct a regular expression from an
automaton.
In the proof below we shall use a class of
ε
-automata. A
normalised ε-automaton
is just an
ε
-automaton
having exactly one initial state and one terminal state, and the property that there are no transitions
into the initial state or out of the terminal state.
Theorem 5.3.1 (Regular expression to
ε
-automaton)
Let r be a regular expression over the
alphabet A. Let m be the sum of the following two numbers: the number of symbols from A occurring in
r, counting repeats, and the number of regular operators occurring in r, counting repeats. Then there is
an ε-automaton
A
having at most
2
m states such that L
(A)=
L
.
Proof We shall prove that each regular language is recognised by some normalised
ε
-automaton
satisfying the conditions of the theorem. Base step: prove that if
L
=
L(r)
where
r
is a regular expression
without regular operators, then
L
can be recognised by a normalised
ε
-automaton with at most 2 states.
However, in this case
L
is either
{a}
where
, ,
or
{ε}
. The normalised
ε
-automata, which
recognise each of these languages, are
Induction hypothesis: assume that if
r
is a regular expression, using at most
n
−1 regular operators and
containing
p
occurrences of letters from the underlying alphabet, then
L(r)
can be recognised by a
normalised
ε
-automaton using at most 2(
n
−1)+2
p
states. Now let
r
be a regular expression having
n
regular operators and
q
occurrences of letters from the underlying alphabet. We shall prove that
L(r)
can
be recognised by a normalised
ε
-automaton containing at most 2
n
+2
q
states. From the definition of a
regular expression,
r
must have one of the following three forms: (1)
r
=
s
+
t,
(2)
r
=
s
·
t
or (3)
r
=
s
*.
Clearly,
s
and
t
each use at most
n
−1 regular operators; let
ns
and
nt
be the number of regular
operators occurring in
s
and
t,
respectively, and let
qs
and
qt
be the number of occurrences of letters
from the underlying alphabet in
s
and
t,
respectively. Then
ns
+
nt
=
n
−1 and
qs
+
qt
=
q
. So by the
induction hypothesis
L(s)
and
L(t)
are recognised by normalised
ε
-automata A and B, respectively,
which have at most 2(
ns
+
qs
) and 2(
nt
+
qt
) states apiece. We can picture these as follows:
< previous page page_106 next page >