
CHAPTER 8 ■ PARSERS—BECAUSE BNF IS NOT JUST FOR ACADEMICS ANYMORE
235
Combinators
“What’s a parser combinator?” you ask.
1
A combinator is a function that takes only other
functions as parameters and returns only functions. Combinators allow you to combine
small functions into big functions. In the case of the parser combinator library, you can
combine small functions that match individual characters or small groups of characters
into bigger functions that can parse complex documents.
So, you have input that is a
Seq[Char] (sequence of characters), and you want to parse
the stream, which will either contain
t, r, u, e or f, a, l, s, e—true or false. So, we would
express such a program as
def parse = (elem('t') ~ elem('r') ~ elem('u') ~ elem('e')) |
(elem('f') ~ elem('a') ~ elem('l') ~ elem('s') ~ elem('e'))
where the elem method returns a subclass of Function[Seq[Char], ParseResult[Char]] that
also has
~ and | methods. The first call to elem returns a function that will attempt to match
the first character in an input stream to the letter “t.” If the first letter of the input stream
matches, then the function returns
Parsers.Success; otherwise it returns a Parsers.NoSuccess.
The
~ method is called “and then,” so we can read the first part as t and then r and then u
and then
e. So, elem('t') ~ elem('r') returns another one of these special Function[Seq[Char],
ParseResult[List[Char]]] things. So we combine the functions together with the ~ method
into one bigger function. We keep doing this with each successive
~ method invocation. The
following code:
elem('t') ~ elem('r') ~ elem('u') ~ elem('e')
builds a single function that will consult the characters t, r, u, e and return a
Parsers.Success[List[Char]] or a Parsers.NoSuccess if the input does not contain true.
The
| operator also takes two of these combinated function thingies and combines
them into a single function thingy that will test the first clause,
true, and if that succeeds,
its value is returned, but if it does not succeed, then the second clause,
false, is tried. Let’s
call that function thingy a
Parser. So, we can combine these Parser instances with each
other into other
Parser instances using operators like “and then,” “or else,” and so on. We
can combine little
Parsers into big Parsers using logic and thus construct complex gram-
mars out of little building blocks.
Let’s use a little bit of Scala’s implicit functionality to make the definition of our grammar
easier. Scala’s parser combinator library has implicit conversions from
Char into
Parser[Char], so we can write
def p2 = ('t' ~ 'r' ~ 'u' ~ 'e') |
('f' ~ 'a' ~ 'l' ~ 's' ~ 'e')
1. As in the rest of this book, here I’m dealing with the practicalities of combinators. There is a lot
of theory and math behind combinators. This Wikipedia article touches on them: http://
en.wikipedia.org/wiki/Combinator.
19897ch08.fm Page 235 Wednesday, April 22, 2009 3:54 PM