Invisible wrote:
I read it all the way through too. Nicely written, good explanation. I'd
add a bit at the end that points to the online resources you'd need to
learn it more completely. (A few spelling mistakes and a dropped word
or two also, but that's trivia. :-)
> scott wrote:
>
>> I read it, very interesting seeing as I had no idea how a parser works
>> (I always wondered how you would go about writing one).
>
> The more "usual" approach is something based around a finite
> state machine - but as I just explained, Parsec works differently. (And
> indeed will parse things that an FSM couldn't parse - not that you'll
> often need that.)
Actually, that's not *quite* right.
An FSM will parse what a regular expression parses, which doesn't
include (for example) matching nested brackets, because there's no
arbitrarily-large storage area to hold the count of nested brackets.
A "push-down automaton" (or PDA) is one common way of implementing a
parser. It's an FSM *with* a stack. This is the sort of thing YACC /
Bison implements. People use it because you can algorithmically generate
the automaton from the BNF grammar of the language you want to parse.
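The "FSM with a stack" idea can be sketched in a few lines of Haskell: a bracket matcher that keeps an explicit stack and accepts only when every open bracket has been popped. (A hand-rolled illustration of the concept, not anything YACC actually generates; the names are mine.)

```haskell
-- Recognise balanced brackets with an explicit stack: push on '(',
-- pop on ')', accept only if the stack is empty at the end.
balancedPDA :: String -> Bool
balancedPDA = go []
  where
    go stack      ('(':cs) = go ('(':stack) cs  -- push on open bracket
    go ('(':stack) (')':cs) = go stack cs       -- pop on matching close
    go _          (')':_)  = False              -- close with empty stack: reject
    go stack      (_:cs)   = go stack cs        -- ignore other characters
    go stack      []       = null stack         -- accept iff nothing left to close
```

Because the stack can grow without bound, this recognises a language no plain FSM can.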
What Parsec implements is called a recursive-descent parser: the parser
basically uses the run-time call stack in place of an explicit stack.
It's best for LL(1) grammars - "Left-to-right scan, Leftmost derivation,
1 token of lookahead". I.e., a language you can parse by running
strictly from left to right, looking only one token ahead, so you don't
need the "P_Try()" function. Some languages (like Pascal) are explicitly
designed this way to make it easy to build a recursive-descent parser.
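A recursive-descent recogniser for balanced brackets, sketched in plain Haskell, shows what "the run-time stack in place of an explicit stack" means: each open bracket costs one recursive call, so the nesting depth lives in the call stack. (An illustrative sketch only; the names are mine, not Parsec's.)

```haskell
-- Accept zero or more balanced bracket groups, returning the leftover
-- input, or Nothing on a mismatch. The recursion depth tracks nesting.
groups :: String -> Maybe String
groups ('(':rest) =
  case groups rest of                 -- recurse: one stack frame per '('
    Just (')':rest') -> groups rest'  -- found the matching close, keep going
    _                -> Nothing       -- unmatched open bracket
groups rest = Just rest               -- not an open bracket: stop here

balanced :: String -> Bool
balanced s = groups s == Just ""      -- accept iff all input consumed
```

Note the parser never looks more than one character ahead, which is exactly the LL(1) property described above.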
> 1. A "tokeniser" takes the input string and splits it into "tokens",
> possibly decorating them slightly. So now you have a flat stream of
> tokens instead of just characters.
The advantage of doing it this way is that most tokens can be recognised
with a regular expression, which is much faster (and leads to a *much*
smaller push-down automaton) than doing it with the same PDA as used by
the rest of the grammar.
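A toy tokeniser makes the point: each token kind here is recognisable by a regular expression (a run of letters, a run of digits), so this stage needs no stack at all. (A sketch of the idea, assuming a tiny made-up token type; this is not Parsec's actual lexer.)

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- A minimal token type: identifiers, integer literals, and
-- single-character symbols.
data Token = TIdent String | TNumber Integer | TSymbol Char
  deriving (Show, Eq)

-- Turn a character stream into a flat stream of tokens. Every branch
-- is a simple character-class test - regular-expression territory.
tokenise :: String -> [Token]
tokenise [] = []
tokenise (c:cs)
  | isSpace c = tokenise cs                             -- skip whitespace
  | isAlpha c = let (name, rest) = span isAlpha (c:cs)
                in TIdent name : tokenise rest          -- letter run -> identifier
  | isDigit c = let (ds, rest) = span isDigit (c:cs)
                in TNumber (read ds) : tokenise rest    -- digit run -> number
  | otherwise = TSymbol c : tokenise cs                 -- anything else: one symbol
```

The grammar-level parser then works over `[Token]` instead of `String`, and its job gets much smaller.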
> Parsec allows you to parse and then tokenise, or do the whole thing in
> one pass. It's up to you. (I didn't mention it, but Parsec supports
> running parsers over lists of *anything*, not just characters. So you
> could in theory parse raw binary if you wanted...)
Or you could parse a list of tokens coming from the tokeniser, which is
what I imagine the authors were thinking. :-)
--
Darren New / San Diego, CA, USA (PST)