Subject: Re: Very long post
From: Invisible
Date: 23 Sep 2008 12:00:21
Message: <48d91295$1@news.povray.org>
Darren New wrote:

> I read it all the way through too. Nicely written, good explanation.

Thanks. :-)

> I'd 
> add a bit at the end that points to the online resources you'd need to 
> learn it more completely.  (A few spelling mistakes and a dropped word 
> or two also, but that's trivia. :-)

Given that I'm the only person here who knows how to read Haskell 
syntax, a link to the Parsec user manual seemed... a little pointless.

But in case anybody cares:

http://legacy.cs.uu.nl/daan/download/parsec/parsec.html

That's the user's guide. The (very terse!) API reference is here:

http://haskell.org/ghc/docs/latest/html/libraries/

This is all the semi-standard Haskell libraries. The one you're looking 
for is Text.ParserCombinators.Parsec. (It's split into several modules.)
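
In case anybody wants to see what a Parsec parser actually looks like, 
here's a tiny example (untested, typed straight into the newsreader - the 
names are mine, not anything from the library):

import Text.ParserCombinators.Parsec

-- Parse a comma-separated list of natural numbers, e.g. "1,22,333".
numberList :: Parser [Integer]
numberList = sepBy1 number (char ',')
  where
    number = do ds <- many1 digit
                return (read ds)

-- In GHCi:  parse numberList "(test)" "1,22,333"  ==>  Right [1,22,333]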

>> The more "usual" approach is typically something based around a finite 
>> state machine - but as I just explained, Parsec works differently. 
>> (And indeed it will parse things that an FSM couldn't parse - not that 
>> you'll often need that.)
> 
> Actually, that's not *quite* right.
> 
> An FSM will parse what a regular expression parses, which doesn't 
> include (for example) matching nested brackets, because there's no 
> arbitrarily-large storage area to hold the count of nested brackets.
> 
> A "push-down automaton" (or PDA) is one common way of implementing a 
> parser. It's a FSM *with* a stack. This is the sort of thing YACC / 
> Bison implements. People use it because you can algorithmically generate 
> the FSM from the BNF grammar of the language you want to parse.
> 
> What Haskell is using is called a recursive-descent parser: the parser 
> basically uses the run-time stack in place of an explicit stack. It's 
> best for LL(1) grammars - "Left to right, Leftmost derivation, 1 token 
> of look-ahead". I.e., a language you can parse by running strictly from 
> left to right, looking only one token ahead, so you don't need the 
> "P_Try()" function. Some languages (like Pascal) are explicitly designed 
> this way to make it easy to build a recursive descent parser.

Heh. What do *I* know about parser theory? ;-)

The point is, Parsec will parse just about anything, although as you say 
it "works best" for LL(1) grammars. It's quite surprising how simple and 
easy to explain the library is - and yet it handles almost anything. 
(I don't know if I made it clear, but Parsec parsers form a state monad.)
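
To make the nested-brackets point concrete: something like this (again 
untested, just a sketch) matches arbitrarily deep nesting, which is 
exactly what a plain FSM can't do. The parser calls itself recursively, 
so the run-time stack stands in for the PDA's explicit stack:

import Text.ParserCombinators.Parsec

-- One bracketed group containing zero or more nested groups.
data Brackets = Group [Brackets] deriving Show

brackets :: Parser Brackets
brackets = do char '('
              inner <- many brackets    -- recursion = unbounded nesting
              char ')'
              return (Group inner)

-- parse brackets "" "(()(()))"
--   ==>  Right (Group [Group [],Group [Group []]])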

>> 1. A "tokeniser" takes the input string and splits it into "tokens", 
>> possibly decorating them slightly. So now you have a flat stream of 
>> tokens instead of just characters.
> 
> The advantage of doing it this way is that most tokens can be parsed 
> with a regular expression, which is much faster (and leads to *much* 
> smaller push-down state machines) than doing it with the same PDA as 
> used by the rest of the grammar.

Indeed, the Parsec manual says something similar.
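
In fact Parsec ships with helpers along these lines: you hand 
Text.ParserCombinators.Parsec.Token a language description and it gives 
you back a whole family of token-level parsers that also skip trailing 
white space and comments. Roughly like this (from memory, so treat it as 
a sketch):

import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

-- Build a family of token-level parsers from a language description.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Each of these eats any white space / comments after the token.
identifier :: Parser String
identifier = P.identifier lexer

integer :: Parser Integer
integer = P.integer lexer

parens :: Parser a -> Parser a
parens = P.parens lexer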

>> Parsec allows you to tokenise and then parse, or do the whole thing in 
>> one pass. It's up to you. (I didn't mention it, but Parsec supports 
>> running parsers over lists of *anything*, not just characters. So you 
>> could in theory parse raw binary if you wanted...)
> 
> Or you could parse a list of tokens coming from the tokenizer, which is 
> what I imagine the authors were thinking. :-)

Precisely that, yes.
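
For what it's worth, it would look roughly like this. The Tok type and 
the helper names below are made up purely for illustration; the only 
Parsec-specific part is the generic "token" combinator, which needs to 
know how to display a token, where it came from, and whether it matches:

import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Pos (newPos)

-- A made-up token type, as it might come out of a separate tokeniser.
data Tok = TNum Integer | TPlus deriving (Show, Eq)

-- Accept a token if the test function says so. (Real code would carry
-- proper source positions through from the tokeniser; this fakes one.)
satisfyTok :: (Tok -> Maybe a) -> GenParser Tok () a
satisfyTok = token show (const (newPos "token stream" 1 1))

number :: GenParser Tok () Integer
number = satisfyTok (\t -> case t of TNum n -> Just n; _ -> Nothing)

plus :: GenParser Tok () ()
plus = satisfyTok (\t -> case t of TPlus -> Just (); _ -> Nothing)

-- A tiny grammar over the token stream: numbers separated by '+'.
sumExpr :: GenParser Tok () Integer
sumExpr = do ns <- sepBy1 number plus
             return (sum ns)

-- parse sumExpr "tokens" [TNum 1, TPlus, TNum 2]  ==>  Right 3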

