POV-Ray : Newsgroups : povray.off-topic : Monads in C# : More category theory
From: Orchid Win7 v1
Date: 20 Apr 2013 15:17:48
Message: <5172e9dc$1@news.povray.org>
On 09/04/2013 02:34 PM, Warp wrote:
> The fastest possible way of parsing a text file with human-readable syntax
> is to load it into an array in memory (if the file is exceedingly large
> you'll have to do it in sections, but that shouldn't slow it down in any
> significant way) and then parse it in-place (ie. without pulling out any
> parts of it or creating new objects just to parse the thing) in a linear
> fashion (ie. you traverse the array from beginning to end, without ever
> going back.)
>
> It is possible to do this kind of parsing without any kind of recursion,
> but it becomes a bit complicated. Basically you need a state machine (which
> has been generated at compile time into your program), and usually this
> means generated code. (I suppose you could write a state machine by hand,
> but it quickly becomes complicated, especially if the syntax of the thing
> you are parsing is complex.)

Interestingly, if your parser is a monad, then in general you cannot 
transform it into a finite state machine. Monads are simply "too 
flexible" to permit this.

To be concrete: I can write a single Parsec parser which parses a file 
consisting of a BNF grammar specification followed by text written in 
that grammar. In other words, a parser which dynamically constructs a 
new parser and then runs it. Obviously there is no hope of statically 
optimising such a thing.

In Haskell we use monads to perform I/O, and for that task this level of 
flexibility is exactly what is wanted. We *want* programs to change 
their behaviour depending on the input they receive. But for the problem of 
parsing a fixed grammar, that's actually a little bit *too* flexible. 
Sometimes we would like something a bit less powerful, because often 
less is more with these things. (In this case, less expressive power 
equals more optimisation capabilities.)

The problem with monads is that the bind operation feeds the result of 
one step into some arbitrary Turing-complete code block which then 
generates the next parser to run. This Turing-completeness is the source 
of the trouble; it means you can't statically analyse what the parser is 
going to try to do (cf. Rice's theorem).

There are alternatives, however. In particular, instead of using a monad 
to do parsing, you can use something called an "applicative functor". 
(No, I'm not kidding. That's what they call it.)

Every monad is also an applicative functor. However, not all applicative 
functors are also monads. For the purpose of parsing, it's the ones that 
aren't monads which are interesting.

A few definitions. In Haskell terms, a "functor" is basically anything 
that has a Map() function. You'll recall that Map() takes a list and a 
function, and applies that function to every element of the list, 
resulting in a new list. Well, in a slightly mind-bending kind of way, 
an I/O operation is *also* a functor, because you can take (say) an I/O 
operation that returns a byte, and Map() a function from bytes to 
characters over it, resulting in a new I/O operation that returns a 
character. And, indeed, *every* monad is also a functor in this way.
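To make that concrete, here's a toy sketch in Python (the names and the parser representation are invented purely for illustration, not taken from any real library) of Map() for two different functors: ordinary lists, and a "parser" modelled as a function from (text, pos) to (value, new_pos), or None on failure.

```python
def map_list(f, xs):
    # Map over a list: apply f to every element, build a new list.
    return [f(x) for x in xs]

def map_parser(f, p):
    # Map over a parser: build a new parser that runs p and then
    # transforms whatever p returned. Failure propagates unchanged.
    def mapped(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return f(value), new_pos
    return mapped

# A parser that consumes one character of input.
def any_char(text, pos):
    if pos < len(text):
        return text[pos], pos + 1
    return None

# Mapping ord over it gives a new parser that returns a character
# code, without touching the underlying parser's behaviour.
code_parser = map_parser(ord, any_char)
```

Note that map_parser doesn't run anything; it just wraps the old parser in a new one whose result gets transformed after the fact. That's all "being a functor" buys you.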

So much for functors. If you have something that's a functor, and all 
you know about it is "it's a functor", then that lets you Map() it, and 
that's all you can do. This isn't terribly interesting on its own.

Now, an *applicative* functor has two additional abilities. One is that 
it has a function like Inject(), which puts a value "into" the functor. 
The other is a function that I'm going to call Apply(). It's a bit like 
Map(). But whereas Map() takes a function and a functor containing a 
value [or values], Apply() takes a functor of functions and a functor of 
values.

For example, if Parser<X> is a parser that parses an X, then

   public Parser<T2> Apply(Parser<Func<T1, T2>> p1, Parser<T1> p2)

First p1 is run, then p2 is run, then the function returned by p1 is 
executed with the result from p2 as its argument. Whatever it returns, 
the composite parser returns.
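Here's roughly what Inject() and Apply() might look like for the same toy Python parser representation (again, all names invented for illustration; a parser is a function from (text, pos) to (value, new_pos), or None on failure).

```python
def inject(value):
    # Put a plain value "into" the parser: consume no input at all,
    # just return the value.
    def parser(text, pos):
        return value, pos
    return parser

def apply_p(pf, px):
    # Run pf (a parser that returns a function), then px (a parser
    # that returns its argument), then call the function on the
    # argument. Either failure fails the whole thing.
    def parser(text, pos):
        rf = pf(text, pos)
        if rf is None:
            return None
        f, pos = rf
        rx = px(text, pos)
        if rx is None:
            return None
        x, pos = rx
        return f(x), pos
    return parser

def digit(text, pos):
    # Parse a single decimal digit as an int.
    if pos < len(text) and text[pos].isdigit():
        return int(text[pos]), pos + 1
    return None

# Inject a doubling function, then Apply it to the digit parser.
doubled = apply_p(inject(lambda n: n * 2), digit)
```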

My word, that seems mighty complicated. And why would you want a parser 
that returns a function anyway?

Well, Map() lets you apply a function to *one* argument. But what if you 
have (say) a 4-argument function, and you want to run 4 parsers to 
generate each of the inputs? Well, you can do

   Map(f, p1)

That partially applies our 4-argument function "f" to the first 
argument... But now we've got a parser that returns a 3-argument 
function. We're stuck; Map() wants a function, not a parser that returns 
a function.

Now if our parser forms a monad, this is where we invoke Bind() to chain 
in the next step. But that's actually overkill; all we want is to be 
able to take a function from a parser. And that's all Apply() does. So 
we can do this:

   temp1 = Apply(Inject(f), p1);
   temp2 = Apply(temp1, p2);
   temp3 = Apply(temp2, p3);
   temp4 = Apply(temp3, p4);
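In the toy Python representation from above (made-up names, not any real library), the chain looks like this. Python functions are not curried the way Haskell's are, so the 4-argument function has to be curried by hand before each Apply() can feed it one argument at a time.

```python
def inject(value):
    # Consume no input, just return the value.
    def parser(text, pos):
        return value, pos
    return parser

def apply_p(pf, px):
    # Run pf (returns a function), then px (returns its argument),
    # then apply the former to the latter.
    def parser(text, pos):
        rf = pf(text, pos)
        if rf is None:
            return None
        f, pos = rf
        rx = px(text, pos)
        if rx is None:
            return None
        x, pos = rx
        return f(x), pos
    return parser

def digit(text, pos):
    # Parse a single decimal digit as an int.
    if pos < len(text) and text[pos].isdigit():
        return int(text[pos]), pos + 1
    return None

# A 4-argument function, curried so each Apply() supplies one argument.
f = lambda a: lambda b: lambda c: lambda d: (a, b, c, d)

temp1 = apply_p(inject(f), digit)   # parser of a 3-argument function
temp2 = apply_p(temp1, digit)       # parser of a 2-argument function
temp3 = apply_p(temp2, digit)       # parser of a 1-argument function
temp4 = apply_p(temp3, digit)       # parser of the final result
```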

Now temp4 is a parser that parses 4 things and then applies f to the 
results. In Haskell, it's not called Apply(), it's called "<*>", which 
is an infix operator, so we can just write

   temp4 = pure f <*> p1 <*> p2 <*> p3 <*> p4

which looks almost the same as

   result = f v1 v2 v3 v4

but with a bunch of <*> operators inserted between the arguments. 
There's also a "<*" operator, which runs a parser but discards its output:

   parser = pure f <*> arg1_parser <*> arg2_parser <* separator_parser <*> arg3_parser
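A toy Python version of the "<*" idea (invented names again; a parser is (text, pos) -> (value, new_pos), or None on failure): run two parsers in sequence, keep the left result, throw away the right.

```python
def skip_right(pl, pr):
    # Run pl, then pr; return pl's value but pr's position.
    def parser(text, pos):
        rl = pl(text, pos)
        if rl is None:
            return None
        value, pos = rl
        rr = pr(text, pos)
        if rr is None:
            return None
        _, pos = rr              # the separator's output is thrown away
        return value, pos
    return parser

def literal(expected):
    # Match one specific character.
    def parser(text, pos):
        if pos < len(text) and text[pos] == expected:
            return expected, pos + 1
        return None
    return parser

def digit(text, pos):
    # Parse a single decimal digit as an int.
    if pos < len(text) and text[pos].isdigit():
        return int(text[pos]), pos + 1
    return None

# Parse a digit followed by a comma, returning only the digit.
item = skip_right(digit, literal(","))
```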

This, by itself, doesn't allow us to parse alternatives. (But then, 
neither does a monad by itself.) We need extra combinators to handle 
that, and a few other things which are specific to parsers.

Using this framework, the parser can be represented not as an opaque 
block of Turing-complete code, but as a transparent data structure which 
can be operated on by other functions. This means you can do run-time or 
compile-time optimisations on it. In particular, you could write a 
function that takes a parser and generates a finite state machine from it.
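To sketch what "a transparent data structure" might mean (this representation is entirely made up for illustration): instead of opaque closures, build the parser as a tagged tree of combinators, which other functions can then walk. Here a tiny static analysis computes the set of characters a parser can start with, which is exactly the kind of information a state-machine generator needs.

```python
CHAR, SEQ, ALT = "char", "seq", "alt"

def first_set(p):
    # Which characters can this parser's input start with?
    tag = p[0]
    if tag == CHAR:                    # (CHAR, c): match the character c
        return {p[1]}
    if tag == SEQ:                     # (SEQ, p1, p2): p1 then p2
        return first_set(p[1])         # assumes p1 always consumes input
    if tag == ALT:                     # (ALT, p1, p2): p1 or p2
        return first_set(p[1]) | first_set(p[2])

# Grammar: ("a" then "b") or "c"
grammar = (ALT, (SEQ, (CHAR, "a"), (CHAR, "b")), (CHAR, "c"))
```

You can't write first_set for a monadic parser: the tree simply isn't there to inspect, because each step is hidden inside an arbitrary function.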

The *other* possibility is to use something called an "arrow", but 
that's for another post...

