> You'll be unsurprised to learn that I am currently about 40% of the way
> through implementing Parsec in C#. It turns out to be reasonably easy,
> although there are a few gotchas. (E.g., there is no C# function which
> both inserts an element into a list *and* returns the list. Which is
> annoying, since it seems you can't call void-functions from LINQ...)
It will be interesting to compare the speed with your original...
On 08/04/2013 09:02 AM, scott wrote:
>> You'll be unsurprised to learn that I am currently about 40% of the way
>> through implementing Parsec in C#.
>
> It will be interesting to compare the speed with your original...
For most parsing tasks, speed isn't really an issue. Usually you use
parsers to parse human-written text, which is typically small enough
that the time spent parsing is negligible compared to the time spent on
subsequent processing.
(If you wanted to parse a 3GB XML file, you wouldn't use Parsec. You'd
use one of the various XML-specific libraries which are tuned to do that
specific job very efficiently.)
That said, I'm not tuning my C# code for speed at all, and I don't
expect it to be particularly fast. Then again, Parsec defaults to using
Haskell's String type (a lazy linked list of characters), which is very
inefficient... It looks like being a tortoise race.
> For most parsing tasks, speed isn't really an issue. Usually you use
> parsers to parse human-written text, which is typically small enough
> that the time spent parsing is negligible compared to the time spent on
> subsequent processing.
I don't know a great deal about parsers (the most I've done is made one
to calculate very simple sums, like (5+4)/2) but AIUI things like POV
first convert the human readable text to binary symbols (removing any
unnecessary formatting) and then parse the binary symbols rather than
the raw text. IDK which part typically takes longer (I guess the first
part as that involves a lot of string searching), but the time POV
spends "parsing" even a medium-sized scene isn't trivial.
Saying that if you're writing something to do single-line equations then
you probably don't need to worry :-) I thought it would just be an
interesting real-world example if you've coded the same algorithm in
both Haskell and C# to see the difference.
On 09/04/2013 08:14 AM, scott wrote:
>> For most parsing tasks, speed isn't really an issue. Usually you use
>> parsers to parse human-written text, which is typically small enough
>> that the time spent parsing is negligible compared to the time spent on
>> subsequent processing.
>
> I don't know a great deal about parsers (the most I've done is made one
> to calculate very simple sums, like (5+4)/2) but AIUI things like POV
> first convert the human readable text to binary symbols (removing any
> unnecessary formatting) and then parse the binary symbols rather than
> the raw text.
In other words, "lexing" (AKA "tokenising") followed by "parsing".
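As a sketch of that two-stage pipeline (a toy tokeniser, not POV-Ray's actual one; the token names are invented):

```haskell
import Data.Char (isDigit, isAlpha, isSpace)

-- The lexer turns raw text into a stream of symbols, discarding
-- whitespace and formatting; the parser then consumes tokens
-- instead of raw characters.
data Token = TNum Int | TIdent String | TSym Char
  deriving (Show, Eq)

lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
  | isSpace c = lexer cs                              -- drop formatting
  | isDigit c = let (ds, rest) = span isDigit (c:cs)
                in TNum (read ds) : lexer rest
  | isAlpha c = let (ws, rest) = span isAlpha (c:cs)
                in TIdent ws : lexer rest
  | otherwise = TSym c : lexer cs
```

For example, `lexer "sphere { 1 }"` yields `[TIdent "sphere", TSym '{', TNum 1, TSym '}']`.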
> IDK which part typically takes longer (I guess the first
> part as that involves a lot of string searching), but the time POV
> spends "parsing" even a medium-sized scene isn't trivial.
That's because
1) The "parsing" time includes the time taken to expand macros, because
these work at the token level rather than some higher-level abstraction.
2) The "big" scenes are ones which include huge meshes. These
machine-generated constructs tend to be quite large.
> Saying that if you're writing something to do single-line equations then
> you probably don't need to worry :-) I thought it would just be an
> interesting real-world example if you've coded the same algorithm in
> both Haskell and C# to see the difference.
Yeah, why not...
scott <sco### [at] scott com> wrote:
> I don't know a great deal about parsers (the most I've done is made one
> to calculate very simple sums, like (5+4)/2) but AIUI things like POV
> first convert the human readable text to binary symbols (removing any
> unnecessary formatting) and then parse the binary symbols rather than
> the raw text.
POV-Ray doesn't convert the SDL file into any internal binary format.
It just parses the text and creates objects as needed.
Fast parsing (regardless of whether you are interpreting as you parse or
whether you are byte-compiling it) is a difficult problem, one I have some
experience with.
The fastest possible way of parsing a text file with human-readable syntax
is to load it into an array in memory (if the file is exceedingly large
you'll have to do it in sections, but that shouldn't slow it down in any
significant way) and then parse it in-place (ie. without pulling out any
parts of it or creating new objects just to parse the thing) in a linear
fashion (ie. you traverse the array from beginning to end, without ever
going back.)
It is possible to do this kind of parsing without any kind of recursion,
but it becomes a bit complicated. Basically you need a state machine (which
has been generated at compile time into your program), and usually this
means generated code. (I suppose you could write a state machine by hand,
but it quickly becomes complicated, especially if the syntax of the thing
you are parsing is complex.)
The easiest way of writing an efficient parser by hand is to implement
a recursive descent parser (which, as said, parses the input in-place,
from the array into which it has been read). An RDP allows parsing any
number of nested blocks in the input syntax (eg. parentheses or other
types of nested blocks) at any nesting depth, without ever having to go
back in the input (ie. just traversing the input from beginning to end once.)
What you do with the parsed information depends on what you want, but eg.
byte-compiling it into a stack-based bytecode (that can be later interpreted
in a very efficient manner) is very easy with a RDP. You could, of course,
just interpret the input directly.
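The recursive descent idea can be sketched in Haskell (a toy grammar for integer arithmetic; this version threads the remaining String instead of indexing into a byte array, uses integer division, and has minimal error handling):

```haskell
import Data.Char (isDigit, isSpace)

-- Grammar:  expr   ::= term (('+'|'-') term)*
--           term   ::= factor (('*'|'/') factor)*
--           factor ::= number | '(' expr ')'
-- Each non-terminal becomes one function. The input is traversed
-- left to right exactly once, never backing up; each function
-- returns the value parsed so far plus the unconsumed input.

type Input = String

expr :: Input -> (Int, Input)
expr s = go (term s)
  where go (v, '+':rest) = let (v2, r) = term rest in go (v + v2, r)
        go (v, '-':rest) = let (v2, r) = term rest in go (v - v2, r)
        go res           = res

term :: Input -> (Int, Input)
term s = go (factor s)
  where go (v, '*':rest) = let (v2, r) = factor rest in go (v * v2, r)
        go (v, '/':rest) = let (v2, r) = factor rest in go (v `div` v2, r)
        go res           = res

factor :: Input -> (Int, Input)
factor ('(':rest) = let (v, r) = expr rest
                    in case r of (')':r') -> (v, r')
                                 _        -> error "expected ')'"
factor s = let (ds, rest) = span isDigit s in (read ds, rest)

parse :: String -> Int
parse s = fst (expr (filter (not . isSpace) s))
```

With this, `parse "(5+4)/2"` (scott's example from earlier in the thread) evaluates to 4 under integer division.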
--
- Warp
>> I don't know a great deal about parsers (the most I've done is made one
>> to calculate very simple sums, like (5+4)/2) but AIUI things like POV
>> first convert the human readable text to binary symbols (removing any
>> unnecessary formatting) and then parse the binary symbols rather than
>> the raw text.
>
> POV-Ray doesn't convert the SDL file into any internal binary format.
> It just parses the text and creates objects as needed.
The existence of tokenize.cpp in the source made me think otherwise.
What does this file do then?
On 09/04/2013 02:34 PM, Warp wrote:
> The fastest possible way of parsing a text file with human-readable syntax
> is to load it into an array in memory (if the file is exceedingly large
> you'll have to do it in sections, but that shouldn't slow it down in any
> significant way) and then parse it in-place (ie. without pulling out any
> parts of it or creating new objects just to parse the thing) in a linear
> fashion (ie. you traverse the array from beginning to end, without ever
> going back.)
>
> It is possible to do this kind of parsing without any kind of recursion,
> but it becomes a bit complicated. Basically you need a state machine (which
> has been generated at compile time into your program), and usually this
> means generated code. (I suppose you could write a state machine by hand,
> but it quickly becomes complicated, especially if the syntax of the thing
> you are parsing is complex.)
Interestingly, if your parser is a monad, then you can never transform
it into a finite state machine. Monads are just "too flexible" to permit
this.
To be concrete: I can write a single Parsec parser which parses a file
consisting of a BNF grammar specification followed by text written in
that grammar. In other words, a parser which dynamically constructs a
new parser and then runs it. Obviously there is no hope of statically
optimising such a thing.
In Haskell we use monads to perform I/O, and for that task this level of
flexibility is exactly what is wanted. We *want* programs to change
their behaviour depending on the input they receive. But for the problem of
parsing a fixed grammar, that's actually a little bit *too* flexible.
Sometimes we would like something a bit less powerful, because often
less is more with these things. (In this case, less expressive power
equals more optimisation capabilities.)
The problem with monads is that the bind operation feeds the result of
one step into some arbitrary Turing-complete code block which then
generates the next parser to run. This Turing-completeness is the source
of the trouble; it means you can't statically analyse what the parser is
going to try to do (cf. Rice's theorem).
There are alternatives, however. In particular, instead of using a monad
to do parsing, you can use something called an "applicative functor".
(No, I'm not kidding. That's what they call it.)
Every monad is also an applicative functor. However, not all applicative
functors are also monads. For the purpose of parsing, it's the ones that
aren't monads which are interesting.
A few definitions. In Haskell terms, a "functor" is basically anything
that has a Map() function. You'll recall that Map() takes a list and a
function, and applies that function to every element of the list,
resulting in a new list. Well, in a slightly mind-bending kind of way,
an I/O operation is *also* a functor, because you can take (say) an I/O
operation that returns a byte, and Map() a function from bytes to
characters over it, resulting in a new I/O operation that returns a
character. And, indeed, *every* monad is also a functor in this way.
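In Haskell, Map() is spelled fmap, and the I/O example looks like this (getByte is a made-up action, stubbed out here; in a real program it might read from a file handle):

```haskell
import Data.Char (chr)

-- A hypothetical I/O action that yields a byte.
getByte :: IO Int
getByte = return 104

-- fmap turns an (IO Int) into an (IO Char) without running anything;
-- the conversion happens when the action is eventually executed.
getCharacter :: IO Char
getCharacter = fmap chr getByte
```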
So much for functors. If you have something that's a functor, and all
you know about it is "it's a functor", then that lets you Map() it, and
that's all you can do. This isn't terribly interesting on its own.
Now, an *applicative* functor has two additional abilities. One is that
it has a function like Inject(), which puts a value "into" the functor.
The other is a function that I'm going to call Apply(). It's a bit like
Map(). But whereas Map() takes a function and a functor containing a
value [or values], Apply() takes a functor of functions and a functor of
values.
For example, if Parser<X> is a parser that parses an X, then

    public Parser<T2> Apply<T1, T2>(Parser<Func<T1, T2>> p1, Parser<T1> p2)

First p1 is run, then p2 is run, and then the function returned by p1 is
executed with the result from p2 as its argument. Whatever it returns,
the composite parser returns.
My word, that seems mighty complicated. And why would you want a parser
that returns a function anyway?
Well, Map() lets you apply a function to *one* argument. But what if you
have (say) a 4-argument function, and you want to run 4 parsers to
generate each of its inputs? Well, you can do

    Map(f, p1)

That partially applies our 4-argument function "f" to its first
argument... But now we've got a parser that returns a 3-argument
function. We're stuck; Map() wants a function, not a parser that returns
a function.
Now if our parser forms a monad, this is where we invoke Bind() to chain
in the next step. But that's actually overkill; all we want is to be
able to apply a function that is itself inside a parser. And that's
exactly what Apply() does. So we can do this (using Inject() to lift the
bare function f into a trivial parser first):

    temp1 = Apply(Inject(f), p1);
    temp2 = Apply(temp1, p2);
    temp3 = Apply(temp2, p3);
    temp4 = Apply(temp3, p4);
Now temp4 is a parser that parses 4 things and then applies f to the
results. In Haskell, it's not called Apply(), it's called "<*>", which
is an infix operator, so we can just write

    temp4 = pure f <*> p1 <*> p2 <*> p3 <*> p4

which looks almost the same as

    result = f v1 v2 v3 v4
but with a bunch of <*> operators inserted between the arguments. There's
also a "<*" operator, which runs a parser but discards its result:

    parser = pure f <*> arg1_parser <*> arg2_parser <* separator_parser <*> arg3_parser
This, by itself, doesn't allow us to parse alternatives. (But then,
neither does a monad by itself.) We need extra combinators to handle
that, and a few other things which are specific to parsers.
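To make the machinery concrete, here is a hand-rolled sketch of such an applicative parser in Haskell (a toy Maybe-based type, not Parsec itself; the dash-separated date grammar is invented for illustration):

```haskell
import Data.Char (isDigit)

-- A minimal parser type: a function from input to (result, rest),
-- or Nothing on failure. Parsec's applicative interface works the
-- same way, just with better errors and performance.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P p) = P $ \s -> fmap (\(a, r) -> (f a, r)) (p s)

instance Applicative P where
  pure x = P $ \s -> Just (x, s)           -- this is Inject()
  P pf <*> P pa = P $ \s -> case pf s of   -- this is Apply()
    Nothing     -> Nothing
    Just (f, r) -> case pa r of
      Nothing      -> Nothing
      Just (a, r') -> Just (f a, r')

charP :: Char -> P Char
charP c = P $ \s -> case s of
  (x:xs) | x == c -> Just (c, xs)
  _               -> Nothing

number :: P Int
number = P $ \s -> case span isDigit s of
  ("", _)    -> Nothing
  (ds, rest) -> Just (read ds, rest)

data Date = Date Int Int Int deriving (Show, Eq)

-- pure injects the 3-argument constructor; each <*> feeds it one
-- parsed argument; <* runs the dash parser and throws its result away.
date :: P Date
date = pure Date <*> number <* charP '-' <*> number <* charP '-' <*> number
```

Running `runP date "2013-04-09"` yields `Just (Date 2013 4 9, "")`; note that the whole parser is built from pure, <*> and <*, with no Bind() anywhere.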
Using this framework, the parser can be represented not as an opaque
block of Turing-complete code, but as a transparent data structure which
can be operated on by other functions. This means you can do run-time or
compile-time optimisations on it. In particular, you could write a
function that takes a parser and generates a finite state machine from it.
The *other* possibility is to use something called an "arrow", but
that's for another post...
I'm assuming that implementing a recursive descent parser ought to be
relatively simple in Haskell, given how recursive in nature functional
programming is. (Of course this also assumes that it's easy and efficient
in Haskell to have the input as a byte array, and it's efficient to read
said array byte by byte, without much copying things around.)
An interesting thing about this type of parsing, though, is how to parse
things like numeric literals (especially if they are not integers) or
UTF-8 encoded characters.
Most languages have some kind of standard functionality for parsing the
former. The only thing one has to do is to detect when such a literal is
appearing in the input, and then call said function. (The C/C++ standard
libraries in particular offer functions to parse numeric values that are
in ASCII format in-place, without having to first copy them anywhere,
which is a major help when writing such efficient parsers.)
Parsing UTF-8 encoded characters is a bit trickier, especially if the
language does not offer standard utility functions for that purpose.
The most efficient way of doing that is to, basically, write a small
state machine that reads the input byte by byte until a byte that cannot
continue a valid sequence appears, and returns how many bytes the matched
sequence consists of. (During this parsing, or afterwards, you can then
do with those characters whatever you need to.)
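A sketch of that byte-level state machine in Haskell (determining the length of one UTF-8 sequence from its lead byte and checking the continuation bytes; overlong encodings and surrogates are ignored for brevity):

```haskell
import Data.Word (Word8)
import Data.Bits ((.&.))

-- How many bytes the UTF-8 sequence at the head of the input
-- occupies, or Nothing if it is malformed. Only the bit patterns
-- are checked: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx lead bytes,
-- each followed by the right number of 10xxxxxx continuation bytes.
utf8Len :: [Word8] -> Maybe Int
utf8Len (b:bs)
  | b < 0x80           = Just 1
  | b .&. 0xE0 == 0xC0 = conts 1 bs
  | b .&. 0xF0 == 0xE0 = conts 2 bs
  | b .&. 0xF8 == 0xF0 = conts 3 bs
utf8Len _ = Nothing

-- Require n continuation bytes of the form 10xxxxxx.
conts :: Int -> [Word8] -> Maybe Int
conts n bs
  | length valid == n = Just (n + 1)
  | otherwise         = Nothing
  where valid = filter (\b -> b .&. 0xC0 == 0x80) (take n bs)
```

For instance, `utf8Len [0xE2, 0x82, 0xAC]` (the euro sign) gives `Just 3`, while a continuation byte in the wrong place gives `Nothing`.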
(Remember that we are writing an extremely *efficient* parser here. That
means no copying characters around and parsing them separately. They have
to be parsed in-place, where they appear in the byte array.)
--
- Warp
On 20/04/2013 10:31 PM, Warp wrote:
> I'm assuming that implementing a recursive descent parser ought to be
> relatively simple in Haskell, given how recursive in nature functional
> programming is.
Indeed.
> (Of course this also assumes that it's easy and efficient
> in Haskell to have the input as a byte array, and it's efficient to read
> said array byte by byte, without much copying things around.)
*Reading* from an array is easy. You only need to copy things if they
change, and reading from something doesn't change it.
> Most languages have some kind of standard functionality for parsing the
> former.
The trouble starts when the literals you're trying to parse don't match
the format that the standard function expects. (E.g., some languages
allow "+7" as a valid integer, and others don't.)
> Parsing UTF-8 encoded characters is a bit trickier, especially if the
> language does not offer standard utility functions for that purpose.
Haskell provides a way to read from a file, decoding UTF-8 on the fly.
I'm not sure how it's implemented internally though. You end up with a
packed array of characters which you can iterate over.
> (Remember that we are writing an extremely *efficient* parser here. That
> means no copying characters around and parsing them separately. They have
> to be parsed in-place, where they appear in the byte array.)
You're probably going to struggle to get this level of efficiency from a
programming language that uses garbage collection. Even if all the data
is in a flat array, the array might get copied from place to place
occasionally by the garbage collector.
(Having said that, I'm not sure whether or not Haskell's GC does any
special handling for "large" objects. Some GCs handle these specially,
which usually means storing them in a separate area that doesn't get
scanned as often, so they don't need to be copied as much.)
On 24/04/2013 08:46 AM, Orchid Win7 v1 wrote:
> Having said that, I'm not sure whether or not Haskell's GC does any
> special handling for "large" objects.
It does. It takes various steps to try to avoid copying objects larger
than about 4KB.