Orchid XP v8 wrote:
>>> Haskell really is a PITA to parse. I mean, think about it:
>>> - Whitespace is significant. (No context-free parsing here!)
>>
>> I don't think "context-free" means what you think it means. Whitespace
>> being significant isn't "context".
>
> Indentation is significant. To parse the next line, you must know how
> far the current line was indented. Hence, "context".
I don't think "context-free" means what you think it means.
That's like saying Pascal isn't context-free because you need to know how
many "begin" statements you're nested within.
A grammar is context-free if the left side of every production is a
single non-terminal symbol. (That's the technical definition, which
takes several paragraphs to explain in prose.)
> No, they can be used *before* they're defined. And they can be defined
> in a completely separate source file too. Which makes me wonder how it's
> possible to parse an individual file at all... you'd need to potentially
> parse every module it imports to work out the operator precedences!
Could be! :-) Sounds like a real bear to compile, but then, that's what
computers are for.
--
Darren New, San Diego CA, USA (PST)
"We'd like you to back-port all the changes in 2.0
back to version 1.0."
"We've done that already. We call it 2.0."
>>>> Haskell really is a PITA to parse. I mean, think about it:
>>>> - Whitespace is significant. (No context-free parsing here!)
>>>
>>> I don't think "context-free" means what you think it means.
>>> Whitespace being significant isn't "context".
>>
>> Indentation is significant. To parse the next line, you must know how
>> far the current line was indented. Hence, "context".
>
> I don't think "context-free" means what you think it means.
>
> That's like saying Pascal isn't context-free because you need to know
> how many "begin" statements you're nested within.
No, you don't. You can totally parse the contents without knowing how
deeply nested it is. The parsing rules don't change.
However, in Haskell, if I write

  foo = if x
          then y
          else z

that's different to

  foo =
    if x
      then y
      else z

To parse the "then" part, you *must* know how far the "if" part is
indented. It's not possible to unambiguously parse without this
information. (Consider nested if-expressions; here indentation is the
only thing that makes the nesting unambiguous...)
> A grammar is context-free if the left side of every production is a
> single non-terminal symbol. (That's the technical definition, which
> takes several paragraphs to explain in prose.)
Yeah. You'd have to know what a "production" is, for starters...
>> No, they can be used *before* they're defined. And they can be defined
>> in a completely separate source file too. Which makes me wonder how
>> it's possible to parse an individual file at all... you'd need to
>> potentially parse every module it imports to work out the operator
>> precedences!
>
> Could be! :-) Sounds like a real bear to compile, but then, that's what
> computers are for.
Well, you can totally tokenise the input without knowing anything about
what user-defined operators exist. (There are rules about what
characters you may use.) But it seems that you can't build a correct
parse tree until you know the operator precedences, so...
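One standard way out (GHC does something along these lines) is to parse
operator chains flat, then re-associate once the fixities are known. A
sketch, with all names mine and everything treated as left-associative for
brevity:

```haskell
-- Parse "e1 `op1` e2 `op2` e3 ..." flat, then rebuild the tree once the
-- precedence of each operator is known (precedence climbing).
data Expr = Var String | Op String Expr Expr deriving (Eq, Show)

reassoc :: (String -> Int) -> Expr -> [(String, Expr)] -> Expr
reassoc prec e0 chain = fst (go e0 chain 0)
  where
    go l [] _ = (l, [])
    go l rest@((op, r) : more) minPrec
      | prec op < minPrec = (l, rest)
      | otherwise =
          let (r', rest') = go r more (prec op + 1)
          in go (Op op l r') rest' minPrec

main :: IO ()
main =
  let prec "*" = 7
      prec _   = 6
  in print (reassoc prec (Var "a") [("+", Var "b"), ("*", Var "c")])
-- prints Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))
```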
Starting to see why Haskell is the only language powerful enough to
implement a Haskell compiler, yes? ;-)
Invisible wrote:
>>>>> Haskell really is a PITA to parse. I mean, think about it:
>>>>> - Whitespace is significant. (No context-free parsing here!)
>>>>
>>>> I don't think "context-free" means what you think it means.
>>>> Whitespace being significant isn't "context".
>>>
>>> Indentation is significant. To parse the next line, you must know how
>>> far the current line was indented. Hence, "context".
>>
>> I don't think "context-free" means what you think it means.
>>
>> That's like saying Pascal isn't context-free because you need to know
>> how many "begin" statements you're nested within.
>
> No, you don't. You can totally parse the contents without knowing how
> deeply nested it is. The parsing rules don't change.
Of course it does.

  if x = 0 then
    y := 1;
    y := 2;

parses completely differently from

  if x = 0 then begin
    y := 1;
    y := 2;
  end
> However, in Haskell, if I write
>
> foo = if x
>         then y
>         else z
>
> that's different to
>
> foo =
>   if x
>     then y
>     else z
>
> To parse the "then" part, you *must* know how far the "if" part is
> indented.
You only need to know if it's indented less, the same, or more than foo,
right? It would be the same if the "if" were under the = as if it were
under the "o", yes?
In which case you have three tokens: indent, outdent, and dent.
> It's not possible to unambiguously parse without this
> information. (Consider nested if-expressions; here indentation is the
> only thing that makes the nesting unambiguous...)
That doesn't mean it's context sensitive. That just means whitespace is
syntactically meaningful.
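That indent/outdent/dent idea can be sketched as a pre-pass that compares
each line's starting column against a stack of open indentation levels, the
same trick Python's lexer uses. All the names below are mine:

```haskell
-- Turn leading-whitespace columns into explicit Indent/Outdent/Dent tokens.
-- Input is (column, word) per line; the stack records open indentation levels.
data Tok = Indent | Outdent | Dent | Word String deriving (Eq, Show)

layout :: [Int] -> [(Int, String)] -> [Tok]
layout stack [] = replicate (length stack - 1) Outdent
layout stack@(top : rest) ls@((col, w) : more)
  | col > top = Indent  : Word w : layout (col : stack) more
  | col < top = Outdent : layout rest ls
  | otherwise = Dent    : Word w : layout stack more

tokenise :: [(Int, String)] -> [Tok]
tokenise = layout [0]

main :: IO ()
main = print (tokenise [(0, "foo"), (2, "if"), (2, "then"), (0, "bar")])
```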
>> A grammar is context-free if the left side of every production is a
>> single non-terminal symbol. (That's the technical definition, which
>> takes several paragraphs to explain in prose.)
>
> Yeah. You'd have to know what a "production" is, for starters...
You know some BNF, right?
<expression> ::= <term> * <term> | <term> / <term>
Well, if you have terminal symbols (or more than one symbol) on the left of
the ::=, you have a context-sensitive grammar.
>> Could be! :-) Sounds like a real bear to compile, but then, that's
>> what computers are for.
>
> Well, you can totally tokenise the input without knowing anything about
> what user-defined operators exist. (There are rules about what
> characters you may use.) But it seems that you can't build a correct
> parse tree until you know the operator precedences, so...
And you can find the definitions without knowing the precedences, right? You
can distinguish between definitions that use user-defined operators and
those which don't?
--
Darren New, San Diego CA, USA (PST)
"We'd like you to back-port all the changes in 2.0
back to version 1.0."
"We've done that already. We call it 2.0."
>> However, in Haskell, if I write
>>
>> foo = if x
>>         then y
>>         else z
>>
>> that's different to
>>
>> foo =
>>   if x
>>     then y
>>     else z
>>
>> To parse the "then" part, you *must* know how far the "if" part is
>> indented.
>
> You only need to know if it's indented less, the same, or more than foo,
> right?
Yeah.
> In which case you have three tokens. indent, outdent, and dent.
The above expressions use the "layout rule". You can, however, write
Haskell code where whitespace is not significant. (At least, no more
significant than it is in Pascal; it still delimits tokens, etc.)
According to the language spec, Haskell with layout can be transformed
into Haskell without layout.
For example,
  foo x =
    case x of
      1 -> "one"
      2 -> "two"
      3 -> "three"
would become
foo x = case x of {1 -> "one"; 2 -> "two"; 3 -> "three"}
I think perhaps the best thing would be to perform this transformation
first, and only then attempt to do the actual parsing. The explicitly
delimited stuff looks a damn sight easier to parse.
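As a sanity check, the two forms above really are the same definition; GHC
accepts both spellings and they denote the same function:

```haskell
-- The layout form of the definition from the post...
foo :: Int -> String
foo x =
  case x of
    1 -> "one"
    2 -> "two"
    3 -> "three"

-- ...and the explicitly delimited form.
foo' :: Int -> String
foo' x = case x of {1 -> "one"; 2 -> "two"; 3 -> "three"}

main :: IO ()
main = print (map foo [1, 2, 3] == map foo' [1, 2, 3])
-- prints True
```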
Of course, the fun part is now trying to relate the location of any
parse errors back to the original source code. Indeed, with everything
I've done so far, it seems that making it *work* is relatively easy, but
making it give useful error messages if the user makes a mistake is
drastically harder.
(Want to type-check something? That's easy. Just unify a few terms.
What, it doesn't unify, and you want to know where the error is? Uh...
good luck with that.)
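"Just unify a few terms" can be made concrete in a few lines. Here is a
minimal first-order unifier over a toy type language (all names are mine;
real type checkers thread source locations through exactly because of the
error-reporting problem above):

```haskell
import qualified Data.Map as M

-- A toy type language: type variables and applied type constructors.
data Ty = TVar String | TCon String [Ty] deriving (Eq, Show)
type Subst = M.Map String Ty

apply :: Subst -> Ty -> Ty
apply s (TVar v)    = M.findWithDefault (TVar v) v s
apply s (TCon c ts) = TCon c (map (apply s) ts)

-- Unify two types, returning the substitution that makes them equal
-- (Nothing on a clash or an occurs-check failure).
unify :: Ty -> Ty -> Maybe Subst
unify (TVar v) t
  | t == TVar v    = Just M.empty
  | v `occursIn` t = Nothing                 -- occurs check
  | otherwise      = Just (M.singleton v t)
unify t (TVar v) = unify (TVar v) t
unify (TCon c ts) (TCon d us)
  | c == d && length ts == length us = go M.empty (zip ts us)
  | otherwise                        = Nothing
  where
    go s []            = Just s
    go s ((a, b) : ps) = do
      s' <- unify (apply s a) (apply s b)
      go (M.map (apply s') s `M.union` s') ps

occursIn :: String -> Ty -> Bool
occursIn v (TVar w)    = v == w
occursIn v (TCon _ ts) = any (occursIn v) ts

main :: IO ()
main = print (unify (TCon "->" [TVar "a", TVar "b"])
                    (TCon "->" [TCon "Int" [], TVar "a"]))
```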
>> Well, you can totally tokenise the input without knowing anything
>> about what user-defined operators exist. (There are rules about what
>> characters you may use.) But it seems that you can't build a correct
>> parse tree until you know the operator precedences, so...
>
> And you can find the definitions without knowing the precedences, right?
> You can distinguish between definitions that use user-defined operators
> and those which don't?
Technically *all* operators are user-defined. (The only exception is the
list syntax, which - for reasons unknown - is hard-wired into the
language.) But sure, you need to hunt down the various definitions for
parsing, type checking, execution... whatever it is you want to do. It's
just a question of how your particular tool does it.
The only significant point is that this prevents you writing a
stand-alone Haskell parser; it can't be done. You can't parse Haskell
without comprehending it.
Orchid XP v8 wrote:
> Awesome. On further investigation, it turns out my understanding of
> Haskell's type system is actually incorrect. And, unsurprisingly, my
> program is incorrect too; it rejects expressions that it should actually
> accept.
Well, I have now implemented let-bound variables correctly. (I hope!)
Still a few bugs to iron out. It *is* impressive though just how much
the simpleton program I've written can figure out automatically, just by
looking at the shape of an expression. I mean, I can write expressions
involving library functions where I haven't told my program what their
type signatures are, and yet it still manages to figure it out for
itself, mostly...
Invisible wrote:
> I think perhaps the best thing would be to perform this transformation
> first, and only then attempt to do the actual parsing. The explicitly
> delimited stuff looks a damn sight easier to parse.
They're isomorphic. They are, by definition, exactly as easy to parse. :-)
But yes, the fact that you can go back and forth is one way to know it's not
"context-sensitive". "Context-sensitive" means something like "an assignment
means something different if it's in the 'else' part of the 'if' instead of
the 'then' part". Sort of like the difference between lexical scoping and
dynamic scoping.
> Of course, the fun part is now trying to relate the location of any
> parse errors back to the original source code. Indeed, with everything
> I've done so far, it seems that making it *work* is relatively easy, but
> making it give useful error messages if the user makes a mistake is
> drastically harder.
Embed non-Haskell codes for line numbers.
[@1] foo x = [@2] case x of [@3] {1 -> "one"; [@4] 2 -> "two";
[@5] 3 -> "three"}
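Rather than literal [@n] markers, the usual trick is to carry the original
source position on every token, so errors after the layout pass still point
back home. A sketch, with all names mine (tabs not handled):

```haskell
-- Tag every whitespace-separated word with the line and column it came from.
data Pos = Pos { srcLine :: Int, srcCol :: Int } deriving (Eq, Show)
data Located a = At Pos a deriving (Eq, Show)

locate :: String -> [Located String]
locate src =
  [ At (Pos ln col) w
  | (ln, l)  <- zip [1 ..] (lines src)
  , (col, w) <- wordsWithCol l ]
  where
    wordsWithCol = go 1
      where
        go _ []         = []
        go c (' ' : xs) = go (c + 1) xs
        go c xs         = let (w, rest) = span (/= ' ') xs
                          in (c, w) : go (c + length w) rest

main :: IO ()
main = print (locate "foo x =\n  case x of")
```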
> The only significant point is that this prevents you writing a
> stand-alone Haskell parser; it can't be done. You can't parse Haskell
> without comprehending it.
Depends what you mean by "parse" and "comprehend", but yes, I know what you
mean. You can parse tokens, and you can only build a parse tree if you have
the definitions. But that's true of almost any language; it's certainly true
of any language where you'd talk about a parse tree. It's just that most of
those languages have hard-coded operators, precedences, etc. They're just in
a table in the compiler instead of in the source code.
You need to parse it into a map from symbol to definition, then figure out
what order to parse the definitions in, yes?
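That two-pass idea can be sketched as a depth-first ordering over a map from
each definition to the names it mentions, so dependencies come out first.
The names below are mine:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Order definitions so each one comes after the definitions it mentions.
-- (A real implementation would also detect cycles; this one just stops.)
topo :: M.Map String [String] -> [String]
topo deps = reverse (fst (foldl visit ([], S.empty) (M.keys deps)))
  where
    visit (out, seen) n
      | n `S.member` seen = (out, seen)
      | otherwise =
          let (out', seen') =
                foldl visit (out, S.insert n seen)
                      (M.findWithDefault [] n deps)
          in (n : out', seen')

main :: IO ()
main = print (topo (M.fromList [("b", ["a"]), ("a", [])]))
-- prints ["a","b"]
```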
--
Darren New, San Diego CA, USA (PST)
"We'd like you to back-port all the changes in 2.0
back to version 1.0."
"We've done that already. We call it 2.0."