Orchid XP v8 wrote:
>>> Haskell really is a PITA to parse. I mean, think about it:
>>> - Whitespace is significant. (No context-free parsing here!)
>>
>> I don't think "context-free" means what you think it means. Whitespace
>> being significant isn't "context".
>
> Indentation is significant. To parse the next line, you must know how
> far the current line was indented. Hence, "context".
I don't think "context-free" means what you think it means.
That's like saying Pascal isn't context-free because you need to know how
many "begin" statements you're nested within.
A grammar is context-free if the left side of every production is a
single non-terminal symbol. (That's the technical definition, which
takes several paragraphs to explain in prose.)
> No, they can be used *before* they're defined. And they can be defined
> in a completely separate source file too. Which makes me wonder how it's
> possible to parse an individual file at all... you'd need to potentially
> parse every module it imports to work out the operator precedences!
Could be! :-) Sounds like a real bear to compile, but then, that's what
computers are for.
--
Darren New, San Diego CA, USA (PST)
"We'd like you to back-port all the changes in 2.0
back to version 1.0."
"We've done that already. We call it 2.0."
>>>> Haskell really is a PITA to parse. I mean, think about it:
>>>> - Whitespace is significant. (No context-free parsing here!)
>>>
>>> I don't think "context-free" means what you think it means.
>>> Whitespace being significant isn't "context".
>>
>> Indentation is significant. To parse the next line, you must know how
>> far the current line was indented. Hence, "context".
>
> I don't think "context-free" means what you think it means.
>
> That's like saying Pascal isn't context-free because you need to know
> how many "begin" statements you're nested within.
No, you don't. You can totally parse the contents without knowing how
deeply nested it is. The parsing rules don't change.
However, in Haskell, if I write

  foo = if x
          then y
          else z

that's different to

  foo =
    if x
      then y
      else z

To parse the "then" part, you *must* know how far the "if" part is
indented. It's not possible to unambiguously parse without this
information. (Consider nested if-expressions; here indentation is the
only thing that makes the nesting unambiguous...)
> A grammar is context-free if the left side of every production is a
> single non-terminal symbol. (That's the technical definition, which
> takes several paragraphs to explain in prose.)
Yeah. You'd have to know what a "production" is, for starters...
>> No, they can be used *before* they're defined. And they can be defined
>> in a completely separate source file too. Which makes me wonder how
>> it's possible to parse an individual file at all... you'd need to
>> potentially parse every module it imports to work out the operator
>> precedences!
>
> Could be! :-) Sounds like a real bear to compile, but then, that's what
> computers are for.
Well, you can totally tokenise the input without knowing anything about
what user-defined operators exist. (There are rules about what
characters you may use.) But it seems that you can't build a correct
parse tree until you know the operator precedences, so...
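One standard way out (GHC does something along these lines) is to parse
operator chains flat, then re-associate once the fixities are known. A
sketch, with all names mine and everything treated as left-associative for
brevity:

```haskell
-- Parse "e1 `op1` e2 `op2` e3 ..." flat, then rebuild the tree once the
-- precedence of each operator is known (precedence climbing).
data Expr = Var String | Op String Expr Expr deriving (Eq, Show)

reassoc :: (String -> Int) -> Expr -> [(String, Expr)] -> Expr
reassoc prec e0 chain = fst (go e0 chain 0)
  where
    go l [] _ = (l, [])
    go l rest@((op, r) : more) minPrec
      | prec op < minPrec = (l, rest)
      | otherwise =
          let (r', rest') = go r more (prec op + 1)
          in go (Op op l r') rest' minPrec

main :: IO ()
main =
  let prec "*" = 7
      prec _   = 6
  in print (reassoc prec (Var "a") [("+", Var "b"), ("*", Var "c")])
-- prints Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))
```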
Starting to see why Haskell is the only language powerful enough to
implement a Haskell compiler, yes? ;-)
Invisible wrote:
>>>>> Haskell really is a PITA to parse. I mean, think about it:
>>>>> - Whitespace is significant. (No context-free parsing here!)
>>>>
>>>> I don't think "context-free" means what you think it means.
>>>> Whitespace being significant isn't "context".
>>>
>>> Indentation is significant. To parse the next line, you must know how
>>> far the current line was indented. Hence, "context".
>>
>> I don't think "context-free" means what you think it means.
>>
>> That's like saying Pascal isn't context-free because you need to know
>> how many "begin" statements you're nested within.
>
> No, you don't. You can totally parse the contents without knowing how
> deeply nested it is. The parsing rules don't change.
Of course it does.

  if x = 0 then
    y := 1;
    y := 2;

parses completely differently from

  if x = 0 then begin
    y := 1;
    y := 2;
  end
> However, in Haskell, if I write
>
> foo = if x
>         then y
>         else z
>
> that's different to
>
> foo =
>   if x
>     then y
>     else z
>
> To parse the "then" part, you *must* know how far the "if" part is
> indented.
You only need to know if it's indented less, the same, or more than foo,
right? It would be the same if the "if" were under the = as if it were
under the "o", yes?
In which case you have three tokens: indent, outdent, and dent.
> It's not possible to unambiguously parse without this
> information. (Consider nested if-expressions; here indentation is the
> only thing that makes the nesting unambiguous...)
That doesn't mean it's context sensitive. That just means whitespace is
syntactically meaningful.
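That indent/outdent/dent idea can be sketched as a pre-pass that compares
each line's starting column against a stack of open indentation levels, the
same trick Python's lexer uses. All the names below are mine:

```haskell
-- Turn leading-whitespace columns into explicit Indent/Outdent/Dent tokens.
-- Input is (column, word) per line; the stack records open indentation levels.
data Tok = Indent | Outdent | Dent | Word String deriving (Eq, Show)

layout :: [Int] -> [(Int, String)] -> [Tok]
layout stack [] = replicate (length stack - 1) Outdent
layout stack@(top : rest) ls@((col, w) : more)
  | col > top = Indent  : Word w : layout (col : stack) more
  | col < top = Outdent : layout rest ls
  | otherwise = Dent    : Word w : layout stack more

tokenise :: [(Int, String)] -> [Tok]
tokenise = layout [0]

main :: IO ()
main = print (tokenise [(0, "foo"), (2, "if"), (2, "then"), (0, "bar")])
```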
>> A grammar is context-free if the left side of every production is a
>> single non-terminal symbol. (That's the technical definition, which
>> takes several paragraphs to explain in prose.)
>
> Yeah. You'd have to know what a "production" is, for starters...
You know some BNF, right?
<expression> ::= <term> * <term> | <term> / <term>
Well, if you have terminal symbols (or more than one symbol) on the left of
the ::=, you have a context-sensitive grammar.
>> Could be! :-) Sounds like a real bear to compile, but then, that's
>> what computers are for.
>
> Well, you can totally tokenise the input without knowing anything about
> what user-defined operators exist. (There are rules about what
> characters you may use.) But it seems that you can't build a correct
> parse tree until you know the operator precedences, so...
And you can find the definitions without knowing the precedences, right? You
can distinguish between definitions that use user-defined operators and
those which don't?
--
Darren New, San Diego CA, USA (PST)
"We'd like you to back-port all the changes in 2.0
back to version 1.0."
"We've done that already. We call it 2.0."
>> However, in Haskell, if I write
>>
>> foo = if x
>>         then y
>>         else z
>>
>> that's different to
>>
>> foo =
>>   if x
>>     then y
>>     else z
>>
>> To parse the "then" part, you *must* know how far the "if" part is
>> indented.
>
> You only need to know if it's indented less, the same, or more than foo,
> right?
Yeah.
> In which case you have three tokens. indent, outdent, and dent.
The above expressions use the "layout rule". You can, however, write
Haskell code where whitespace is not significant. (At least, no more
significant than it is in Pascal; it still delimits tokens, etc.)
According to the language spec, Haskell with layout can be transformed
into Haskell without layout.
For example,
  foo x =
    case x of
      1 -> "one"
      2 -> "two"
      3 -> "three"
would become
foo x = case x of {1 -> "one"; 2 -> "two"; 3 -> "three"}
I think perhaps the best thing would be to perform this transformation
first, and only then attempt to do the actual parsing. The explicitly
delimited stuff looks a damn sight easier to parse.
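As a sanity check, the two forms above really are the same definition; GHC
accepts both spellings and they denote the same function:

```haskell
-- The layout form of the definition from the post...
foo :: Int -> String
foo x =
  case x of
    1 -> "one"
    2 -> "two"
    3 -> "three"

-- ...and the explicitly delimited form.
foo' :: Int -> String
foo' x = case x of {1 -> "one"; 2 -> "two"; 3 -> "three"}

main :: IO ()
main = print (map foo [1, 2, 3] == map foo' [1, 2, 3])
-- prints True
```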
Of course, the fun part is now trying to relate the location of any
parse errors back to the original source code. Indeed, with everything
I've done so far, it seems that making it *work* is relatively easy, but
making it give useful error messages if the user makes a mistake is
drastically harder.
(Want to type-check something? That's easy. Just unify a few terms.
What, it doesn't unify, and you want to know where the error is? Uh...
good luck with that.)
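"Just unify a few terms" can be made concrete in a few lines. Here is a
minimal first-order unifier over a toy type language (all names are mine;
real type checkers thread source locations through exactly because of the
error-reporting problem above):

```haskell
import qualified Data.Map as M

-- A toy type language: type variables and applied type constructors.
data Ty = TVar String | TCon String [Ty] deriving (Eq, Show)
type Subst = M.Map String Ty

apply :: Subst -> Ty -> Ty
apply s (TVar v)    = M.findWithDefault (TVar v) v s
apply s (TCon c ts) = TCon c (map (apply s) ts)

-- Unify two types, returning the substitution that makes them equal
-- (Nothing on a clash or an occurs-check failure).
unify :: Ty -> Ty -> Maybe Subst
unify (TVar v) t
  | t == TVar v    = Just M.empty
  | v `occursIn` t = Nothing                 -- occurs check
  | otherwise      = Just (M.singleton v t)
unify t (TVar v) = unify (TVar v) t
unify (TCon c ts) (TCon d us)
  | c == d && length ts == length us = go M.empty (zip ts us)
  | otherwise                        = Nothing
  where
    go s []            = Just s
    go s ((a, b) : ps) = do
      s' <- unify (apply s a) (apply s b)
      go (M.map (apply s') s `M.union` s') ps

occursIn :: String -> Ty -> Bool
occursIn v (TVar w)    = v == w
occursIn v (TCon _ ts) = any (occursIn v) ts

main :: IO ()
main = print (unify (TCon "->" [TVar "a", TVar "b"])
                    (TCon "->" [TCon "Int" [], TVar "a"]))
```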
>> Well, you can totally tokenise the input without knowing anything
>> about what user-defined operators exist. (There are rules about what
>> characters you may use.) But it seems that you can't build a correct
>> parse tree until you know the operator precedences, so...
>
> And you can find the definitions without knowing the precedences, right?
> You can distinguish between definitions that use user-defined operators
> and those which don't?
Technically *all* operators are user-defined. (The only exception is the
list syntax, which - for reasons unknown - is hard-wired into the
language.) But sure, you need to hunt down the various definitions for
parsing, type checking, execution... whatever it is you want to do. It's
just a question of how your particular tool does it.
The only significant point is that this prevents you writing a
stand-alone Haskell parser; it can't be done. You can't parse Haskell
without comprehending it.
Orchid XP v8 wrote:
> Awesome. On further investigation, it turns out my understanding of
> Haskell's type system is actually incorrect. And, unsurprisingly, my
> program is incorrect too; it rejects expressions that it should actually
> accept.
Well, I have now implemented let-bound variables correctly. (I hope!)
Still a few bugs to iron out. It *is* impressive though just how much
the simpleton program I've written can figure out automatically, just by
looking at the shape of an expression. I mean, I can write expressions
involving library functions where I haven't told my program what their
type signatures are, and yet it still manages to figure it out for
itself, mostly...
Invisible wrote:
> I think perhaps the best thing would be to perform this transformation
> first, and only then attempt to do the actual parsing. The explicitly
> delimited stuff looks a damn sight easier to parse.
They're isomorphic. They are, by definition, exactly as easy to parse. :-)
But yes, the fact that you can go back and forth is one way to know it's not
"context-sensitive". "Context-sensitive" means something like "an assignment
means something different if it's in the 'else' part of the 'if' instead of
the 'then' part". Sort of like the difference between lexical scoping and
dynamic scoping.
> Of course, the fun part is now trying to relate the location of any
> parse errors back to the original source code. Indeed, with everything
> I've done so far, it seems that making it *work* is relatively easy, but
> making it give useful error messages if the user makes a mistake is
> drastically harder.
Embed non-Haskell codes for line numbers.
[@1] foo x = [@2] case x of [@3] {1 -> "one"; [@4] 2 -> "two";
[@5] 3 -> "three"}
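Rather than literal [@n] markers, the usual trick is to carry the original
source position on every token, so errors after the layout pass still point
back home. A sketch, with all names mine (tabs not handled):

```haskell
-- Tag every whitespace-separated word with the line and column it came from.
data Pos = Pos { srcLine :: Int, srcCol :: Int } deriving (Eq, Show)
data Located a = At Pos a deriving (Eq, Show)

locate :: String -> [Located String]
locate src =
  [ At (Pos ln col) w
  | (ln, l)  <- zip [1 ..] (lines src)
  , (col, w) <- wordsWithCol l ]
  where
    wordsWithCol = go 1
      where
        go _ []         = []
        go c (' ' : xs) = go (c + 1) xs
        go c xs         = let (w, rest) = span (/= ' ') xs
                          in (c, w) : go (c + length w) rest

main :: IO ()
main = print (locate "foo x =\n  case x of")
```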
> The only significant point is that this prevents you writing a
> stand-alone Haskell parser; it can't be done. You can't parse Haskell
> without comprehending it.
Depends what you mean by "parse" and "comprehend", but yes, I know what you
mean. You can parse tokens, and you can only build a parse tree if you have
the definitions. But that's true of almost any language; it's certainly true
of any language where you'd talk about a parse tree. It's just that most of
those languages have hard-coded operators, precedences, etc. They're just in
a table in the compiler instead of in the source code.
You need to parse it into a map from symbol to definition, then figure out
what order to parse the definitions in, yes?
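That two-pass idea can be sketched as a depth-first ordering over a map from
each definition to the names it mentions, so dependencies come out first.
The names below are mine:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Order definitions so each one comes after the definitions it mentions.
-- (A real implementation would also detect cycles; this one just stops.)
topo :: M.Map String [String] -> [String]
topo deps = reverse (fst (foldl visit ([], S.empty) (M.keys deps)))
  where
    visit (out, seen) n
      | n `S.member` seen = (out, seen)
      | otherwise =
          let (out', seen') =
                foldl visit (out, S.insert n seen)
                      (M.findWithDefault [] n deps)
          in (n : out', seen')

main :: IO ()
main = print (topo (M.fromList [("b", ["a"]), ("a", [])]))
-- prints ["a","b"]
```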
--
Darren New, San Diego CA, USA (PST)
"We'd like you to back-port all the changes in 2.0
back to version 1.0."
"We've done that already. We call it 2.0."