 |
 |
|
 |
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Nicolas Alvarez wrote:
> Orchid XP v8 wrote:
>>> Oh. I *highly* suggest you put this up on your blog or your web site or
>>> something like that, for public consumption. Get comments from a lot of
>>> people, find jobs in Haskell and/or teaching because you can write like
>>> this, etc.
>> Heh. I could add it to my blog. Nobody on earth will ever actually read
>> it, but sure, I can add it. ;-)
>
> Does Haskell have a wiki?
Yeah. Two of 'em, in fact... (The old one that's slowly being closed
down, and the new one that's a tad sparse stilll.)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Darren New wrote:
> Actually, that's not *quite* right.
>
> What Haskell is using is called a recursive-descent parser: the parser
> basically uses the run-time stack in place of an explicit stack. It's
> best for LALR(1) syntaxes, which means "Look Ahead Left to Right 1
> token". I.e., a language you can parse by running strictly from left to
> right, looking only one element ahead, so you don't need the "P_Try()"
> function. Some languages (like Pascal) are explicitly designed this way
> to make it easy to build a recursive descent parser.
OK, well since you seem to know something about this... using the
formalisms developed here, how would you go about parsing a grammar that
can consist of integers, variable names, and + - / * ( ), such that the
operators have the correct precidence?
I thought I had this working, but it seems that bracketing the
expressions in unusual but valid ways causes the parser to trip over in
inexplicable ways. :-(
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Invisible <voi### [at] dev null> wrote:
> OK, well since you seem to know something about this... using the
> formalisms developed here, how would you go about parsing a grammar that
> can consist of integers, variable names, and + - / * ( ), such that the
> operators have the correct precidence?
I don't know if this has anything to do with the formalism in question,
but I used the following classical parsing algorithm in my function parser
library, which converts a function string into a series of bytecode commands
which use stack arithmetic. (The parser is able to parse the input with one
single pass, by only seeing one token at a time).
Basically you create one parsing function for each level of precedence.
You start from the lowest precedence and make each such function call the
function at the next precedence level. The highest precedence level
function is the function that parses one element in the expression.
Assuming that + and - have the same precedence and are parsed
left-to-right, you write the lowest precedence function like this
(C++'ish pseudocode):
parsePlusMinus(functionString)
{
parseMulDiv(functionString); // The next higher precedence function.
token = functionString.nextToken(); // Could be end-of-string
if(token == '+' or token == '-')
{
parseMulDiv(functionString);
if(token == '+') generateCommand(add);
else generateCommand(sub);
}
}
Basically this divides the function string into parts which are separated
with '+' or '-'. It parses these parts by calling the next higher precedence
function, and then generates the operator command which calculates the sum
or subtraction of those parts. (Since my parser uses stack arithmetic, it
means that these "parts" are values on top of the stack when the
parseMulDiv() function has done its job and returns.)
The next function is very similar:
parseMulDiv(functionString)
{
parseElement(functionString); // The next higher precedence function.
token = functionString.nextToken();
if(token == '*' or token == '/')
{
parseElement(functionString);
if(token == '*') generateCommand(mul);
else generateCommand(div);
}
}
Now in your syntax the next higher precedence function is the one
which parses a single element. But what about parentheses? Well, here's
the trick: An expression in parentheses *is* a single element: After its
contents have been parsed it generates one single value onto the stack,
exactly as a single element does. The trick is to parse its contents
recursively, as its own expression. That is, something like:
parseElement(functionString)
{
token = functionString.nextToken();
if(token == '(')
parsePlusMinus(functionString); // parse the inner expression
// When it returns, the ')' will have been read out already
else if(token == NUMBER)
generateCommand(push, NUMBER); // push the number onto the stack
else SyntaxError;
}
Of course this is highly simplified. For example, it doesn't perform any
syntax correctness checking (eg. that parentheses match). Those require a
bit more logic inside those functions. But you get the basic idea.
The beauty of this is that with these recursive functions you are able
to parse the function string by reading one token at a time, traversing
the string only once from beginning to end, without ever needing to go
back or perform any look-ahead, yet it still correctly handles operator
precedence and nested expressions in parentheses.
--
- Warp
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Invisible wrote:
> Darren New wrote:
>
>> Actually, that's not *quite* right.
>>
>> What Haskell is using is called a recursive-descent parser: the parser
>> basically uses the run-time stack in place of an explicit stack. It's
>> best for LALR(1) syntaxes, which means "Look Ahead Left to Right 1
>> token". I.e., a language you can parse by running strictly from left
>> to right, looking only one element ahead, so you don't need the
>> "P_Try()" function. Some languages (like Pascal) are explicitly
>> designed this way to make it easy to build a recursive descent parser.
>
> OK, well since you seem to know something about this... using the
> formalisms developed here, how would you go about parsing a grammar that
> can consist of integers, variable names, and + - / * ( ), such that the
> operators have the correct precidence?
Almost everyone calls these productions "Expressions" and "Terms" and
"Factors". So googling for "bnf expression term" yields
http://www.seas.upenn.edu/~cit596/notes/dave/bnf4.html
You can see the basic technique. An expression is terms multiplied
together, while a term is factors added together, and a factor is a
variable, a number, or an expression in parens. That gives you the
precidence and associativity you are looking for.
> I thought I had this working, but it seems that bracketing the
> expressions in unusual but valid ways causes the parser to trip over in
> inexplicable ways. :-(
Take the BNF and translate it directly into functions that match those
other parsers? It's been decades since I did such stuff, so I am not
really glib enough at this point to just roll it off the fingertips.
Googling for "recursive descent expression parse", I get
http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm
which covers it well. Note the line
"""
Not only is left recursion eliminated, but the G1 is unambiguous and
each choice can be made by looking at the next token in the input.
"""
which tells you it's possible to use simple LL(1) parsers to grok it.
The "recursive descent recognition" shows you how to do it with Haskell
parsers. The next section ("shunting yard") shows you how Yacc/Bison
builds its table, followed by essentially an implementation of the
run-time part of Yacc/Bison.
That's actually a pretty good overview page. :-)
HTH!
--
Darren New / San Diego, CA, USA (PST)
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Darren New wrote:
> An expression is terms multiplied
> together, while a term is factors added together
Are you sure you didn't get that backwards?
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Nicolas Alvarez wrote:
> Darren New wrote:
>> An expression is terms multiplied
>> together, while a term is factors added together
>
> Are you sure you didn't get that backwards?
No, I'm not. :-)
--
Darren New / San Diego, CA, USA (PST)
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Warp wrote:
> I don't know if this has anything to do with the formalism in question,
> but I used the following classical parsing algorithm in my function parser
> library, which converts a function string into a series of bytecode commands
> which use stack arithmetic. (The parser is able to parse the input with one
> single pass, by only seeing one token at a time).
>
> Basically you create one parsing function for each level of precedence.
> You start from the lowest precedence and make each such function call the
> function at the next precedence level. The highest precedence level
> function is the function that parses one element in the expression.
> Now in your syntax the next higher precedence function is the one
> which parses a single element. But what about parentheses? Well, here's
> the trick: An expression in parentheses *is* a single element: After its
> contents have been parsed it generates one single value onto the stack,
> exactly as a single element does. The trick is to parse its contents
> recursively, as its own expression. That is, something like:
> The beauty of this is that with these recursive functions you are able
> to parse the function string by reading one token at a time, traversing
> the string only once from beginning to end, without ever needing to go
> back or perform any look-ahead, yet it still correctly handles operator
> precedence and nested expressions in parentheses.
This is basically what I did - and it worked exactly right 95% of the
time. But if you bracket your expressions in odd (but valid) ways, it
complains.
Oh well, maybe I just made a mistake somewhere... I'll go recheck the code.
It's about now I wish I had a step debugger! ;-)
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Invisible wrote:
> This is basically what I did - and it worked exactly right 95% of the
> time. But if you bracket your expressions in odd (but valid) ways, it
> complains.
Ah-HAH!
I know what I did wrong. I was looking for brackets at *every* level of
precidence. The correct behaviour is to look for them *only* at the
lowest precidence level. Otherwise one of the higher levels gobbles them
up, and then complains that the seperator is a lower-precidence
operator. Oops.
OK, now I can bracket my expressions in all sorts of weird ways and the
parser always seems to give me the correct answer. (But I've said that
before!) Hmm, I might have to see about writing a formal test harness to
check whether it "really is" working properly this time...
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Invisible <voi### [at] dev null> wrote:
> precidence
You keep using that word...
http://dictionary.reference.com/browse/precedence
--
- Warp
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Warp wrote:
> Invisible <voi### [at] dev null> wrote:
>> precidence
>
> You keep using that word...
>
> http://dictionary.reference.com/browse/precedence
Grr... god damned screwy language! :-S
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|
 |