POV-Ray: Newsgroups: povray.off-topic: Java: Some things never change: Re: Java: Some things never change

POV-Ray : Newsgroups : povray.off-topic : Java: Some things never change : Re: Java: Some things never change		Server Time 29 Jul 2024 06:19:01 EDT (-0400)
From: Invisible
Date: 2 Apr 2012 07:06:28
Message: <4f798834$1@news.povray.org>
On 02/04/2012 10:07 AM, Invisible wrote:
> Yesterday I was sitting at my PC, listening to Alisha's Attic and coding
> in Java.
>
> So... doing what I was doing 10 years ago, then. :-P

It's interesting, actually. I needed a parser. Since Java doesn't have 
any parsing libraries, I started writing one. But I made the mistake of 
trying to write it the way you'd write in in Haskell. Pro tip: DON'T DO 
THAT!

You see, in Haskell, it's trivial to write a function that takes some 
other function as its argument to do some key operation, or to implement 
the continuation, or whatever. But in Java, that's wicked-hard. You'd 
have to write an entire class for every "function" you want to pass as a 
parameter. (The fact that "anonymous inner classes" even exist tells you 
how badly some Java programmers want first-class functions...)

In Haskell, you'd write a parser combinator library. You have a couple 
of primitive parsers, and then you have functions that construct more 
sophisticated parsers from simpler ones. Trouble is, in Java every 
"parser" has to be an entire class. And where in Haskell you can just 
say "foo 1 2 3" and get a 1-argument function, in Java you must say

   public class Foo extends Parser
   {
     private int arg1, arg2, arg3;

     public Foo(int a1, int a2, int a3)
     {
       this.arg1 = a1;
       this.arg2 = a2;
       this.arg3 = a3;
     }
   }

This is before you even /implement/ the actual functionality of the 
thing! In short, Java makes this kind of thing excruciatingly verbose. 
Also, unlike Haskell, it's not designed to be used this way at all, and 
so it doesn't optimise all this overhead away. I ended up writing a huge 
mountain of very complex and buggy code that's a nightmare to understand.

So then I took a step back, and tried to write a parser the way a Java 
programmer might do it. And you know what? The lexer is pretty simple. 
You do something like

   public class Lexer
   {
     private String buffer;
     private int    index;

     public Lexer(String in)
     {
       this.buffer = in;
       this.index  = 0;
     }

     private char Peek() {return this.buffer.chatAt(this.Index);}

     private void Next() {this.index++;}

     private boolean EOF() {return this.Index < this.buffer.length();}

     public Token GetNextToken() {...}
   }

There's basically two types of token: Ones that consist of just one 
special character, and ones that consist of a run of characters of a 
certain type. In Haskell, you'd abstract out these patterns, but in Java 
that's too painful. The single-character case is trivial to handle, and 
there are only two types of character run to worry about, so the code 
repartition isn't too bad.

Next, you need to parse the tokens into a parse tree. Now there's 
several ways to do that. The way I tried to do it was Dijkstra's 
shunting algorithm. In it's original form, it takes a list of tokens and 
builds a Polish reversed list. But it's easy to modify it to build a 
parse tree instead.

The original algorithm looks like

   - Read token.
   - If operand, output.
   - If operator, push to the operator stack. (Possibly unstacking and 
outputting some existing operators first.)
   - At the end, unstack all remaining operators as above.

All you gotta do is keep the operands on a stack as well, and when you 
unstack an operator, you pull its operands off the operand stack, make a 
little expression tree out of it, and push that back to the operand 
stack. At the end, there should be one operand left, which is your 
expression tree.

The really tricky part is handling brackets and function calls. In fact, 
I discovered that my parser fails for functions that take zero 
arguments. It appears the only way to fix this is for the parser to 
"know" how many arguments each function is supposed to take - which is a 
dependence I really don't want to add. (The alternative is to hack 
around the problem with status flags, which is a bit messy.)

Apart from empty function calls, my parser works. Trouble is, every 
possible operator token becomes a whole class, because every token 
behaves differently. (In particular, most tokens need to be turned into 
/expressions/ at some point.) So you end up with a rather large 
profusion of classes!

I ended up with 12 concrete token classes, plus about 5 abstract classes 
to handle shared functionality. Add on one class for the parser (which 
manages the two stacks), one class for the lexer, and one class for the 
parser exception (in Java, an exception is a class) and we have a grand 
total of 20 classes to implement this simple little parser. And it 
doesn't even give decent error messages!
Post a reply to this message