POV-Ray: Newsgroups: povray.off-topic: Really strange design choices: Re: Trixy

From: Invisible
Date: 18 Dec 2008 10:30:33
Message: <494a6c99$1@news.povray.org>

>> I fail to see how a pattern matching language is of help here...
> 
> Well it seemed from your example, a "number" is quite easily 
> distinguished from a non-number.

Yeah, maybe.

> A number takes one of the four forms (where n is 1 or more digits):
> 
> n.n
> n.
> .n
> n
> 
> And is optionally prefixed by a minus sign, and optionally suffixed by 
> an exponential term, which takes the form E or e followed by an optional 
> minus sign followed by one or more digits.

This isn't quite correct.

- The optional sign prefix can also be "+" instead of "-" (in both the 
mantissa and any exponent there might be).
- Numbers may also take the radix form "n#n": a base, then "#", then the 
digits written in that base (for example "16#FFFE").
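
For what it's worth, the number forms listed above (with the "+" sign and the "n#n" form added) fit in a couple of regular expressions. This is only a rough sketch in Python, not my Haskell code; the names DECIMAL, RADIX and is_number are made up for illustration, and it assumes the "n#n" form is unsigned with a base from 2 to 36.

```python
import re

# Optional sign, then one of n.n / n. / .n / n, then an optional exponent
# (E or e, optional sign, one or more digits).
DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")

# Radix form base#digits. Assumption: unsigned, digits are 0-9 and letters.
RADIX = re.compile(r"(\d+)#([0-9A-Za-z]+)")


def is_number(token):
    """Return True if the whole token matches one of the number forms."""
    if DECIMAL.fullmatch(token):
        return True
    match = RADIX.fullmatch(token)
    if match is None:
        return False
    base = int(match.group(1))
    if not 2 <= base <= 36:  # assumed range of legal bases
        return False
    try:
        int(match.group(2), base)  # rejects digits too large for the base
    except ValueError:
        return False
    return True
```

Anything that fails this test (such as "123abc") would simply be treated as a name.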

> I would use regular expressions to decide if my string matched this form 
> or not, but maybe your language/library already has similar functions to 
> do that?

Well, given that I already need to cut the string into bits anyway so I 
can modify it so the number parser will accept it, I'm not sure this 
buys me anything. (Haskell's number parser doesn't like "+" as a prefix, 
doesn't like ".7" or "7." as a number, and so forth.)

There are also a whole bunch of "interesting" rules about how token 
parsing works. A PostScript program can take an arbitrary text string 
and ask the interpreter to parse one token from it. Page 703 of the 
PostScript Language Reference Manual states the following facts:

- If the token read is a name object or a number object, and it is 
followed by a white-space character, one white-space character is consumed.

- If the token ends with a delimiter that's part of the token, that 
delimiter is consumed, and no other characters after it.

- If the token is terminated by a delimiter that marks the start of the 
next token, that character is not consumed.

In other words, if you have "123 456" then the space is consumed, but if 
you have "<123> 456" then the space is *not* consumed. Likewise, if you 
have "123/abc" then the "/" is not consumed. However, "123abc" is a 
single (name object) token.
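
To make the consumption rules concrete, here is a small Python sketch (again, not my actual Haskell code) that takes one token off the front of a string and returns whatever is left. The name take_token, the white-space set and the delimiter set are my assumptions, and it only covers names, numbers and "<...>" hex strings.

```python
# Assumed white-space and delimiter characters.
WHITESPACE = set(" \t\r\n\f\0")
DELIMITERS = set("()<>[]{}/%")


def take_token(text):
    """Split one token off the front of text; return (token, rest)."""
    i = 0
    while i < len(text) and text[i] in WHITESPACE:
        i += 1  # skip leading white space
    if i == len(text):
        return "", ""
    first = text[i]
    if first == "<":
        # The closing ">" is part of the token: consume it and nothing more.
        end = text.find(">", i)
        if end < 0:
            raise ValueError("unterminated hex string")
        return text[i:end + 1], text[end + 1:]
    if first in DELIMITERS and first != "/":
        raise ValueError("token type not covered by this sketch")
    j = i + 1  # the first character (possibly "/") belongs to the token
    while j < len(text) and text[j] not in WHITESPACE and text[j] not in DELIMITERS:
        j += 1
    token = text[i:j]
    if j < len(text) and text[j] in WHITESPACE:
        j += 1  # a name or number swallows exactly one white-space character
    # A delimiter that starts the next token is left in place.
    return token, text[j:]
```

With this, take_token("123 456") leaves "456" behind, while take_token("<123> 456") leaves " 456" with the space still there.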

Looking at all these facts, it appears that the interpreter actually 
uses some simple rule to break the whole input stream into "tokens", and 
then decides what kind of token each one is separately.
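
The split-first, classify-later idea might look roughly like this in Python. It is a guess at the general shape, not how any real interpreter is known to do it; split_chunks and classify are made-up names, "\s" stands in for PostScript white space, and the number test is deliberately simplified (no "n#n" form).

```python
import re

DELIMITERS = set("()<>[]{}/%")

# Phase 1: chop the input into chunks without deciding what anything is.
# A chunk is a run of regular characters or a single delimiter character.
CHUNK = re.compile(r"[^\s()<>\[\]{}/%]+|[()<>\[\]{}/%]")

# Phase 2 helper: a simplified decimal number pattern.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def split_chunks(source):
    """Phase 1: return the raw chunks in order of appearance."""
    return CHUNK.findall(source)


def classify(chunk):
    """Phase 2: decide what an already-split chunk is."""
    if chunk in DELIMITERS:
        return "delimiter"
    if NUMBER.fullmatch(chunk):
        return "number"
    return "name"  # anything else, e.g. "123abc"


def tokens(source):
    """Run both phases and pair each chunk with its kind."""
    return [(chunk, classify(chunk)) for chunk in split_chunks(source)]
```

The point of the split is that "123abc" never gets half-read as a number: the splitter hands over the whole chunk, and the classifier only ever sees complete chunks.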

I am now reimplementing my parser so that instead of trying to classify 
and split the input at the same time, it splits it first, and only then 
attempts to decide what it just read. I think this is probably how the 
"real" PostScript interpreters work.
