Really strange design choices (Message 11 to 20 of 37)
From: Invisible
Subject: Re: Trixy
Date: 18 Dec 2008 08:40:41
Message: <494a52d9$1@news.povray.org>
>> Other amusing edge cases include "/":
>>
>> - A name is usually executable; by preceding it with "/", it becomes 
>> literal.
>>
>> - The token "/" by itself (i.e., not preceding a name) is a valid 
>> (executable) name.
>>
>> Trixy Hobbitses!
> 
> Also fun is trying to write a correct number parser:
> 
> - ".0" and "0." are both real number objects (equal to 0.0).
> 
> - "." by itself is a name object.
> 
> - PostScript allows both "-" and "+" as sign prefixes (which is good). 
> Haskell does not, however (which is bad).

Ah, but these interact!

Anything that isn't parsable as a number is a name. Therefore,

"0."    -> real
".0"    -> real
"."     -> name
"1.1"   -> real
"1.1.1" -> name
"1e1"   -> real
"1x1"   -> name
"s1"    -> name
"1s"    -> name

Will the insanity never end?? >_<

Good luck writing a parser that can untangle all of that... :-(



From: scott
Subject: Re: Trixy
Date: 18 Dec 2008 09:09:49
Message: <494a59ad$1@news.povray.org>
> Anything that isn't parsable as a number is a name. Therefore,
> 
> "0."    -> real
> ".0"    -> real
> "."     -> name
> "1.1"   -> real
> "1.1.1" -> name
> "1e1"   -> real
> "1x1"   -> name
> "s1"    -> name
> "1s"    -> name
> 
> Will the insanity never end?? >_<
> 
> Good luck writing a parser that can untangle all of that... :-(

http://xkcd.com/208/



From: Invisible
Subject: Re: Trixy
Date: 18 Dec 2008 09:12:16
Message: <494a5a40$1@news.povray.org>
scott wrote:

> http://xkcd.com/208/

Seriously... You gotta love the way this guy manages to draw stick 
figures that have no facial expressions, yet you can tell *exactly* what 
emotion they're having! o_O

Also... Yes, I am very, very glad I'm not writing a parser for regular 
expressions. (My God, think of the massacre...!)



From: scott
Subject: Re: Trixy
Date: 18 Dec 2008 09:19:34
Message: <494a5bf6$1@news.povray.org>
>> http://xkcd.com/208/
>
> Seriously... You gotta love the way this guy manages to draw stick figures 
> that have no facial expressions, yet you can tell *exactly* what emotion 
> they're having! o_O
>
> Also... Yes, I am very, very glad I'm not writing a parser for regular 
> expressions. (My God, think of the massacre...!)

I meant using regular expressions to help in your parser to decipher 
numbers.



From: Invisible
Subject: Re: Trixy
Date: 18 Dec 2008 09:27:59
Message: <494a5def$1@news.povray.org>
>> Also... Yes, I am very, very glad I'm not writing a parser for regular 
>> expressions. (My God, think of the massacre...!)
> 
> I meant using regular expressions to help in your parser to decipher 
> numbers.

I fail to see how a pattern matching language is of help here...

(I already *have* a real parser construction toolkit. The *problem* is 
that the rules I'm trying to puzzle out are quite complex - and not 
fantastically well-documented.)

Still, sooner or later I'll reach this stage:

http://xkcd.com/349/



From: scott
Subject: Re: Trixy
Date: 18 Dec 2008 09:52:48
Message: <494a63c0@news.povray.org>
>> I meant using regular expressions to help in your parser to decipher 
>> numbers.
>
> I fail to see how a pattern matching language is of help here...

Well it seemed from your example, a "number" is quite easily distinguished 
from a non-number.

A number takes one of the four forms (where n is 1 or more digits):

n.n
n.
.n
n

And is optionally prefixed by a minus sign, and optionally suffixed by an 
exponential term, which takes the form E or e followed by an optional minus 
sign followed by one or more digits.

I would use regular expressions to decide if my string matched this form or 
not, but maybe your language/library already has similar functions to do 
that?
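
Something like this, as a rough sketch in Python. The names MANTISSA, EXPONENT, NUMBER and looks_like_number are made up for illustration, and it only covers the forms I described above (so no leading plus sign, and nothing beyond plain decimal numbers):

```python
import re

# Mantissa: one of the four forms n.n, n., .n, n (n = one or more digits).
MANTISSA = r"(?:[0-9]+\.[0-9]+|[0-9]+\.|\.[0-9]+|[0-9]+)"
# Exponent: E or e, an optional minus sign, then one or more digits.
EXPONENT = r"(?:[Ee]-?[0-9]+)"
# Whole thing: optional minus sign, mantissa, optional exponent.
NUMBER = re.compile(r"-?" + MANTISSA + EXPONENT + r"?")


def looks_like_number(text):
    """Return True if the *whole* string has the form described above."""
    return NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    # The strings from the earlier table.
    for s in ["0.", ".0", ".", "1.1", "1.1.1", "1e1", "1x1", "s1", "1s"]:
        print(repr(s), "->", "number" if looks_like_number(s) else "name")
```

Using fullmatch means a string like "1.1.1" is rejected outright instead of matching on its first part.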



From: Invisible
Subject: Re: Trixy
Date: 18 Dec 2008 10:30:33
Message: <494a6c99$1@news.povray.org>
>> I fail to see how a pattern matching language is of help here...
> 
> Well it seemed from your example, a "number" is quite easily 
> distinguished from a non-number.

Yeah, maybe.

> A number takes one of the four forms (where n is 1 or more digits):
> 
> n.n
> n.
> .n
> n
> 
> And is optionally prefixed by a minus sign, and optionally suffixed by 
> an exponential term, which takes the form E or e followed by an optional 
> minus sign followed by one or more digits.

This isn't quite correct.

- The optional sign prefix can also be "+" instead of "-" (in both the 
mantissa and any exponent there might be).
- Numbers may also take the form "n#n".

> I would use regular expressions to decide if my string matched this form 
> or not, but maybe your language/library already has similar functions to 
> do that?

Well, given that I already need to cut the string into bits anyway so I 
can modify it so the number parser will accept it, I'm not sure this 
buys me anything. (Haskell's number parser doesn't like "+" as a prefix, 
doesn't like ".7" or "7." as a number, and so forth.)

There is also a whole bunch of "interesting" rules about how token 
parsing works. A PostScript program can take an arbitrary text string 
and ask the interpreter to parse one token from it. Page 703 of the 
PostScript Language Reference Manual states the following facts:

- If the token read is a name object or a number object, and it is 
followed by a white-space character, one white-space character is consumed.

- If the token ends with a delimiter that's part of the token, that 
delimiter is consumed, and no other characters after it.

- If the token is terminated by a delimiter that marks the start of the 
next token, that character is not consumed.

In other words, if you have "123 456" then the space is consumed, but if 
you have "<123> 456" then the space is *not* consumed. Likewise, if you 
have "123/abc" then the "/" is not consumed. However, "123abc" is a 
single (name object) token.
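
To make those three rules concrete, here is a minimal sketch in Python of "read one token and say what is left". It is not how any real interpreter does it: read_token is a made-up name, the character sets are my reading of the manual's white-space and delimiter tables, and only two token shapes are handled (a run of regular characters, and a simple <...> string).

```python
# Assumed character classes (my reading of the manual's tables).
WHITESPACE = "\0\t\n\f\r "
DELIMITERS = "()<>[]{}/%"


def read_token(text):
    """Return (token, rest) for the first token in text, or (None, "").

    Handles only runs of regular characters (names/numbers) and a
    simple <...> string; anything else is outside this sketch.
    """
    i = 0
    while i < len(text) and text[i] in WHITESPACE:
        i += 1  # leading white space is skipped
    if i == len(text):
        return None, ""
    if text[i] == "<" and text[i + 1:i + 2] not in ("<", "~"):
        end = text.find(">", i)
        if end == -1:
            raise ValueError("unterminated <...> string")
        # The closing ">" is part of the token: consume it, nothing more.
        return text[i:end + 1], text[end + 1:]
    if text[i] in DELIMITERS:
        raise NotImplementedError("token shape not covered by this sketch")
    j = i
    while j < len(text) and text[j] not in WHITESPACE and text[j] not in DELIMITERS:
        j += 1
    token = text[i:j]
    # Name or number followed by white space: swallow exactly one character.
    # If it is followed by a delimiter instead, that delimiter is left alone.
    if j < len(text) and text[j] in WHITESPACE:
        j += 1
    return token, text[j:]


if __name__ == "__main__":
    for s in ["123 456", "<123> 456", "123/abc", "123abc"]:
        print(repr(s), "->", read_token(s))
```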

Looking at all these facts, it appears that the interpreter actually 
uses some simple rule to break the whole input stream into "tokens", and 
then decides what kind of token each one is separately.

I am now reimplementing my parser so that instead of trying to classify 
and split the input at the same time, it splits it first, and only then 
attempts to decide what it just read. I think this is probably how the 
"real" PostScript interpreters work.

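Roughly what I have in mind, sketched in Python to keep it short. All the names here (split_chunks, classify, tokenize) are hypothetical, and the number rules are only my current reading of the manual: "+" or "-" as a sign, "E" or "e" with an integer exponent, and "n#n" with a base from 2 to 36. Strings and comments are ignored, and each delimiter is simply emitted as a chunk of its own.

```python
import re

# Assumed character classes (my reading of the manual's tables).
WHITESPACE = "\0\t\n\f\r "
DELIMITERS = "()<>[]{}/%"

INTEGER = re.compile(r"[+-]?[0-9]+")
REAL = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?")
RADIX = re.compile(r"([0-9]+)#([0-9A-Za-z]+)")


def split_chunks(text):
    """Stage 1: cut the text into chunks without looking at their meaning."""
    chunks, i = [], 0
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
        elif text[i] in DELIMITERS:
            chunks.append(text[i])  # simplification: one delimiter per chunk
            i += 1
        else:
            j = i
            while j < len(text) and text[j] not in WHITESPACE + DELIMITERS:
                j += 1
            chunks.append(text[i:j])
            i = j
    return chunks


def classify(chunk):
    """Stage 2: decide what a chunk is; whatever is not a number is a name."""
    if chunk in DELIMITERS:
        return "delimiter"
    if INTEGER.fullmatch(chunk):
        return "integer"
    if REAL.fullmatch(chunk):
        return "real"
    m = RADIX.fullmatch(chunk)
    if m and 2 <= int(m.group(1)) <= 36:
        try:
            int(m.group(2), int(m.group(1)))  # digits must be valid in the base
            return "radix"
        except ValueError:
            pass
    return "name"


def tokenize(text):
    return [(classify(c), c) for c in split_chunks(text)]


if __name__ == "__main__":
    for kind, chunk in tokenize("0. .0 . 1.1 1.1.1 1e1 1x1 s1 1s 16#FFFE"):
        print(f"{chunk!r:10} -> {kind}")
```

The nice part is that stage 1 never needs to know what a number looks like, so the awkward cases like "1.1.1" or "1s" just fall through to "name" in stage 2.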


From: Warp
Subject: Re: Trixy
Date: 18 Dec 2008 11:21:03
Message: <494a786f@news.povray.org>
Invisible <voi### [at] devnull> wrote:
> Anything that isn't parsable as a number is a name. Therefore,

> "0."    -> real
> ".0"    -> real
> "."     -> name
> "1.1"   -> real
> "1.1.1" -> name
> "1e1"   -> real
> "1x1"   -> name
> "s1"    -> name
> "1s"    -> name

> Will the insanity never end?? >_<

> Good luck writing a parser that can untangle all of that... :-(

  I really can't see the problem. When the input contains a sequence of
valid characters (i.e. which can form a real or a name), if this sequence
has the form:

    ^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$

then it's a real, else it's a name.

  If we translate that regexp to plain English, it means:

- There's nothing before this pattern (which is what the ^ at the beginning
  means), and nothing after it (which is what the $ at the end means).
- The sequence optionally starts with a + or a -.
- After that two possible patterns must appear (the expression in
  parentheses, where the two patterns are separated with the | symbol):
  - A sequence of one or more digits ([0-9]+), optionally followed by
    the dot character (a plain "." has a special meaning in regexps, so the
    dot character has to be escaped, and thus written as "\.")
  - A sequence of zero or more digits ([0-9]*) followed by a dot character
    followed by a sequence of one or more digits.
- Optionally the character "e" can follow, and if that's the case, a real
  (not containing an "e") must follow as well (the whole last part in
  parentheses, with the "?" at the end to indicate optionality).

  In an actual BNF-style parser the rule probably becomes simpler because
the repetition can be removed.
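
  As a quick sanity check, the pattern can be run against the table from the original post. This is only a sketch in Python; the names REAL, kind and EXPECTED are mine, and the expected column is copied from that table.

```python
import re

# The pattern exactly as written above.
REAL = re.compile(
    r"^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$"
)


def kind(token):
    """Classify one already-delimited token as "real" or "name"."""
    return "real" if REAL.match(token) else "name"


# Expected classifications, taken from the table earlier in the thread.
EXPECTED = {
    "0.": "real",
    ".0": "real",
    ".": "name",
    "1.1": "real",
    "1.1.1": "name",
    "1e1": "real",
    "1x1": "name",
    "s1": "name",
    "1s": "name",
}

if __name__ == "__main__":
    for token, expected in EXPECTED.items():
        print(f"{token!r:8} -> {kind(token):4} (expected {expected})")
```

  Note that this only tests the nine strings in that table, nothing more.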

-- 
                                                          - Warp



From: Invisible
Subject: Re: Trixy
Date: 18 Dec 2008 11:36:58
Message: <494a7c2a@news.povray.org>
Warp wrote:
>> Good luck writing a parser that can untangle all of that... :-(
> 
>   I really can't see the problem. When the input contains a sequence of
> valid characters (i.e. which can form a real or a name), if this sequence
> has the form:
> 
>     ^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$
> 
> then it's a real, else it's a name.

Yeah. As I said, I'm currently changing my design from one that attempts 
to recognise and delimit numbers to one that just chops the text into 
chunks, and *then* decides what kind of thing each chunk is.

>   If we translate that regexp to plain English, it means:
> 
> - There's nothing before this pattern (which is what the ^ at the beginning
>   means), and nothing after it (which is what the $ at the end means).
> - The sequence optionally starts with a + or a -.
> - After that two possible patterns must appear (the expression in
>   parentheses, where the two patterns are separated with the | symbol):
>   - A sequence of one or more digits ([0-9]+), optionally followed by
>     the dot character (a plain "." has a special meaning in regexps, so the
>     dot character has to be escaped, and thus written as "\.")
>   - A sequence of zero or more digits ([0-9]*) followed by a dot character
>     followed by a sequence of one or more digits.
> - Optionally the character "e" can follow, and if that's the case, a real
>   (not containing an "e") must follow as well (the whole last part in
>   parentheses, with the "?" at the end to indicate optionality).

...which would be incorrect then, for at least the following reasons:

- There can be *zero* or more digits before the decimal point. (But 
notice that there must be more than zero digits *in total* before 
and after the decimal point. It's just that there can be zero in either 
place, but not both.)

- The "e" can also be "E".

- The exponent is an integer, not a real.

See? Not as easy as it looks, is it? Gotta pay careful attention to 
*exactly* what the manual says is and isn't permissible.

>   In an actual BNF-style parser the rule probably becomes simpler because
> the repetition can be removed.

It would be *really nice* if the reference manual included a BNF syntax 
diagram... :-S

Between the rules for splitting up tokens, the tricky rules for escaping 
things in strings, and characters that have multiple meanings depending 
on context, it's really quite hard!

E.g., "<" is the start of a string, "<~" is the start of another string, 
and "<<" is an ordinary name object. Go figure. Similarly, "[" is 
classified as a "delimiter character", yet it's also the *name* of an 
operator (and a name is a sequence of "regular characters" - that is, 
can't contain delimiters).
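
For the "<" case at least, one character of lookahead seems to be enough to tell the three apart. A tiny sketch in Python; angle_bracket_kind is a made-up name, and the labels "hex string" and "ascii85 string" are how I understand the manual to describe the two string forms.

```python
def angle_bracket_kind(text, i):
    """Say what the "<" at text[i] starts, by peeking one character ahead."""
    if text[i] != "<":
        raise ValueError("expected '<' at position %d" % i)
    nxt = text[i + 1:i + 2]  # empty string if "<" is the last character
    if nxt == "<":
        return "name"            # "<<" is an ordinary name object
    if nxt == "~":
        return "ascii85 string"  # "<~" starts the other kind of string
    return "hex string"          # plain "<" starts a string


if __name__ == "__main__":
    for s in ["<123>", "<~abc~>", "<<"]:
        print(repr(s), "->", angle_bracket_kind(s, 0))
```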

It all gets confusing very fast...



From: Eero Ahonen
Subject: Re: Really strange design choices
Date: 18 Dec 2008 11:55:24
Message: <494a807c@news.povray.org>
Orchid XP v8 wrote:
> 
> Why...why...WHY...why would they do this? o_O
> 

Why not? After all, it's possible.

-Aero



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.