POV-Ray: Newsgroups: povray.pov4.discussion.general: Random POV-Ray 4 SDL proposal, #1: Re: Random POV-Ray 4 SDL proposal, #1

From: clipka
Date: 10 Jun 2021 10:25:13
Message: <60c220c9@news.povray.org>

On 10.06.2021 at 13:18, Mr wrote:
> clipka <ano### [at] anonymousorg> wrote:
> 
>> Explicit ASCII alternatives, hard-baked into the language, would be a
>> must, IMO. As I mentioned, Unicode symbols would be syntactic sugar. The
>> ASCII constructs would be the real deal, while the Unicode symbols would
>> be considered shortcuts.
> 
> Okay, and do you confirm that such kind of things would have significant impact
> on parse time, like: linearly, if you divide the character amounts by two you
> get half parsing code?

No, parser performance is not that simple.

A good parser (which POV-Ray's old one is not by any stretch, and even
the overhauled one is only a step on the way there) will just _scan_ the
whole file once (i.e. identify the start and end of each character sequence
that looks like a token at first glance - e.g. sequences that look like
numbers, sequences that look like keywords or identifiers, sequences that
look like operators, etc.), _tokenize_ it once (i.e. translate those
character sequences into internal numeric IDs, aka tokens), and from
there on just juggle those IDs.
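
To make the scan/tokenize split concrete, here is a minimal sketch in Python. It is not how POV-Ray's parser is written; the token IDs, the `KEYWORD_IDS` table, and the `union` / `∪` pair are all made up for illustration. The point is only that a long ASCII keyword and a short Unicode alias end up as the same numeric ID after tokenizing.

```python
import re

# Hypothetical token IDs, made up for this sketch.
NUMBER, IDENTIFIER, SYMBOL, UNION = 0, 1, 2, 100

# The long ASCII keyword and its short Unicode alias share one ID.
KEYWORD_IDS = {"union": UNION, "∪": UNION}

SCAN_RE = re.compile(
    r"(?P<number>\d+(?:\.\d+)?)|(?P<word>[A-Za-z_]\w*)|(?P<symbol>\S)"
)

def scan(text):
    """One pass over the text: yield (kind, start, end) for each token-like span."""
    for m in SCAN_RE.finditer(text):
        yield m.lastgroup, m.start(), m.end()

def tokenize(text):
    """Translate each scanned span into a (token_id, value) pair."""
    tokens = []
    for kind, start, end in scan(text):
        lexeme = text[start:end]
        if kind == "number":
            tokens.append((NUMBER, float(lexeme)))
        elif lexeme in KEYWORD_IDS:
            tokens.append((KEYWORD_IDS[lexeme], None))
        elif kind == "word":
            tokens.append((IDENTIFIER, lexeme))
        else:
            tokens.append((SYMBOL, lexeme))
    return tokens

# Both spellings produce the identical token list.
print(tokenize("union 1 2") == tokenize("∪ 1 2"))
```

Everything downstream of `tokenize` sees only the IDs, so it cannot tell which spelling was used in the source.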

The next steps would be to either...

- walk through those tokens and "execute" them, implementing loops by 
processing the corresponding tokens over and over again; in this case 
processing the loops again and again would be the bottleneck.

- digest that token sequence even further, "compiling" it into something 
that can be executed so efficiently that it might have a chance to 
become negligible compared to the time spent scanning and tokenizing; but
to achieve that, the effort to bring it into this efficient form will 
itself outweigh the effort of scanning and tokenizing.
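
The two options above can be sketched roughly as follows, again in Python and again with invented names: the `REPEAT` / `ADD` / `END` token IDs and the toy "repeat N times, add a constant" program are assumptions for illustration, not anything from POV-Ray's SDL. `walk` re-reads and re-dispatches the loop body tokens on every pass, while `compile_tokens` resolves them once up front and returns something that only does the actual work.

```python
# Hypothetical token IDs for a toy language; not POV-Ray's.
REPEAT, ADD, END = 1, 2, 3

def walk(tokens):
    """Option 1: execute tokens directly, re-processing the body each pass."""
    head, count = tokens[0]
    assert head == REPEAT
    acc = 0
    for _ in range(count):
        pos = 1
        while tokens[pos][0] != END:
            tok, arg = tokens[pos]
            if tok == ADD:          # dispatch happens again on every pass
                acc += arg
            else:
                raise ValueError(f"unexpected token {tok}")
            pos += 1
    return acc

def compile_tokens(tokens):
    """Option 2: digest the tokens once into a closure that can be run later."""
    (head, count), *body, (tail, _) = tokens
    assert head == REPEAT and tail == END
    increments = []
    for tok, arg in body:
        if tok != ADD:
            raise ValueError(f"unexpected token {tok}")
        increments.append(arg)      # dispatch is resolved here, once
    def run():
        acc = 0
        for _ in range(count):
            for inc in increments:
                acc += inc
        return acc
    return run

program = [(REPEAT, 1000), (ADD, 3), (ADD, 4), (END, None)]
print(walk(program) == compile_tokens(program)())
```

Neither function ever looks at a character of source text; both operate purely on the numeric IDs.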

In either case, the genuinely time-consuming portions of parsing will
work on a representation in which the number of characters comprising
the keywords or operators will have become entirely irrelevant.
