POV-Ray: Newsgroups: povray.beta-test: Restructured Parser wants Testing: Re: Restructured Parser wants Testing

POV-Ray : Newsgroups : povray.beta-test : Restructured Parser wants Testing : Re: Restructured Parser wants Testing		Server Time 2 Jul 2025 09:13:06 EDT (-0400)

From: clipka
Date: 24 May 2018 10:54:45
Message: <5b06d235$1@news.povray.org>

Am 23.05.2018 um 10:22 schrieb Stephen:
> On 22/05/2018 01:28, clipka wrote:
>> - Backward compatibility with scenes that use single backslashes in
>> literal filenames has been sacrificed, and will most likely not be
>> restored.
> 
> This is going to cause me problems using SDL generated by modellers.

I'm zeroing in on the conclusion that this is a problem that's not
easily solved in the new parser architecture, especially with respect to
the future plans.

The aim is to scan/tokenize a file only once, cache the result, and from
then on only operate on the resulting token stream. Maybe even have a
separate scanner/tokenizer thread go ahead with its job while the parser
thread is still busy with earlier portions of the scene.

This means that whatever looks identical, must be scanned/tokenized
identically. Most notably, backslashes in string literals must be
treated consistently: "foo\nbar" must always be translated as the
character sequence `foo` followed by a newline followed by the character
sequence `bar`, no matter whether the parser - based on neighboring
tokens - expects a filename or a generic string at that particular location.

-OR- the scanner/tokenizer would always have to translate such a string
literal as the character sequence `foo\nbar`. But then the job of
resolving the escape sequence would fall to the parser proper, and would
have to be done over and over again each time the parser encounters the
string literal. Since strings are comparatively heavyweight to process,
that would be bad news in terms of performance.

A similar problem exists with respect to character encoding: Without
understanding of `global_settings { charset utf8 }`, the
scanner/tokenizer cannot decide whether a non-ASCII byte sequence in a
string literal should be interpreted according to UTF-8 or according to
Windows-1252, Latin-1 or whatever.

Post a reply to this message