POV-Ray: Newsgroups: povray.beta-test: New version of new tokenizer: Re: New version of new tokenizer

From: clipka
Date: 1 Jun 2018 13:29:15
Message: <5b11826b$1@news.povray.org>

On 01.06.2018 at 10:35, clipka wrote:

> - signature BOM in utf8-encoded files currently not supported

And now that one has also been addressed:

https://github.com/POV-Ray/povray/releases/tag/v3.8.0-x.tokenizer.9686180

And once again this re-implementation comes with improvements over v3.7:

- The v3.7 implementation simply swallowed any contiguous sequence of
non-ASCII bytes at the start of a scene file, and just /presumed/ them
to be a UTF-8 signature BOM. The v3.8.0-x.tokenizer implementation
actually checks whether the non-ASCII byte sequence matches the UTF-8
signature BOM.

- The v3.7 implementation only covered the main scene file. The new
implementation also applies to include files.
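
To illustrate the difference between the two approaches, here is a minimal
sketch in Python (not the actual POV-Ray C++ code). The function names are
made up for illustration, and leaving non-matching bytes untouched in the
second function is an assumption about what "checks whether it matches"
implies, not a description of the real tokenizer:

```python
# UTF-8 encoding of U+FEFF, i.e. the UTF-8 signature BOM.
UTF8_BOM = b"\xef\xbb\xbf"


def skip_leading_non_ascii(data: bytes) -> bytes:
    """Sketch of the v3.7-style behaviour described above (hypothetical name).

    Drops every leading byte >= 0x80, whether or not those bytes
    actually form a BOM.
    """
    i = 0
    while i < len(data) and data[i] >= 0x80:
        i += 1
    return data[i:]


def skip_utf8_bom(data: bytes) -> bytes:
    """Sketch of a strict check (hypothetical name).

    Drops the leading bytes only if they are exactly the UTF-8 signature
    BOM. Assumption: anything else is left in place for later stages
    to handle.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


with_bom = UTF8_BOM + b"sphere { 0, 1 }"
latin1_start = b"\xe9 = 1;"  # a Latin-1 encoded e-acute, not a BOM

# Both variants strip a genuine BOM.
print(skip_leading_non_ascii(with_bom))
print(skip_utf8_bom(with_bom))

# Only the permissive variant silently eats the Latin-1 byte.
print(skip_leading_non_ascii(latin1_start))
print(skip_utf8_bom(latin1_start))
```

The same per-file check would be run once at the start of the main scene
file and once at the start of each include file.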


This leaves only two known issues:

- Performance of loops invoking macros: I will not address this for now,
as it does not seem to impede functionality, and the root cause will
most likely be eliminated anyway when I implement token-level loop caching.

- Non-ASCII characters in string literals: This I will also set aside
for now, until I get a clearer picture of whether the current
scene-global `charset` mechanism is even used to any extent worth
supporting, as I think it may be easier and cleaner to throw it
overboard (or at least ditch the `sys` setting) in favour of a per-file
mechanism.
