POV-Ray: Newsgroups: povray.text.scene-files: Filed() macro for CSV data file handling: Re: Filed() macro for CSV data file handling


From: William F Pokorny
Date: 31 Oct 2021 10:47:09
Message: <617eac6d$1@news.povray.org>

On 10/30/21 7:10 PM, William F Pokorny wrote:
> A big chunk of the time is in the scanner.cpp code qualify bits and 
> pieces of potential tokens. As to how to make that all 'significantly' 
> faster within the current framework - no luck thus far. We'll see.

OK. I chased one pocket of scanner expense by changing from a set of 
conditional tests for character classification to a boolean array[256] 
with predetermined answers. I wanted to see how much movement / 
improvement I could get. It was a gain of about 3% on the write side and 
2% on the read.
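
To make the table idea concrete, here is a minimal sketch and not the 
actual scanner.cpp code. The "identifier character" classification and 
all names in it are made up for illustration; the table is filled once 
from the same conditional tests it replaces.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical classification (not taken from scanner.cpp): characters
// that may appear in an identifier. This is the conditional-test form,
// evaluated in full on every call.
inline bool isIdentifierCharByTests(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           (c == '_');
}

// Fill the 256-entry table of predetermined answers once, reusing the
// conditional tests so both forms agree by construction.
inline std::array<bool, 256> makeIdentifierTable()
{
    std::array<bool, 256> table{};
    for (std::size_t i = 0; i < table.size(); ++i)
        table[i] = isIdentifierCharByTests(static_cast<unsigned char>(i));
    return table;
}

const std::array<bool, 256> kIdentifierTable = makeIdentifierTable();

// Table form: a single indexed load replaces the chain of comparisons.
inline bool isIdentifierChar(unsigned char c)
{
    return kIdentifierTable[c];
}
```

Taking the argument as unsigned char keeps the index inside 0..255 even 
where plain char is signed.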

I think I see my way clear to folding up to eight character 
classifications in the space the boolean array is actually taking. This 
would allow us to change some (all?) of the 'if else if... else' 
conditional chains to a more direct switch construct. A WILD guess is 
we'd gain at most 10-15% total (the 2-3% above being part of that savings).
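
A rough sketch of how that folding might look, assuming one byte per 
character with one bit per classification. The four classes, the names 
and the TokenStart enum are all hypothetical and not what scanner.cpp 
uses; four bits are left free in this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical classification bits; up to eight fit in one byte.
constexpr std::uint8_t kWhitespace = 1u << 0;
constexpr std::uint8_t kDigit      = 1u << 1;
constexpr std::uint8_t kIdentStart = 1u << 2;
constexpr std::uint8_t kIdentBody  = 1u << 3;

// One byte per character, the same space a bool[256] occupies, but each
// entry now carries several classification answers at once.
inline std::array<std::uint8_t, 256> makeClassTable()
{
    std::array<std::uint8_t, 256> table{};
    for (int c = 0; c < 256; ++c)
    {
        std::uint8_t flags = 0;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            flags |= kWhitespace;
        if (c >= '0' && c <= '9')
            flags |= kDigit | kIdentBody;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
            flags |= kIdentStart | kIdentBody;
        table[static_cast<std::size_t>(c)] = flags;
    }
    return table;
}

const std::array<std::uint8_t, 256> kClassTable = makeClassTable();

// Single-class query: one load and one mask.
inline bool hasClass(unsigned char c, std::uint8_t mask)
{
    return (kClassTable[c] & mask) != 0;
}

enum class TokenStart { Whitespace, Number, Identifier, Other };

// Switch form: mask down to the bits that decide how a token starts,
// then dispatch directly instead of walking an if / else-if chain.
inline TokenStart classifyTokenStart(unsigned char c)
{
    switch (kClassTable[c] & (kWhitespace | kDigit | kIdentStart))
    {
        case kWhitespace: return TokenStart::Whitespace;
        case kDigit:      return TokenStart::Number;
        case kIdentStart: return TokenStart::Identifier;
        default:          return TokenStart::Other;
    }
}
```

The switch only works this directly because the three masked bits are 
mutually exclusive in this example; overlapping classes would need 
either more case labels or a separate class-id field.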

If the guess is about right, it would not be a game changer - but the 
relative ease of implementation is there. I haven't decided whether to 
attempt the change in my povr branch.

Random thinking.
---------------

- In the end we are working character by character in the parser and 
this is expensive. Further, we are repeatedly classifying characters and 
tokens.

- The non-pure-ASCII (<=255) character encoding has a cost. An advantage 
utf8 encoding for the SDL has over others is that it's smaller and so 
faster.

- Relatedly, Christoph is using a utf8 class internal to POV-Ray, based 
upon std::string. This is common practice with C++ programs and it's 
safe. It's the case, though, that as we walk that utf8 class by 
character, incrementing pointers, the length checking is costing a few 
percent of the overall run time. Well... at least the profiling 
indicates this length checking cost.

- My updates above are related to the newest version of the parser and 
not the older one in v3.8 beta*.

- The newer parser changes in v4.0 and povr are substantial and they 
were only the initial push for what Christoph ultimately had in mind. 
Testing of those changes showed improved performance. I'm not, however, 
sure what all else was intended. Heck, to a degree the new parser is 
still new-ish to me. Until now, I've only dug around in the parser code 
to try and fix particular bugs.

- On staring at the profile results now over some days, I have a few 
wild ideas banging around in my head for the parser. Unfortunately, 
almost all are not that easy to just code up and try.

- A reminder the v4.0 / povr parsing is only faster than v3.8 beta* for 
the test cases in this thread - if a collection of parser asserts is 
turned off (see configparser.h). Christoph was actively working on the 
parser in late 2018 and early 2019. Those asserts would have eventually 
been turned off for normal release compiles.

Bill P.
