POV-Ray : Newsgroups : povray.binaries.images : Reading things differently.
  Reading things differently. (Message 5 to 14 of 14)  
From: William F Pokorny
Subject: Re: Reading things differently.
Date: 18 May 2018 07:11:37
Message: <5afeb4e9$1@news.povray.org>
On 05/17/2018 01:06 PM, clipka wrote:
> Am 17.05.2018 um 16:06 schrieb clipka:
>> Am 17.05.2018 um 12:49 schrieb clipka:
>>
>>> Performance is another issue, with parsing of the benchmark scene still
>>> taking about 40% longer, but I'm optimistic that I can trim that down
>>> quite a bit.
>>
>> ... and already close to normal performance again: Difference on
>> benchmark scene is now within measurement tolerances, while a 45M token
>> mesh2-heavy scene is now just 8% slower than with the current GitHub
>> master version.
> 
> Well, it gets better and better: I'm already 7% /faster/ now. And I
> haven't even started on the optimizations I originally had in mind. So
> far it's more or less a byproduct of the new dedicated scanner and
> related refactoring.
> 
> I guess linear scenes won't get much improvement on top of this (at
> least not as long as the parser remains single-threaded), but I'm pretty
> sure I can do something about non-linear scenes.
> 

Guessing the magnitude of linear improvement will depend on the SDL 
writer's characteristic verbosity(1)?

Bill P.

(1) - I'll be able to crank it to 11... :-)


From: clipka
Subject: Re: Reading things differently.
Date: 18 May 2018 08:21:58
Message: <5afec566@news.povray.org>
Am 18.05.2018 um 13:11 schrieb William F Pokorny:

>> I guess linear scenes won't get much improvement on top of this (at
>> least not as long as the parser remains single-threaded), but I'm pretty
>> sure I can do something about non-linear scenes.
>>
> 
> Guessing the magnitude of linear improvement will depend on the SDL
> writer's characteristic verbosity(1)?

Absolutely. Every character you type (including whitespace) is a
character the scanner has to scan. So minimizing your WCV will continue
to be good advice(*).

(*for getting top parsing speeds; legibility is another matter)
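The cost described here can be illustrated with a minimal sketch (illustrative C++, not POV-Ray's actual scanner code): every whitespace character costs the scanner one look, whether or not it carries any meaning.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Counts how many characters the scanner must examine to skip leading
// whitespace and reach the first character of the next meaningful token.
std::size_t charsExamined(const std::string& src, std::size_t pos)
{
    std::size_t examined = 0;
    while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos])))
    {
        ++pos;       // each whitespace character costs one look
        ++examined;
    }
    if (pos < src.size())
        ++examined;  // the first character of the next token is examined too
    return examined;
}
```

So `#declare S=x;` and `#declare   S  =  x ;` produce the same tokens, but the second costs the scanner strictly more character examinations.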


From: Kenneth
Subject: Re: Reading things differently.
Date: 18 May 2018 15:50:02
Message: <web.5aff2d616db5d976a47873e10@news.povray.org>
clipka <ano### [at] anonymousorg> wrote:
> Am 18.05.2018 um 13:11 schrieb William F Pokorny:
> >
> > Guessing the magnitude of linear improvement will depend on the SDL
> > writer's characteristic verbosity(1)?
>
> Absolutely. Every character you type (including whitespace) is a
> character the scanner has to scan. So minimizing your WCV will continue
> to be good advice(*).
>
> (*for getting top parsing speeds; legibility is another matter)

Pity the poor SDL programmer who includes copious comments in his scene (and
long variable names) just to make sense of his or her complex scene code. Uh,
like me :-/

I guess the best recourse is to wait until the scene is 'sealed in stone', then
make a copy of it for 'real' rendering, stripping out ALL unnecessary stuff
(including as much whitespace as possible), to get a lean and efficient parse.
(I'm thinking mostly of a complex animation scene.) I assume that this would be
good practice in ANY high-level language.


From: Bald Eagle
Subject: Re: Reading things differently.
Date: 18 May 2018 16:20:01
Message: <web.5aff34836db5d976c437ac910@news.povray.org>
I'm still looking this over and trying to construct some test cases worthy of
posting.  I've got everything in reservedwords.cpp to play with, so that ought
to keep me busy once I get a chance to actually work on it.


Great job, Christoph, getting the scanner up to speed (and beyond!)


Also:

"Kenneth" <kdw### [at] gmailcom> wrote:
> clipka <ano### [at] anonymousorg> wrote:

> > Absolutely. Every character you type (including whitespace) is a
> > character the scanner has to scan.

> Pity the poor SDL programmer ...

Well, something I would like added to SDL is at least one other way to tag
comments, which would be displayed in a different hue.
Because it would certainly be useful to be able to highlight some very important
text in a long /*   ...   */ comment block

Along those lines, perhaps if there were an "expert mode" comment tag, then the
scanner could just ignore the rest of everything on that line, or perhaps
something like #comment(100) would treat the following hundred lines as
comments....


From: Kenneth
Subject: Re: Reading things differently.
Date: 18 May 2018 16:30:01
Message: <web.5aff370e6db5d976a47873e10@news.povray.org>
"Kenneth" <kdw### [at] gmailcom> wrote:

> >
> > Absolutely. Every character you type (including whitespace) is a
> > character the scanner has to scan.

For sheer parse-time efficiency, I'm wondering if the number of separate code
lines in a scene also makes a difference. In other words, is this more
efficient...

#declare S=32+64+17;

..... than this:

#declare S=
32
+
64
+
17
;

Does the scanner have to 'scan' the ENTER key's entry, when starting new
lines like this?


From: clipka
Subject: Re: Reading things differently.
Date: 18 May 2018 22:56:11
Message: <5aff924b@news.povray.org>
Am 18.05.2018 um 21:45 schrieb Kenneth:
> clipka <ano### [at] anonymousorg> wrote:
>> Am 18.05.2018 um 13:11 schrieb William F Pokorny:
>>>
>>> Guessing the magnitude of linear improvement will depend on the SDL
>>> writer's characteristic verbosity(1)?
>>
>> Absolutely. Every character you type (including whitespace) is a
>> character the scanner has to scan. So minimizing your WCV will continue
>> to be good advice(*).
>>
>> (*for getting top parsing speeds; legibility is another matter)
> 
> Pity the poor SDL programmer who includes copious comments in his scene (and
> long variable names) just to make sense of his or her complex scene code. Uh,
> like me :-/

Don't worry - hand-made (and thus documented) /linear/ scenes are rarely
ever long enough to have any noticeable parsing time; and when I'm done
with the parser improvement, in /non-linear/ scenes the WCV will make
virtually no difference anymore, as any comments and whitespace will
then be processed (and discarded) exactly once, even in loops and macros.
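The "processed exactly once" idea can be sketched as a one-pass tokenizer feeding a cached token stream, which loops and macros would then replay without ever touching the raw text again (a hypothetical illustration only; this is not POV-Ray's actual implementation, and the tokenizing rules here are deliberately simplified).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical one-pass tokenizer: comments and whitespace are scanned and
// discarded here, exactly once, no matter how often the resulting token
// vector is replayed later (e.g. by a loop body or macro expansion).
std::vector<std::string> tokenize(const std::string& src)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size())
    {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '/')
        {   // single-line comment: scanned once, then gone for good
            while (i < src.size() && src[i] != '\n') ++i;
            continue;
        }
        std::string tok;  // everything else: grab a whitespace-delimited token
        while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i])))
            tok += src[i++];
        tokens.push_back(tok);
    }
    return tokens;
}
```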


> I guess the best recourse is to wait until the scene is 'sealed in stone', then
> make a copy of it for 'real' rendering, stripping out ALL unnecessary stuff
> (including as much whitespace as possible), to get a lean and efficient parse.
> (I'm thinking mostly of a complex animation scene.) I assume that this would be
> good practice in ANY high-level language.

No; with good contemporary high-level languages, WCV makes little
difference, so comments should always stay in there (unless your primary
goal is not to improve parsing speed, but to deliberately obfuscate the
code).

Pretty much the only thing that's slowed down by high WCV is the scanner
stage; and unless a language is implemented as daftly as POV-Ray's, that
stage makes up only a very tiny fraction of the time required to
process a program's source code.


From: clipka
Subject: Re: Reading things differently.
Date: 19 May 2018 00:31:29
Message: <5affa8a1$1@news.povray.org>
Am 18.05.2018 um 22:16 schrieb Bald Eagle:

> Well, something I would like added to SDL is at least one other way to tag
> comments, which would be displayed in a different hue.
> Because it would certainly be useful to be able to highlight some very important
> text in a long /*   ...   */ comment block

That would be a feature request for the editor component, which I refuse
to touch.

Use an external editor if this is of any importance to you.


> Along those lines, perhaps if there were an "expert mode" comment tag, then the
> scanner could just ignore the rest of everything on that line,

Such a thing already exists: It starts with `//`.

Really. "Just ignore the rest of everything on that line" is effectively
what the scanner does when encountering `//`-style single-line comments
- although technically it's "just scan and discard the next characters
until you encounter an end-of-line", but that's virtually as fast as it
gets.

The only thing faster would be a comment having just a single character
for its end tag (e.g. `/{ This is a new style of comment }`), because
the test for end of comment would literally be a test for one particular
character, while the end-of-line test must necessarily be a bit more
complicated due to the different line ending styles (LF, CR, CR+LF,
LF+CR). But that difference in performance would be minor, and probably
be offset by an increased complexity in testing for start-of-comment
sequences.
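The skip-to-end-of-line behaviour described above, including the handling of the four line-ending styles, can be sketched like this (illustrative C++, not the actual scanner code):

```cpp
#include <cassert>
#include <string>

// "Just scan and discard until end-of-line": returns the index of the first
// character after a //-style comment, handling LF, CR, CR+LF and LF+CR.
std::size_t skipLineComment(const std::string& src, std::size_t pos)
{
    // advance past the comment body, one cheap test per character
    while (pos < src.size() && src[pos] != '\n' && src[pos] != '\r')
        ++pos;
    if (pos < src.size())
    {
        char first = src[pos++];
        // consume the partner character of a two-byte line ending, if any
        if (pos < src.size() &&
            ((first == '\r' && src[pos] == '\n') ||
             (first == '\n' && src[pos] == '\r')))
            ++pos;
    }
    return pos;
}
```

The two-character cases are exactly the "a bit more complicated" part compared with a hypothetical single-character end tag, which would need only one comparison per character.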

> or perhaps
> something like #comment(100) would treat the following hundred lines as
> comments....

Yikes. That would introduce a host of problems:

- Have fun debugging scenes making use of such a construct! (Forget
syntax highlighting; there are probably very few, if any, editors out
there that could support such a thing.)

- The scanner currently reports `#` and the subsequent directive as
separate tokens (note for instance that `#end` and `# end` are
equivalent); so the suggested syntax would require it to distinguish
between regular `#` and `#comment`, a feat which potentially involves
scanning ahead more than a single character (e.g. to distinguish it from
`#case`, or maybe even a future `#command`). A single-character
scan-ahead is fine, but anything beyond that would significantly
complicate the scanning algorithm.

OR the `#comment` construct would have to be handled by a later stage of
the parser, but that would be choosing cancer over polio:

- Processing the `#comment` statement would be far more "heavyweight"
than the other comment styles.

- With the scanner unaware of the peculiarities of `#` followed by
`comment`, the content of such a statement would have to be well-formed
in terms of the scanning rules; for instance, double quotes would have
to be balanced, as would `/*` and `*/`.

- The scanner would have to report each end-of-line as a token (as
opposed to just treating it as whitespace and thus skipping it), so that
the downstream code could count the lines.

While all of these issues /could/ be worked around, the resulting code
would do nothing to improve parsing time: You'd still find the existing
comment styles to be faster, and the added complexity would further slow
down parsing in general.


In the current parser architecture, comments don't increase parsing
speed by virtue of being difficult to parse -- to the contrary, on a
per-character basis they are probably the most lightweight constructs in
terms of parsing time, with the possible exception of plain whitespace
-- but simply by virtue of being extra characters that need to be first
loaded and then examined: It doesn't matter exactly what byte sequence
marks a comment's end: The scanner needs to take at least a quick peek
at each and every character of the comment to determine whether we're
there yet.

The only way to truly speed up parsing of comments would be to specify a
comment's end not by marking it with a special character sequence, but
by specifying its length as part of the comment start sequence. This
would allow the scanner to jump straight to the end without looking at
the characters in between. However, besides being an /epic/ PITA to use,
this approach would also have additional drawbacks on top:

- The size would have to be specified in code units, not characters. Use
a non-ASCII character in the comment, and convert the file from some
8-bit legacy encoding like Windows-1252 to UTF-8, or from UTF-8 to
UTF-16, and you've just broken your scene without changing a single
character.

- The size specifier would have to be read and translated from text to a
numeric value, which is more difficult than taking a quick peek at the
characters, so for short comments (which would be the only ones
reasonably manageable with such a syntax) you would probably gain little
to nothing, and might even lose.

- The extra size specifier would in most cases probably lead to more
characters per comment, increasing the time required to load them from
disk into memory in the first place. Not sure how much that would
contribute though.
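For illustration, the length-prefixed idea would reduce comment skipping to a single jump (a hypothetical sketch; no such syntax exists in POV-Ray, and the drawbacks above are exactly why):

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical length-prefixed comment: the start sequence carries a size,
// so the scanner jumps over the body without looking at a single character.
// The catch: the size counts code units, so merely re-encoding the file
// (e.g. Windows-1252 to UTF-8) silently breaks the count.
std::size_t skipSizedComment(const std::string& src, std::size_t pos,
                             std::size_t bodyLength)
{
    return std::min(src.size(), pos + bodyLength); // O(1), no per-char peek
}
```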


TL;DR: Barring pathologically unusable syntax, there is no way to
improve on comment syntax for better parsing performance.


From: clipka
Subject: Re: Reading things differently.
Date: 19 May 2018 01:31:25
Message: <5affb6ad$1@news.povray.org>
Am 18.05.2018 um 22:28 schrieb Kenneth:

> For sheer parse-time efficiency, I'm wondering if the number of separate code
> lines in a scene also makes a difference. In other words, is this more
> efficient...
> 
> #declare S=32+64+17;
> 
> ..... than this:
> 
> #declare S=
> 32
> +
> 64
> +
> 17
> ;
> 
> Does the scanner have to 'scan' the ENTER key's entry, when starting new
> lines like this?

Absolutely. It even makes a difference whether you're using a Windows or
Unix machine.

I would guesstimate that the multi-line version may be about as
performant as

    #declare S=   32   +   64   +   17   ;

when using Windows-style line endings, or

    #declare S=  32  +  64  +  17  ;

when using Unix-style line endings.


In ASCII text files, the end of each line is marked by a special
(non-printable) character or character sequence inserted into the
character stream.

On Windows, that end-of-line marker is customarily the (non-printable)
character sequence CR+LF (hex 0D 0A, Carriage Return followed by Line
Feed). On Unix it is customarily just the LF character (hex 0A).

If it weren't for error reporting, line endings would parse exactly as
fast as an equivalent number of other whitespace characters (two blanks
on Windows or one blank on Unix).

However, line endings also require some different handling to keep track
of location information for errors or warnings (although the difference
is minor): While normal characters increment the column, a line ending
increments the line number instead, and additionally resets the column.
In this context, some extra work is also required to handle the
multitude of possible line ending sequences: If it was just for CR+LF
(Windows) vs. LF (Unix), the end-of-line code could be triggered just on
LF while treating CR as a generic whitespace character; however, there's
also plain CR (classic Mac OS) that needs to be handled, and even LF+CR
(though that one's rather obscure).

My guesstimate is that this extra processing amounts to about the same
workload as parsing an extra whitespace character, hence the above
guesstimates.
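The line/column bookkeeping described above can be sketched as follows (illustrative C++; counting is one-based, and the actual scanner's details will differ):

```cpp
#include <cassert>
#include <string>
#include <utility>

// Extra work line endings impose for error reporting: a normal character
// advances the column, while any of LF, CR, CR+LF or LF+CR advances the
// line number and resets the column. Returns the final (line, column).
std::pair<int, int> locateEnd(const std::string& src)
{
    int line = 1, column = 1;
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        char c = src[i];
        if (c == '\n' || c == '\r')
        {
            // swallow the partner of a two-character line ending, but treat
            // "\n\n" or "\r\r" as two separate line breaks
            if (i + 1 < src.size() &&
                (src[i + 1] == '\n' || src[i + 1] == '\r') && src[i + 1] != c)
                ++i;
            ++line;
            column = 1;
        }
        else
            ++column;
    }
    return {line, column};
}
```

Note the same-character check: without it, a blank line ("\n\n") would be miscounted as a single line break.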


From: Stephen
Subject: Re: Reading things differently.
Date: 19 May 2018 03:35:10
Message: <5affd3ae@news.povray.org>
On 19/05/2018 05:31, clipka wrote:
> TL;DR: Barring pathologically unusable syntax, there is no way to
> improve on comment syntax for better parsing performance.

Your explanations are never TL;DR.
For me, sometimes TD;CU. But always worth reading.
(Too Difficult, Couldn't Understand. ;-) )

-- 

Regards
     Stephen


From: Thomas de Groot
Subject: Re: Reading things differently.
Date: 19 May 2018 03:48:08
Message: <5affd6b8@news.povray.org>
On 19-5-2018 9:35, Stephen wrote:
> On 19/05/2018 05:31, clipka wrote:
>> TL;DR: Barring pathologically unusable syntax, there is no way to
>> improve on comment syntax for better parsing performance.
> 
> Your explanations are never TL;DR.
> For me, sometimes TD;CU. But always worth reading.
> (Too Difficult, Couldn't Understand. ;-) )
> 

IA
(I agree)

-- 
Thomas


Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.