POV-Ray: Newsgroups: povray.binaries.images: Reading things differently.

From: clipka
Subject: Re: Reading things differently.
Date: 19 May 2018 00:31:29
Message: <5affa8a1$1@news.povray.org>

On 18.05.2018 at 22:16, Bald Eagle wrote:

> Well, something I would like added to SDL is at least one other way to tag
> comments, which would be displayed in a different hue.
> Because it would certainly be useful to be able to highlight some very important
> text in a long /*   ...   */ comment block

That would be a feature request for the editor component, which I refuse
to touch.

Use an external editor if this is of any importance to you.


> Along those lines, perhaps if there were an "expert mode" comment tag, then the
> scanner could just ignore the rest of everything on that line,

Such a thing already exists: It starts with `//`.

Really. "Just ignore the rest of everything on that line" is effectively
what the scanner does when encountering `//`-style single-line comments
- although technically it's "just scan and discard the next characters
until you encounter an end-of-line", but that's virtually as fast as it
gets.

The only thing faster would be a comment having just a single character
for its end tag (e.g. `/{ This is a new style of comment }`), because
the test for end of comment would literally be a test for one particular
character, while the end-of-line test must necessarily be a bit more
complicated due to the different line ending styles (LF, CR, CR+LF,
LF+CR). But that difference in performance would be minor, and probably
be offset by an increased complexity in testing for start-of-comment
sequences.
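
As a rough illustration of the "scan and discard until end-of-line" step, here is a minimal sketch in Python. It is not POV-Ray's actual scanner code; the function name `skip_line_comment` and the choice to treat an input as a plain string are assumptions made for the example. It shows why the end-of-line test is a little more involved than a single-character test: four line ending styles have to be recognized.

```python
def skip_line_comment(text: str, pos: int) -> int:
    """Return the index just past a `//` comment and its line ending.

    `pos` must point at the first `/` of the comment. (Hypothetical
    helper for illustration only, not taken from POV-Ray.)
    """
    assert text.startswith("//", pos)
    i = pos + 2
    n = len(text)
    # Scan and discard characters until an end-of-line character shows up.
    while i < n and text[i] not in "\r\n":
        i += 1
    if i < n:
        first = text[i]
        i += 1
        # CR+LF and LF+CR are two-character line endings: swallow the
        # second character only if it is the *other* one of the pair,
        # so that two LFs in a row still count as two separate lines.
        if i < n and text[i] in "\r\n" and text[i] != first:
            i += 1
    return i
```

Every character of the comment is still touched once by the `while` loop, which is the cost the post describes; only the test applied to each character differs between comment styles.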

> or perhaps
> something like #comment(100) would treat the following hundred lines as
> comments....

Yikes. That would introduce a host of problems:

- Have fun debugging scenes making use of such a construct! (Forget
syntax highlighting; there are probably very few, if any, editors out
there that could support such a thing.)

- The scanner currently reports `#` and the subsequent directive as
separate tokens (note for instance that `#end` and `# end` are
equivalent); so the suggested syntax would require it to distinguish
between regular `#` and `#comment`, a feat which potentially involves
scanning ahead more than a single character (e.g. to distinguish it from
`#case`, or maybe even a future `#command`). A single-character
scan-ahead is fine, but anything beyond that would significantly
complicate the scanning algorithm.

OR the `#comment` construct would have to be handled by a later stage of
the parser, but that would be choosing cancer over polio:

- Processing the `#comment` statement would be far more "heavyweight"
than the other comment styles.

- With the scanner unaware of the peculiarities of `#` followed by
`comment`, the content of such a statement would have to be well-formed
in terms of the scanning rules; for instance, double quotes would have
to be balanced, as would `/*` and `*/`.

- The scanner would have to report each end-of-line as a token (as
opposed to just treating it as whitespace and thus skipping it), so that
the downstream code could count the lines.

While all of these issues /could/ be worked around, the resulting code
would do nothing to improve parsing time: You'd still find the existing
comment styles to be faster, and the added complexity would further slow
down parsing in general.


In the current parser architecture, comments don't increase parsing
time by virtue of being difficult to parse. To the contrary, on a
per-character basis they are probably the most lightweight constructs in
terms of parsing time, with the possible exception of plain whitespace.
They cost time simply by virtue of being extra characters that need to
be first loaded and then examined: It doesn't matter exactly what byte
sequence marks a comment's end. The scanner needs to take at least a
quick peek at each and every character of the comment to determine
whether we're there yet.

The only way to truly speed up parsing of comments would be to specify a
comment's end not by marking it with a special character sequence, but
by specifying its length as part of the comment start sequence. This
would allow the scanner to jump straight to the end without looking at
the characters in between. However, besides being an /epic/ PITA to use,
this approach would also have additional drawbacks on top:

- The size would have to be specified in code units, not characters. Use
a non-ASCII character in the comment, and convert the file from some
8-bit legacy encoding like Windows-1252 to UTF-8, or from UTF-8 to
UTF-16, and you've just broken your scene without changing a single
character.

- The size specifier would have to be read and translated from text to a
numeric value, which is more difficult than taking a quick peek at the
characters, so for short comments (which would be the only ones
reasonably manageable with such a syntax) you would probably gain little
to nothing, and might even lose.

- The extra size specifier would in most cases probably lead to more
characters per comment, increasing the time required to load them from
disk into memory in the first place. Not sure how much that would
contribute though.
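
The first drawback can be made concrete with a small Python sketch. The `/#N:` start tag used here is entirely made up for the example (no such syntax exists in POV-Ray), as are the function names. The sketch assumes the size counts bytes, skips a comment by jumping rather than scanning, and then shows how the same text breaks once the file is re-encoded.

```python
START = b"/#"  # hypothetical start tag, not real POV-Ray syntax


def skip_sized_comment(data: bytes, pos: int) -> int:
    """Return the index just past a hypothetical `/#N:` comment of N bytes."""
    assert data.startswith(START, pos)
    colon = data.index(b":", pos)
    # The size has to be converted from text to a number first ...
    size = int(data[pos + len(START):colon])
    # ... and then the scanner can jump without looking at the body.
    return colon + 1 + size


def make_scene(encoding: str) -> bytes:
    # The size 9 was counted for the Windows-1252 version of the file,
    # where "café noir" occupies 9 bytes.
    return "/#9:café noir\nsphere".encode(encoding)


win = make_scene("cp1252")
utf = make_scene("utf-8")
print(win[skip_sized_comment(win, 0):])  # resumes right after the comment
print(utf[skip_sized_comment(utf, 0):])  # resumes one byte too early
```

In UTF-8 the `é` takes two bytes instead of one, so the stated size falls one byte short and the scanner resumes inside the comment, even though no visible character of the scene changed.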


TL;DR: Barring pathologically unusable syntax, there is no way to
improve on comment syntax for better parsing performance.

From: clipka
Subject: Re: Reading things differently.
Date: 19 May 2018 01:31:25
Message: <5affb6ad$1@news.povray.org>

On 18.05.2018 at 22:28, Kenneth wrote:

> For sheer parse-time efficiency, I'm wondering if the number of separate code
> lines in a scene also makes a difference. In other words, is this more
> efficient...
> 
> #declare S=32+64+17;
> 
> ..... than this:
> 
> #declare S=
> 32
> +
> 64
> +
> 17
> ;
> 
> Does the scanner have to 'scan' the ENTER key's entry, when starting new
> lines like this?

Absolutely. It even makes a difference whether you're using a Windows or
Unix machine.

I would guesstimate that the multi-line version may be about as
performant as

    #declare S=   32   +   64   +   17   ;

when using Windows-style line endings, or

    #declare S=  32  +  64  +  17  ;

when using Unix-style line endings.


In ASCII text files, the end of each line is marked by a special
character (or character sequence) inserted into the character stream at
that point.

On Windows, that end-of-line marker is customarily the (non-printable)
character sequence CR+LF (hex 0D 0A, Carriage Return followed by Line
Feed). On Unix it is customarily just the LF character (hex 0A).

If it weren't for error reporting, line endings would parse exactly as
fast as an equivalent number of other whitespace characters (two blanks
on Windows or one blank on Unix).

However, line endings also require some different handling to keep track
of location information for errors or warnings (although the difference
is minor): While normal characters increment the column, a line ending
increments the line number instead, and additionally resets the column.
In this context, some extra work is also required to handle the
multitude of possible line ending sequences: If it was just for CR+LF
(Windows) vs. LF (Unix), the end-of-line code could be triggered just on
LF while treating CR as a generic whitespace character; however, there's
also plain CR (classic Mac OS) that needs to be handled, and even LF+CR
(though that one's rather obscure).
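
The bookkeeping described above might look roughly like the following Python sketch. This is an illustration under assumptions, not POV-Ray's implementation: the function name `position_after` is hypothetical, and positions are reported as 1-based line and column numbers.

```python
def position_after(text: str) -> tuple:
    """Return the 1-based (line, column) reached after scanning `text`."""
    line, column = 1, 1
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in "\r\n":
            # Treat CR+LF and LF+CR as one line ending, not two: skip
            # the partner character if it is the other one of the pair.
            if i + 1 < n and text[i + 1] in "\r\n" and text[i + 1] != ch:
                i += 1
            line += 1    # a line ending increments the line number ...
            column = 1   # ... and additionally resets the column
        else:
            column += 1  # a normal character just increments the column
        i += 1
    return line, column
```

With this logic, `"ab" + eol + "c"` ends at line 2, column 2 for each of the four line ending styles, while two identical line ending characters in a row are counted as two lines.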

My guesstimate is that this extra processing amounts to about the same
workload as parsing an extra whitespace character, hence the above
guesstimates.
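
The arithmetic behind those two single-line equivalents can be spelled out in a few lines of Python. The function names and the `bookkeeping_cost` parameter are invented for this sketch; the value of one extra whitespace character per line ending is just the guesstimate from the text, not a measurement.

```python
# The multi-line example, split at its line breaks.
TOKENS = ["#declare S=", "32", "+", "64", "+", "17", ";"]


def equivalent_blanks(eol: str, bookkeeping_cost: int = 1) -> int:
    """Blanks assumed to cost about as much as one line ending `eol`."""
    # One blank per character of the line ending, plus the guesstimated
    # extra blank for the line/column bookkeeping.
    return len(eol) + bookkeeping_cost


def one_line_equivalent(tokens, eol: str) -> str:
    """Build the single-line version with roughly the same scanning cost."""
    return (" " * equivalent_blanks(eol)).join(tokens)


print(one_line_equivalent(TOKENS, "\r\n"))  # Windows-style line endings
print(one_line_equivalent(TOKENS, "\n"))    # Unix-style line endings
```

CR+LF comes out as three blanks per line break and LF as two, which reproduces the two padded `#declare` lines given earlier.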

From: Stephen
Subject: Re: Reading things differently.
Date: 19 May 2018 03:35:10
Message: <5affd3ae@news.povray.org>

On 19/05/2018 05:31, clipka wrote:
> TL;DR: Barring pathologically unusable syntax, there is no way to
> improve on comment syntax for better parsing performance.

Your explanations are never TL;DR.
For me, sometimes TD;CU. But always worth reading.
(Too Difficult, Couldn't Understand. ;-) )

-- 

Regards
     Stephen

From: Thomas de Groot
Subject: Re: Reading things differently.
Date: 19 May 2018 03:48:08
Message: <5affd6b8@news.povray.org>

On 19-5-2018 9:35, Stephen wrote:
> On 19/05/2018 05:31, clipka wrote:
>> TL;DR: Barring pathologically unusable syntax, there is no way to
>> improve on comment syntax for better parsing performance.
> 
> Your explanations are never TL;DR.
> For me, sometimes TD;CU. But always worth reading.
> (Too Difficult, Couldn't Understand. ;-) )
> 

IA
(I agree)

-- 
Thomas
