  Re: Parser Scanner Test Case Challenge  
From: clipka
Date: 14 May 2018 14:00:56
Message: <5af9ced8$1@news.povray.org>
Am 14.05.2018 um 18:51 schrieb Bald Eagle:
> clipka <ano### [at] anonymousorg> wrote:
>> The scanner will
>> also be responsible for earmarking each "word" with the line and column
>> where it was found.
> 
> Well, this all sounds like a herculean task.

I think it's manageable -- as long as I don't go into over-engineering
mode (like attempting to optimize the implementation for performance
right from the start while at the same time trying to add support for
generic character encodings; I had to convince myself that we can
probably do without EBCDIC for now ;)).

It's not like I'll be re-writing the entire parser; I'll just trim away
some very basic functionality of the tokenizer, and adapt the remainder
of the tokenizer to use the new dedicated scanner class. I think I can
get a clean cut through it all.

With that cut made, it will presumably be a lot easier to address the
true purpose of the whole operation: Cutting the tokenizer right down
the middle where it hurts, to separate context-insensitive tokenization
(such as identifying `sphere` as SPHERE_TOKEN) from context-sensitive
tokenization (such as identifying `Foo` as VECTOR_ID_TOKEN representing
the value <1,2,3> because the scene happened to contain a `#declare
Foo=<1,2,3>;` earlier).
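To make the intended split concrete, here is a minimal sketch (in Python, purely illustrative, not POV-Ray's actual code or token names): one pass classifies words by spelling alone, and a separate pass refines identifiers using the declarations seen so far.

```python
# Context-insensitive pass: classify a word purely by its spelling.
KEYWORDS = {"sphere": "SPHERE_TOKEN", "box": "BOX_TOKEN"}

def tokenize_context_insensitive(word):
    """Map a raw word to a token kind without any scene context."""
    return KEYWORDS.get(word, "IDENTIFIER")

# Context-sensitive pass: resolve identifiers against the symbol table.
def resolve_context_sensitive(word, kind, symbols):
    """Refine IDENTIFIER tokens using #declare-d symbols seen so far."""
    if kind == "IDENTIFIER" and word in symbols:
        sym_type, value = symbols[word]
        return (f"{sym_type}_ID_TOKEN", value)
    return (kind, None)

# Symbol table as it would look after `#declare Foo=<1,2,3>;`
symbols = {"Foo": ("VECTOR", (1, 2, 3))}

kind = tokenize_context_insensitive("Foo")
print(resolve_context_sensitive("Foo", kind, symbols))
# ('VECTOR_ID_TOKEN', (1, 2, 3))
```

The point of the split is that the first function needs no scene state at all, while everything that depends on earlier `#declare`s is confined to the second.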


> Perhaps you could kick-start this by posting an illustrative example, perhaps
> with a past known issue that's been resolved.

Example:

    314.e-2

should be interpreted as the single numeric literal `314.e-2`, but was at
one point interpreted as the numeric literal `314.`, the identifier `e`,
the punctuation `-`, and the numeric literal `2`.
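The difference comes down to maximal munch: the scanner must keep consuming as long as the characters can still form a valid literal. A sketch with regular expressions (the grammar shown is an assumption for illustration, not POV-Ray's actual implementation):

```python
import re

# Maximal-munch numeric literal: digits, optional trailing dot with
# optional fraction digits, optional exponent. Accepts `314.e-2`.
NUMBER = re.compile(r"\d+\.?\d*(?:[eE][+-]?\d+)?")

# A buggy variant that demands digits after the decimal point. It stops
# matching at `314`, leaving `.e-2` to be tokenized as junk.
BUGGY = re.compile(r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")

print(NUMBER.match("314.e-2").group())  # '314.e-2' (value 3.14)
print(BUGGY.match("314.e-2").group())   # '314' -- the reported bug
```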


> A related aside:
> ----------------------------------------------------
> 
> Since writing all of that benefits from incorporating certain features early on,
> I'm wondering if this would be the point to think about implementing a
> nested-level-counter.

Such functionality would be far beyond the layers I'm currently working
on. The scanner has no idea what semantics `(` or `{` have, let alone
`#` followed by `for` (yup, from the current tokenizer's perspective
those are two separate "words", and I'll leave it at that).
It doesn't even know that `end` is not an identifier but a keyword.

Even the envisioned context-insensitive half of the tokenizer would have
no concept of nesting of anything - by virtue of being context-insensitive.

> Quite often with loops or include files, a closing bracket or something gets
> left out, and it's a huge nightmare to backtrack through it all.
> Perhaps if the level of the instruction were returned along with the line number
> and column, it would be easier to see "where" in the code things went wrong.
> 
> // Level 0
> #for (X, 0 10)
>      // Level 1
>           #if (Something = true)
>                // Level 2
>                #debug"True!"
>           #end
> #end

I think just a bare number would be of little use. In cases simple
enough that you can figure out by yourself what "level 2" actually
means, chances are you don't need that information anyway.

Also, cases of missing closing brackets/braces/parentheses/`#end`
statements etc. are typically found far later than where they are
actually missing, and by then the parser is usually back on the lowest
level already.

Remember, the parser has a _very_ poor understanding of where brackets
and the like would really be required. _At best_ it can tell you in
hindsight that something was missing _somewhere_; at worst, it may
actually be so far off the rails that it hard-crashes before it even
gets a fair chance of noticing the mismatch.
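A toy sketch of why the report lands far from the fault (all names here are hypothetical, not actual parser code): all a depth counter can do is notice at end of input that something never closed, and the last-opened construct it can point at need not be the one actually missing its closer.

```python
def find_unclosed(tokens):
    """Track nesting depth over (line, token) pairs.

    The imbalance only becomes visible once the whole input has been
    consumed, and `last_open_line` is merely the most recent opener,
    not necessarily the construct that is actually missing its `#end`.
    """
    depth = 0
    last_open_line = None
    for line, tok in tokens:
        if tok in ("{", "#for", "#if"):
            depth += 1
            last_open_line = line
        elif tok in ("}", "#end"):
            depth -= 1
    return depth, last_open_line

# The `#for` on line 1 is missing its `#end`, but the counter can only
# report the imbalance at EOF and point at the opener on line 200.
tokens = [(1, "#for"), (3, "#if"), (5, "#end"), (200, "{"), (201, "}")]
print(find_unclosed(tokens))  # (1, 200)
```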

