POV-Ray : Newsgroups : povray.pov4.discussion.general : Better #read functionality Server Time
15 Nov 2025 08:45:55 EST (-0500)
  Better #read functionality (Message 1 to 2 of 2)  
From: Bald Eagle
Subject: Better #read functionality
Date: 12 Nov 2025 15:40:00
Message: <web.6914effb97e0ce7f7f81dbac25979125@news.povray.org>
Just exploring the idea of allowing POV-Ray to read in text files without
quotes.
It would also need to expose the automatic typing so that the user could then
properly use the data assigned to the identifiers without triggering errors.

"What would be required for povray to be able to #read from files with unquoted
strings?"
Copilot said:



In short: POV-Ray's #read cannot currently accept unquoted strings, because the
token stream classifies them as identifiers/keywords, not as string literals.
Supporting them would require extending the tokenizer and/or the #read
implementation so that it can capture the exact lexeme of the next token
(including identifiers) and coerce it into a string when a string is expected.

Below is a practical, implementation-level outline of what would be required,
plus safe design options and workarounds you can use today.

What would be required in POV‑Ray to #read unquoted strings
1) Extend #read to accept non-STRING tokens for string destinations
Current behavior: #read expects the file content to match the type of the
destination variable(s). For strings, that means a STRING token (i.e., "foo").
Desired behavior: When the destination is a string variable, allow IDENTIFIER,
NUMBER, or even OTHER tokens to be converted to a string using their exact
source text (the lexeme).
Implementation sketch:

In the #read directive handler (in the parser), when the next destination is a
string:
Ask the lexer for the next token.
If it is STRING, use its value as today.
If it is IDENTIFIER, SYMBOL, NUMBER, etc., capture the raw lexeme (verbatim text
slice) and assign it to the string variable without requiring quotes.


This requires the lexer to expose the lexeme for a token (usually trivial if the
lexer already records the source text for error reporting); otherwise you would
need to add that field.

This is backward compatible: quoted strings remain a subset of what #read
can consume when the destination is a string; existing files with "quoted"
values will still work.
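The token-to-string coercion above can be sketched as follows (Python standing in for the parser's C++; `Token`, `read_into_string`, and the token-kind names are hypothetical illustrations, not POV-Ray internals):

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str    # e.g. "STRING", "IDENTIFIER", "NUMBER"
    value: str   # parsed value (for STRING: the text between the quotes)
    lexeme: str  # verbatim source text of the token

def read_into_string(next_token):
    """Fill a string destination from the next token.

    STRING tokens behave as today; any other token falls back to
    its exact lexeme, which is the proposed coercion."""
    tok = next_token()
    if tok.kind == "STRING":
        return tok.value      # current behavior: use the parsed string value
    return tok.lexeme         # proposed: an unquoted word becomes a string
```

With this rule, a file containing `foo` fills a string destination with "foo", while `"foo"` continues to work unchanged.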

2) (Optional) Add delimiter-aware reading for multi-token unquoted strings
If you need to read phrases like SomeValue-42_v2 or even foo bar baz up to a
delimiter, add explicit read modifiers,
e.g.:

#read Delim "," MyStr → read everything up to the next comma into MyStr
(trim trailing spaces).
#read Line MyStr → read the remainder of the current line as a string.
#read Tokens N MyStr → concatenate the next N tokens into MyStr using a
space or no separator.
Implementation sketch:

Extend #read grammar to allow modifiers (Delim, Line, Tokens).
When a string destination is paired with one of these modifiers:
Delim: read raw characters from the file stream until delimiter; do not use the
token stream (bypass lexer) to preserve whitespace and punctuation.
Line: read raw characters until \n or EOF.
Tokens N: read the next N tokens via the lexer, join their lexemes with a chosen
separator (empty or single space), and assign.
This gives you robust control for CSV/TSV, fixed-width, or lightly-structured
text without quotes.
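A rough sketch of the raw (lexer-bypassing) Delim and Line modes, again in Python with hypothetical names; the real implementation would operate on POV-Ray's file stream, and the trimming policy shown is just one choice:

```python
import io

def read_delim(f, delim=","):
    """Read raw characters up to the delimiter, consume the delimiter,
    and trim surrounding whitespace (one possible trimming policy)."""
    out = []
    while (ch := f.read(1)) not in ("", delim):
        out.append(ch)
    return "".join(out).strip()

def read_line(f):
    """Read the remainder of the current line, without the line ending."""
    return f.readline().rstrip("\n").rstrip("\r")

# e.g. on "foo bar, rest of line\n":
#   read_delim(f)  -> "foo bar"        (comma consumed)
#   read_line(f)   -> " rest of line"  (leading space preserved)
```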

3) Keep keywords and identifiers intact (no macro expansion while reading)
When you accept unquoted strings, a bareword like sphere could be a keyword in
SDL; in string read mode it must not be interpreted, only captured as the literal
text sphere.

Requirement:

In string read mode, suppress semantic interpretation of the token; just take
its lexeme. This is already natural if you act before any semantic checks on the
token and only depend on its class and lexeme.
4) Locale/Unicode considerations

The tokenizer recognizes only characters from its own identifier character set,
so unquoted tokens are limited to that character set. If you need Unicode in
unquoted strings, prefer:
Delim/Line (raw) reading modes to bypass tokenization, or
extending the lexer's identifier character classes (a larger
change).
Be explicit about newline handling (\r\n vs \n) and trimming policy.
5) Backward compatibility and error handling
No breaking changes if you only extend behavior when destination is a string.
If a user mistakenly uses #read into a float or vector and the file has an
unquoted word, continue to error as today (or optionally support coercion with a
clear error message).
Provide a strict mode flag if desired: e.g., #version gating, or #read Strict to
require quotes.
Minimal design changes (good balance)
Context-aware coercion in #read:

When the destination is a string, capture the next token's verbatim lexeme as the
string value.
Optional raw modes:
#read Line VarStr
#read Delim "," VarStr
This gives you 95% of the utility with relatively small, local changes.

Developer-facing documentation updates
Update #read docs to state:

When the destination is a string, the next token (whether a string in
quotes or any single token like identifiers/numbers/symbols) will be captured as
the string value.
Add syntax for Line and Delim.
Specify whitespace trimming and delimiter consumption rules (does Delim consume
the delimiter? Typically yes).
Practical workarounds you can use today (no engine changes)


Preprocess input files externally to add quotes around the string fields.

E.g., a tiny script (Python/awk) to wrap the nth field in quotes.
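For instance, a small Python preprocessor along these lines (the field index and separator are whatever your data needs; the function name and behavior here are illustrative):

```python
def quote_field(line, n, sep=","):
    """Wrap the n-th (0-based) field of a separated line in double quotes
    so that #read sees a STRING token; already-quoted fields are left alone."""
    fields = [f.strip() for f in line.split(sep)]
    if n < len(fields) and not fields[n].startswith('"'):
        fields[n] = '"%s"' % fields[n]
    return (sep + " ").join(fields)

# quote_field("foo, 12, bar", 0) -> '"foo", 12, bar'
```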
Use #include instead of #read for symbolic content:

Structure your data as legal SDL:


// data.inc
#declare S_NAME = "Sphere";
#declare S_SHADE = "Matte";
#declare S_TAG = "my_tag-42";
Then #include "data.inc" and use the values as normal. (You avoid #read, but you
must still quote strings in the include.)
Encode strings as numbers and decode (usually not ideal):

If strings are from a small finite set, read an integer code with #read and map
it to strings via a macro table.
CSV with quotes only around textual fields:

Keep numbers unquoted, strings quoted; #read handles each in sequence.
Example of the desired extended behavior (spec)
File tokens.txt:

foo bar 123 sphere "already-quoted" , end
SDL (after proposed changes):


#declare S_A = "";
#declare S_B = "";
#declare S_C = "";
#declare S_D = "";
#declare S_E = "";

#fopen FH "tokens.txt" read

// Simple token-to-string coercion
#read (FH, S_A)        // S_A = "foo"
#read (FH, S_B)        // S_B = "bar"
#read (FH, S_C)        // S_C = "123"
#read (FH, S_D)        // S_D = "sphere"
#read (FH, S_E)        // S_E = "already-quoted"

// Delimiter mode (optional extension)
#declare S_F = "";
#read Delim "," (FH, S_F)  // Reads until the comma: S_F = "" after trimming
                           // (only a space preceded the comma), or " " if you
                           // choose to keep spaces.
                           // The next read starts after the comma.

#declare S_G = "";
#read Line (FH, S_G)       // S_G = " end"
#fclose FH

// ***end of code***



Recommendation

The smallest, highest-value
change is:

Coerce the next token's verbatim lexeme into the destination whenever the destination
is a string, without requiring quotes.
Optionally add Line and Delim modes for raw-text ingestion.
This keeps the rest of the language unchanged, avoids ambiguity, and unlocks
straightforward ingestion of many real-world text files.


Post a reply to this message

From: Bald Eagle
Subject: Re: Better #read functionality
Date: 12 Nov 2025 15:55:00
Message: <web.6914f34c2311edc27f81dbac25979125@news.povray.org>
"What would be required to expose the automatic typing from a #read directive,
so that users could implement conditional processing of files based on the data
type read into a variable?"


Copilot said:

In short: you need a way for SDL authors to observe the concrete runtime type of
the value a #read just assigned. That means
exposing either (a) type tags on variables or (b) side‑channel metadata
from #read (or both). Below is a practical, engine‑level design you can
implement with minimal surface area, plus SDL examples showing how creators
would use it.

Goals
Let users branch conditionally after #read based on whether the data was a
number, vector (and how many components), string, etc.
Keep backward compatibility with existing scenes.
Avoid changing how #read parses files unless explicitly asked.
Minimal, low‑risk feature set (recommended)
1) Add a built‑in typeof() (or type()) function

The engine already tracks each value's concrete type internally; typeof() only
needs to surface it to SDL.

Proposed return codes (example):

TYPE_FLOAT (aka scalar / DBL)
TYPE_VECTOR2, TYPE_VECTOR3, TYPE_VECTOR4, TYPE_VECTOR5
TYPE_COLOR (optional alias of TYPE_VECTOR5 if you want to distinguish colors)
TYPE_STRING
(Optionally later: TYPE_ARRAY, TYPE_DICTIONARY, TYPE_BOOL aliasing float, etc.)
You can implement typeof() to return an integer enum (most compact and fast for
#switch/#case), and optionally a typestr() that returns a descriptive string for
debugging.
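As an illustration of the classification typeof()/typestr() would perform (Python sketch; the integer codes and names are placeholders for whatever the final enum becomes):

```python
# Hypothetical type codes mirroring the proposed enum
TYPE_FLOAT, TYPE_VECTOR2, TYPE_VECTOR3, TYPE_VECTOR4, TYPE_VECTOR5, TYPE_STRING = range(6)

def typeof(value):
    """Map a value to its proposed integer type code."""
    if isinstance(value, str):
        return TYPE_STRING
    if isinstance(value, (int, float)):
        return TYPE_FLOAT
    if isinstance(value, tuple) and 2 <= len(value) <= 5:
        return {2: TYPE_VECTOR2, 3: TYPE_VECTOR3,
                4: TYPE_VECTOR4, 5: TYPE_VECTOR5}[len(value)]
    raise TypeError("unclassifiable value")

def typestr(value):
    """Descriptive name for debugging, as the optional companion function."""
    return {TYPE_FLOAT: "float", TYPE_STRING: "string",
            TYPE_VECTOR2: "vector2", TYPE_VECTOR3: "vector3",
            TYPE_VECTOR4: "vector4", TYPE_VECTOR5: "vector5"}[typeof(value)]
```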

Why this works

#read already assigns a value to the destination variable. With typeof(), user
code can branch immediately after the read; no change to how
#read parses is needed.

2) (Optional but useful) Expose vector arity via veclen()
Add veclen(Identifier) that returns 2..5 for vectors (including colors), and 0
or 1 for scalars/strings. This removes ambiguity between VECTOR3 vs COLOR if you
choose to represent colors as plain vectors.

3) (Very small change to #read) Publish last‑read metadata
Provide a reserved, read‑only system variable updated by #read:

__read_type → same code as typeof() would give
__read_components → 0 (string/scalar) or 2..5 (vector arity)
__read_ok → 1 if the last #read succeeded, 0 on EOF/type mismatch
__read_line / __read_col (optional) → position where the read began
This is useful when creators prefer to read into dedicated type‑specific
variables (e.g., a float) but still want to know what was actually encountered.

Back‑compat: These identifiers are reserved and only populated
post‑#read. Existing scenes are unaffected.
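The metadata updates could look roughly like this (Python sketch with a collapsed type enum; `ReadMeta` and `record_read` are stand-ins for the directive handler's bookkeeping, not real POV-Ray code):

```python
TYPE_FLOAT, TYPE_STRING, TYPE_VECTOR = 0, 1, 2  # simplified codes for this sketch

class ReadMeta:
    """Stand-in for the reserved __read_type / __read_components / __read_ok."""
    def __init__(self):
        self.read_type = -1
        self.read_components = 0
        self.read_ok = 0

def record_read(meta, value):
    """Update metadata after a #read; value is None on EOF."""
    if value is None:
        meta.read_ok = 0           # EOF: only __read_ok changes
        return
    meta.read_ok = 1
    if isinstance(value, str):
        meta.read_type, meta.read_components = TYPE_STRING, 0
    elif isinstance(value, (int, float)):
        meta.read_type, meta.read_components = TYPE_FLOAT, 0
    else:                          # vector: also record its arity
        meta.read_type, meta.read_components = TYPE_VECTOR, len(value)
```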

Alternative/Complementary APIs

A) Auto‑typed read: #read Auto

Add a mode that chooses the destination type from the input, then stores it:

#read Auto (FH, V_ANY)   // V_ANY adopts the type of the data just read
If the next lexeme starts with <, parse a vector.
If it parses as a number, store a float.
If it is a quoted string, store a string.
Otherwise, store the bareword's lexeme as a string when something like
IdentifierAsString is requested.
Then typeof(V_ANY) (or __read_type) lets the user branch.

Why this is optional: You can already achieve conditional processing using the
typeof()/metadata approach above, without changing how #read chooses types.

B) Peek before you read: peek_token(FH) / peek_kind(FH)

peek_kind(FH) returns a token class enum (TOK_STRING, TOK_NUMBER, TOK_LESS,
etc.) without consuming input.
peek_lexeme(FH) (optional) returns the raw lexeme of the next token.
With this, users can decide how to read without risking consumption or errors.
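Non-consuming lookahead of this kind is a standard buffered wrapper around the lexer; a minimal Python sketch (hypothetical names, tokens represented as (kind, lexeme) pairs):

```python
class TokenStream:
    """Lexer wrapper with non-consuming one-token lookahead."""
    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self._pending = None          # buffered (kind, lexeme) pair, if peeked

    def _peek(self):
        if self._pending is None:
            self._pending = next(self._tokens, ("TOK_EOF", ""))
        return self._pending

    def peek_kind(self):
        return self._peek()[0]        # token class; input is not consumed

    def peek_lexeme(self):
        return self._peek()[1]        # raw source text; input is not consumed

    def next_token(self):
        tok = self._peek()
        self._pending = None          # consume the buffered token
        return tok
```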


How creators would use it (SDL patterns)
The examples assume typeof() returns integer codes you define and provide as


Pattern 1: Branch after a normal #read

#declare S_READ = "";   // destination can be reassigned to any type later
#fopen FH "data.txt" read

#read (FH, S_READ)

#if (typeof(S_READ) = TYPE_STRING)
    // process string
#elseif (typeof(S_READ) = TYPE_FLOAT)
    // process scalar
#else
    #if (veclen(S_READ) = 3)
        // process <x,y,z> (or color-as-vec3)
    #elseif (veclen(S_READ) = 5)
        // process color/vec5
    #end
#end

#fclose FH
// ***end of code***

Pattern 2: Use read‑metadata (no need to call typeof())

#declare V_ANY = 0;
#read (FH, V_ANY)

#if (__read_type = TYPE_STRING)
    // ...
#elseif (__read_type = TYPE_FLOAT)
    // ...
#else
    #if (__read_components = 3)
        // ...
    #end
#end
// ***end of code***
Pattern 3: Peek, then choose a typed read (no auto‑typing required)

#if (peek_kind(FH) = TOK_STRING)
    #declare S_VAL = "";
    #read (FH, S_VAL)
#elseif (peek_kind(FH) = TOK_LESS)
    #declare V_VAL = < 0.0, 0.0, 0.0 >;
    #read (FH, V_VAL)
#else
    #declare F_NUM = 0.0;
    #read (FH, F_NUM)
#end
// ***end of code***
Engine changes required
Parser/runtime
Expose value type tags already present in the internal variant/union that
represents SDL values.
Implement typeof() (expression evaluator hook). Return an integer code; document
constants for SDL.
Implement veclen() to inspect vector arity from the value payload.
Modify the #read directive handler to set the metadata variables (__read_type,
__read_components, __read_ok) after each read.
Tokenizer (optional, for peeking)
Add peek_kind() and peek_lexeme() APIs that do non‑consuming lookahead on the
file's token stream.
Ensure the lexer keeps token class and lexeme accessible (typically already
available for error reporting). If not, add a field to carry the exact text
slice.
SDL surface & docs
Define and document the type enum returned by typeof().
Document semantics:
typeof() works on any identifier bound to a value.
veclen() returns 2..5 for vectors, 0/1 otherwise.
__read_* variables are updated after each #read.
Keep all features gated by #version if you want opt‑in behavior.
Error handling & edge cases
EOF: #read should set __read_ok = 0; others unchanged or set to sentinel values.
Type mismatch (legacy reads): preserve existing error behavior. Metadata can
still be set to what was attempted or left unchanged; document this.
NaN/Inf: classify as TYPE_FLOAT.
Colors: decide whether to unify with VECTOR5 or distinguish as TYPE_COLOR. If
you distinguish, set __read_components = 5 in both cases and let typeof()
disambiguate.
Why this design is safe and powerful
Zero breakage: existing scenes keep working; the new functions/vars are
additive.
Immediate utility: even just typeof() unlocks post‑#read branching.
Granularity: if you add peeking later, scene authors get pre‑read control
without engine auto‑typing changes.
Nice‑to‑have future extensions

typestr(Identifier) returning a descriptive type name (e.g., "float", "vector3",
"string").


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.