POV-Ray: Newsgroups: povray.pov4.discussion.general: Better #read functionality: Better #read functionality

From: Bald Eagle
Date: 12 Nov 2025 15:40:00
Message: <web.6914effb97e0ce7f7f81dbac25979125@news.povray.org>
Just exploring the idea of allowing POV-Ray to read in text files without
quotes.
Would also need to expose the automatic typing so that the user could then
properly use the data assigned to the identifiers without triggering errors.

"What would be required for povray to be able to #read from files with unquoted
strings?"
Copilot said:

POV-Ray's #read cannot currently consume unquoted strings
because the token stream classifies them as identifiers/keywords, not as string
literals. Supporting them would mean changing the
tokenizer and/or the #read implementation so that it can capture the exact
lexeme of the next token (including identifiers) and coerce it into a string
when a string is expected.

Below is a practical, implementation-level outline of what would be required,
plus safe design options and workarounds you can use today.

What would be required in POV‑Ray to #read unquoted strings
1) Extend #read to accept non-STRING tokens for string destinations
Current behavior: #read expects the file content to match the type of the
destination variable(s). For strings, that means a STRING token (i.e., "foo").
Desired behavior: When the destination is a string variable, allow IDENTIFIER,
NUMBER, or even OTHER tokens to be converted to a string using their exact
source text (the lexeme).
Implementation sketch:

In the #read directive handler (in the parser), when the next destination is a
string:
Ask the lexer for the next token.
If it is STRING, use its value as today.
If it is IDENTIFIER, SYMBOL, NUMBER, etc., capture the raw lexeme (verbatim text
slice) and assign it to the string variable without requiring quotes.


This requires the lexer to expose the lexeme for a token (usually trivial if the
token structure already stores its source text; otherwise you would
need to add that field).


This is backward compatible: it only widens what #read
can consume when the destination is a string; existing files with "quoted"
values will still work.
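The coercion rule in step 1 can be modelled outside POV-Ray. Below is a minimal Python sketch, not POV-Ray's actual parser code: the token classes, the `Token` type, and the `tokenize` / `read_string` helpers are all hypothetical names chosen for illustration, and the regular expression only approximates how a real lexer would classify input.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # "STRING", "NUMBER", "IDENTIFIER" or "OTHER" (assumed classes)
    lexeme: str  # exact source text of the token


# Toy token grammar for this sketch only; POV-Ray's real lexer differs.
TOKEN_RE = re.compile(r'''
    (?P<STRING>"(?:[^"\\]|\\.)*")
  | (?P<NUMBER>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)
  | (?P<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OTHER>\S)
''', re.VERBOSE)


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens with their verbatim lexemes; whitespace is skipped."""
    for match in TOKEN_RE.finditer(text):
        yield Token(match.lastgroup, match.group())


def read_string(tokens: Iterator[Token]) -> str:
    """Read the next token into a string destination."""
    tok = next(tokens)
    if tok.kind == "STRING":
        # Quoted value: behave as today (escape processing omitted here).
        return tok.lexeme[1:-1]
    # Any other token class: use the raw lexeme, no semantic interpretation.
    return tok.lexeme
```

Because `read_string` looks only at the token class and lexeme, a keyword-like word such as `sphere` is returned as plain text, and quoted input keeps working unchanged.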

2) (Optional) Add delimiter-aware reading for multi-token unquoted strings
If you need to read phrases like SomeValue-42_v2 or even foo bar baz up to a
delimiter or the end of the line, #read could take an optional modifier,
e.g.:

#read Delim "," MyStr → read everything up to the next comma into MyStr
(trim trailing spaces).
#read Line MyStr → read the remainder of the current line as a string.
#read Tokens N MyStr → concatenate the next N tokens into MyStr using a
space or no separator.
Implementation sketch:

Extend #read grammar to allow modifiers (Delim, Line, Tokens).
When a string destination is paired with one of these modifiers:
Delim: read raw characters from the file stream until delimiter; do not use the
token stream (bypass lexer) to preserve whitespace and punctuation.
Line: read raw characters until \n or EOF.
Tokens N: read the next N tokens via the lexer, join their lexemes with a chosen
separator (empty or single space), and assign.
This gives you robust control for CSV/TSV, fixed-width, or lightly-structured
text without quotes.
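The two raw modes can be sketched in a few lines of Python. This is only an illustration of the intended semantics, under the assumptions stated in the outline: Delim consumes the delimiter and trims trailing whitespace, and Line stops at the newline without returning it. The function names `read_delim` and `read_line` are hypothetical, and the Tokens N mode is left out because it depends on the lexer.

```python
import io


def read_delim(stream, delim: str = ",") -> str:
    """Read raw characters up to the next delimiter or EOF.

    The delimiter is consumed but not returned; trailing whitespace
    is trimmed, leading whitespace is kept.
    """
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delim:  # EOF or delimiter reached
            break
        chars.append(ch)
    return "".join(chars).rstrip()


def read_line(stream) -> str:
    """Read the remainder of the current line, without its line ending."""
    line = stream.readline()
    return line.rstrip("\r\n")  # handles both \n and \r\n endings


if __name__ == "__main__":
    fh = io.StringIO('alpha beta , gamma\r\nnext line')
    print(repr(read_delim(fh, ",")))
    print(repr(read_line(fh)))
```

Both helpers read characters directly from the stream, so whitespace and punctuation inside the value survive, which a token-based read could not guarantee.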

3) Keep keywords and identifiers intact (no macro expansion while reading)
When you accept unquoted strings, a bareword like sphere could be a keyword in
SDL. In string read mode it should still be captured as the literal text
sphere.

Requirement:

In string read mode, suppress semantic interpretation of the token; just take
its lexeme. This is already natural if you act before any semantic checks on the
token and only depend on its class and lexeme.
4) Locale/Unicode considerations

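Token-based coercion can only capture characters the lexer accepts in a
token, so unquoted strings read this way are limited to
that character set. If you need Unicode in unquoted strings, prefer:
Delim/Line (raw) reading modes to bypass tokenization, or
extending the lexer's character handling (a larger
change).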
Be explicit about newline handling (\r\n vs \n) and trimming policy.
5) Backward compatibility and error handling
No breaking changes if you only extend behavior when destination is a string.
If a user mistakenly uses #read into a float or vector and the file has an
unquoted word, continue to error as today (or optionally support coercion with a
clear error message).
Provide a strict mode flag if desired: e.g., #version gating, or #read Strict to
require quotes.
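The compatibility rules above amount to a small dispatch on the destination type. The following Python sketch shows one way to express them; `ReadError`, `coerce`, the `strict` flag, and the `"string"` / `"float"` type labels are hypothetical and stand in for whatever the real parser would use.

```python
class ReadError(ValueError):
    """Raised when file content cannot be stored in the destination."""


def coerce(kind: str, lexeme: str, dest_type: str, strict: bool = False):
    """Convert one token (class + lexeme) for a destination of dest_type."""
    if dest_type == "string":
        if kind == "STRING":
            return lexeme[1:-1]  # quoted value, as today
        if strict:
            # Strict mode keeps the old requirement that strings are quoted.
            raise ReadError(f"expected quoted string, found {kind} {lexeme!r}")
        return lexeme  # extended behaviour: take the raw lexeme
    if dest_type == "float":
        if kind == "NUMBER":
            return float(lexeme)
        # Unquoted words never silently become numbers.
        raise ReadError(
            f"expected number for float destination, found {kind} {lexeme!r}"
        )
    raise ReadError(f"unsupported destination type {dest_type!r}")
```

Only the string branch gains new behaviour, so scenes that read floats or quoted strings would see the same results and the same errors as before.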
Minimal design changes (good balance)
Context-aware coercion in #read:

If the destination is a string, take the next token's lexeme as the
string value.
Optional raw modes:
#read Line VarStr
#read Delim "," VarStr
This gives you 95% of the utility with relatively small, local changes.

Developer-facing documentation updates
Update #read docs to state:

When the destination is a string, the next token (either a string in
quotes or any single token like identifiers/numbers/symbols) will be captured as
the string value.
Add syntax for Line and Delim.
Specify whitespace trimming and delimiter consumption rules (does Delim consume
the delimiter? Typically yes).
Practical workarounds you can use today (no engine changes)


Preprocess input files externally to add quotes around the string fields.

E.g., a tiny script (Python/awk) to wrap the nth field in quotes.
Use #include instead of #read for symbolic content:

Structure your data as legal SDL:





// data.inc
#declare S_NAME = "Sphere";
#declare S_SHADE = "Matte";
#declare S_TAG = "my_tag-42";
Then #include "data.inc" and use the values as normal. (You avoid #read, but you
must still quote strings in the include.)
Encode strings as numbers and decode (usually not ideal):

If strings are from a small finite set, read an integer code with #read and map
it to strings via a macro table.
CSV with quotes only around textual fields:

Keep numbers unquoted, strings quoted; #read handles each in sequence.
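For the first workaround (external preprocessing), a small Python script along these lines could add the quotes. It is a sketch under assumptions: the input is simple comma-separated text, the caller knows which column indexes hold text, SDL-style backslash escapes are acceptable inside the quotes, and #read treats the whole file as one comma-delimited stream (so rows are joined with commas as well). The function name `quote_fields` is made up for this example.

```python
import csv
import io


def quote_fields(text: str, string_columns) -> str:
    """Wrap the given 0-based columns of comma-separated text in quotes."""
    wanted = set(string_columns)
    lines = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        cells = []
        for index, cell in enumerate(row):
            if index in wanted:
                # Escape backslashes and quotes (assumed SDL-style escapes).
                escaped = cell.replace("\\", "\\\\").replace('"', '\\"')
                cells.append(f'"{escaped}"')
            else:
                cells.append(cell.strip())
        lines.append(", ".join(cells))
    # Separate rows with a comma too, so every value is comma-delimited.
    return ",\n".join(lines) + "\n"


if __name__ == "__main__":
    source = "1, Sphere, 0.5\n2, my_tag-42, 1.25\n"
    print(quote_fields(source, {1}), end="")
```

The output keeps numbers unquoted and text quoted, which is the mixed layout described in the last workaround.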
Example of the desired extended behavior (spec)
File tokens.txt:

foo bar 123 sphere "already-quoted" , end
SDL (after proposed changes):






#declare S_A = "";
#declare S_B = "";
#declare S_C = "";
#declare S_D = "";
#declare S_E = "";

#fopen FH "tokens.txt" read

// Simple token-to-string coercion
#read (FH, S_A)        // S_A = "foo"
#read (FH, S_B)        // S_B = "bar"
#read (FH, S_C)        // S_C = "123"
#read (FH, S_D)        // S_D = "sphere"
#read (FH, S_E)        // S_E = "already-quoted"

// Delimiter mode (optional extension)
#declare S_F = "";
#read Delim "," (FH, S_F)  // Reads until comma: S_F = "" (empty because comma
                           // was next), or " " if you choose to keep spaces
                           // Next token after this call starts after the comma

#declare S_G = "";
#read Line (FH, S_G)       // S_G = " end"
#fclose FH

// ***end of code***



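Recommendation

The smallest useful
change is:

Let #read capture the next token's lexeme as the string value when the destination
is a string, without requiring quotes.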
Optionally add Line and Delim modes for raw-text ingestion.
This keeps the rest of the language unchanged, avoids ambiguity, and unlocks
straightforward ingestion of many real-world text files.