POV-Ray : Newsgroups : povray.pov4.discussion.general : Parser rewrite
From: Bald Eagle
Subject: Parser rewrite
Date: 28 Oct 2025 13:20:00
Message: <web.6900faee20e0e8051af05c025979125@news.povray.org>
Just for kicks, I'm wondering how much we can shove at an AI and have it do a
lot of the complicated digging into the extant source, and how much we can have
it write a lot of the basic framework of 4.0.


"given the source code for all of pov-ray, can you give a summary of what would
be needed to rewrite the parser?  Is there a short list of top formats (like
json) that would be most suitable for updating the SDL language?"

Copilot said:


Here is a summary of what the POV‑Ray parser does today, followed by a short list of modern interchange formats that could suit an updated SDL.

1) What the current parser does (and why that matters)

The current front end is a multi‑stage pipeline:

Scanner → breaks raw text into lexemes and tracks encoding, line/column, newlines, and comments.
Raw tokenizer → turns lexemes into tokens (identifiers, keywords, literals, operators).
Parser → consumes tokens, implements the SDL grammar (objects, textures, directives, expressions), populates scene structures, manages the symbol table (identifiers, macros), and invokes a function VM for user functions.

These pieces live under source/parser/ (e.g., parser.cpp, parser_tokenizer.cpp, scanner.cpp, reservedwords.cpp). The parser is large (~10k LOC in parser.cpp alone) and has evolved by hand over decades, with recursive‑descent rules intertwined with scene construction and semantic actions.
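To make the scanner/tokenizer split concrete, here is a minimal sketch in Python. It is illustrative only: the token names and regular expressions are invented for this example and are not POV‑Ray's actual tables.

```python
import re

# Hypothetical token classes for an SDL-like language.
TOKEN_SPEC = [
    ("NUMBER",    r"\d+(?:\.\d+)?"),
    ("DIRECTIVE", r"#[A-Za-z_]\w*"),
    ("IDENT",     r"[A-Za-z_]\w*"),
    ("OP",        r"[{}<>,+\-*/=]"),
    ("SKIP",      r"[ \t]+"),
    ("NEWLINE",   r"\n"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, lexeme, line, column) tuples, tracking position
    the way the scanner stage is described above."""
    line, line_start = 1, 0
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        col = m.start() - line_start + 1
        if kind == "NEWLINE":
            line += 1
            line_start = m.end()
            continue
        if kind == "SKIP":
            continue
        yield (kind, lexeme, line, col)
```

A real implementation would also handle strings, comments, and encodings, but the point is the clean boundary: the tokenizer knows nothing about the grammar.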

SDL itself mixes declarative scene data with imperative control (e.g.,
#if/#while/#macro/#read/#fopen), which is powerful but complicates formal
grammar extraction and pure data interchange.

2) A realistic plan to rewrite the parser
2.1. Extract and formalize the language
Write a formal grammar (EBNF or a PEG) for the current SDL, separating:

core scene constructs (camera, objects, textures),
expressions (numeric, vector, color), and
directives (#macro, #include, #if, #while, file I/O) that control
parse‑time evaluation.
This is not just academic; it makes it possible to drive a parser generator, or to validate a hand‑written parser against a spec. The language reference and the reserved‑word tables are the right starting point to catalog tokens and block forms.
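As an illustration, a tiny EBNF fragment covering a sphere statement and arithmetic expressions might look like this (hypothetical; not POV‑Ray's official grammar):

```ebnf
statement   = object | declaration | conditional ;
object      = "sphere" "{" vector "," expr { modifier } "}" ;
declaration = "#declare" IDENT "=" expr ";" ;
conditional = "#if" "(" expr ")" { statement } [ "#else" { statement } ] "#end" ;
vector      = "<" expr "," expr "," expr ">" ;
expr        = term { ( "+" | "-" ) term } ;
term        = factor { ( "*" | "/" ) factor } ;
factor      = NUMBER | IDENT | "(" expr ")" ;
```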
Catalog reserved words & tokens from reservedwords.cpp and parser_tokenizer.cpp.
2.2. Define clean internal representations

Introduce two layers:

An AST for syntax (nodes for statements, object blocks, expressions).
A lowered scene IR (geometry, materials, transforms) created after macro expansion and conditional evaluation.
The current parser interleaves semantic actions; refactoring to an AST enables
better error messages, tooling, and downstream transforms. (The docs describe
parsing → in‑memory scene; formalizing an intermediate stage is the
engineering step you add.)
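A sketch of the two-layer representation, with Python dataclasses standing in for whatever C++ types the rewrite would use. All class names here are hypothetical.

```python
from dataclasses import dataclass, field

# --- Syntax layer: AST nodes produced by the parser, before macro
#     expansion or conditional evaluation. Every node carries a
#     source location for diagnostics and tooling.
@dataclass
class Node:
    file: str
    line: int
    column: int

@dataclass
class ObjectBlock(Node):
    kind: str                           # e.g. "sphere", "box"
    args: list = field(default_factory=list)
    modifiers: list = field(default_factory=list)

@dataclass
class Directive(Node):
    name: str                           # e.g. "#if", "#while"
    body: list = field(default_factory=list)

# --- Lowered scene IR: what remains after evaluation. No directives,
#     no macros -- pure data, suitable for JSON serialization.
@dataclass
class SceneObject:
    kind: str
    parameters: dict
    transform: list                     # e.g. a flattened matrix
```

The key property is that SceneObject contains no parse‑time constructs at all, so it can be serialized and diffed independently of SDL.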
Symbol table & evaluation model: SDL has macros and a function VM for user‑defined functions. Define an evaluator that:

resolves identifiers, macro calls, arrays, and functions,
controls directive flow (#if/#while/#read), and
emits deterministic side effects (file I/O guarded by policies).

The DeepWiki overview calls out the symbol table and function VM explicitly; a clean evaluator boundary will keep the parser deterministic.
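The evaluator boundary can be sketched as follows. Node shapes are plain dicts with invented tags, purely to show the idea of the parser emitting directive nodes and a separate evaluator resolving symbols and directive flow.

```python
def evaluate(nodes, symbols):
    """Expand hypothetical #declare/#if nodes into a flat list of
    scene items, consulting the symbol table as it goes."""
    out = []
    for node in nodes:
        tag = node["tag"]
        if tag == "declare":
            # Bind an identifier in the symbol table.
            symbols[node["name"]] = node["value"]
        elif tag == "if":
            # Directive flow: pick a branch, then recurse.
            branch = node["then"] if symbols.get(node["cond"]) else node.get("else", [])
            out.extend(evaluate(branch, symbols))
        elif tag == "object":
            # Resolve identifier references against the symbol table.
            params = {k: symbols.get(v, v) for k, v in node["params"].items()}
            out.append({"kind": node["kind"], "params": params})
    return out
```

Because all side effects flow through the symbol table and the returned list, the same input always produces the same scene, which is the determinism property argued for above.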
2.3. Build a modern front‑end
Choose an approach:

Regenerate using a parser generator (ANTLR, a PEG tool) for maintainability, or
keep a hand‑written recursive‑descent parser with tables for precedence/associativity (closest to today; the easiest way to preserve performance).
Either way, keep scanner/tokenizer modular (encoding, line/column tracking and
tokenizer error recovery are already separated in scanner.cpp /
parser_tokenizer.cpp).
Error reporting: implement a diagnostic layer with:

multi‑line snippets and carets,
incremental recovery (skip to matching } / next directive),
versioned language switches (SDL has #version and command‑line Version=).
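The snippet-plus-caret style of diagnostic can be sketched in a few lines. The message format here is invented for illustration; a real layer would render multi-line spans and recovery hints.

```python
def format_diagnostic(source, filename, line, column, message):
    """Render a one-line excerpt with a caret under the error column,
    in the familiar file:line:column style."""
    lines = source.splitlines()
    excerpt = lines[line - 1] if 0 < line <= len(lines) else ""
    caret = " " * (column - 1) + "^"
    return (f"{filename}:{line}:{column}: error: {message}\n"
            f"  {excerpt}\n"
            f"  {caret}")
```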
Compatibility shim: SDL is used in decades of content. Keep a legacy mode driven
by:

#version in scenes and Version=/+MV in INI/CLI,
behavior flags for older defaults (e.g., color/ambient defaults changed across
versions).
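In the spirit of the compatibility shim above, legacy behavior can be centralized in a single flag lookup keyed on the scene's #version. The flag names and version cutoffs below are examples only, not the actual POV‑Ray defaults history.

```python
def behavior_flags(version):
    """Map a scene's #version (or Version=/+MV) to hypothetical
    legacy-compatibility flags consulted throughout the parser."""
    return {
        "assumed_gamma_required": version >= 3.7,
        "legacy_ambient_default": version < 3.8,
        "strict_semicolons": version >= 3.8,
    }
```

Keeping all version-dependent behavior behind one such table makes the legacy mode testable in isolation.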
2.4. Tooling & tests

Test corpus: collect documentation examples, repository tests, and community scenes. Run golden‑image tests. The public docs and repo provide the initial material.

Source mapping: carry filename/line/column for all AST nodes. Essential for IDEs
and better errors (and for a formatter).

Performance: measure with large include stacks; SDL allows many includes and
file I/O during parse. Keep I/O safe (opt‑in) and cache small includes
(similar to current library path search).


3) Formats to consider

There are two distinct goals people conflate:

(A) Interchange format: a neutral way to feed scene data (no control flow) to
POV‑Ray.
(B) Full scripting language: everything SDL does today (macros, conditionals,
file I/O) but with a modern surface syntax.
A and B lead to very different choices.

3.1. For interchange (data‑only) scenes



JSON (or JSON5 / JSONC)

Pros: ubiquitous tooling; fast parsers; schemas (JSON Schema) to validate; easy
to generate from other apps.
Cons: no comments (plain JSON); expressions become strings; verbose for numeric
arrays unless compressed.
Fit: great as an IR serialization once expressions are resolved; pass
vectors/colors as arrays; include path lists as arrays.
Why suitable here: POV‑Ray already distinguishes parse‑time
evaluation from scene structures; JSON matches the post‑expansion IR.
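For instance, a post‑expansion scene IR in JSON might look like this (a hypothetical schema, purely to illustrate the shape):

```json
{
  "ir_version": "0.1",
  "camera": { "location": [0, 1, -5], "look_at": [0, 0, 0] },
  "objects": [
    {
      "kind": "sphere",
      "center": [0, 1, 0],
      "radius": 2,
      "material": { "pigment": { "color": [1, 0, 0] } }
    }
  ]
}
```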
YAML

Pros: comments; anchors/aliases for reuse; human‑friendlier than JSON for
big materials.
Cons: ambiguous edge cases; slower; indentation pitfalls; not ideal for large
numeric meshes.
Fit: comfortable authoring format if you want hand‑edited data; still
data‑only (no logic).
(You would still need an evaluator for expressions if you allowed them as
strings.)
TOML

Pros: simple, ini‑like; great for configs (global settings, render
options).
Cons: not geared toward deep nested geometry/material graphs.
Fit: super for render settings and library paths, less so for full scenes.
POV‑Ray already exposes many INI options for parsing and includes; TOML is
a natural successor for such settings.
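As a sketch, a TOML successor to today's INI options might look like this (section and key names invented for illustration):

```toml
# Hypothetical successor to today's INI render/parse options
[paths]
library = ["include", "/usr/share/povray/include"]

[parser]
version = 3.8
max_include_depth = 64

[output]
width = 1920
height = 1080
antialias = true
```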
glTF 2.0 (JSON + binary buffers)

Pros: industry standard for geometry/materials; supports PBR; efficient.
Cons: cameras/lights differ; no SDL logic.
Fit: as an import path for meshes/materials; you still need a mapping to POV textures/finishes.
XML

Pros: schemas (XSD), mature; good streaming.
Cons: verbose; falling out of favor for hand‑authoring; not better than
JSON/YAML here.
Fit: only if you need strict schemas and existing XML infra.

Recommendation: define an official IR schema (JSON) that matches the scene after macros/conditionals resolve. Use YAML for authoring if desired; convert to JSON on ingest. This keeps the parser rewrite focused on SDL → IR, while also enabling new non‑SDL inputs.

3.2. For a full scripting language (SDL replacement)

Data‑only formats won't cut it; you need control flow, functions, modules, etc. Viable directions:

Keep SDL but tighten the grammar, publish an official spec, and improve diagnostics. (Ship a JSON‑IR input path alongside.) The existing directive set is well documented and widely used.
Embed a small, sandboxed language (e.g., Lua, WASM scriptlets) strictly for compile‑time evaluation of parameters, not for geometry itself. The existing function VM is a precedent; formalize it.
Define a new PEG‑specified SDL‑2 with modules, namespacing, and
explicit expression evaluation semantics, plus a migration tool from legacy SDL.
(The GitHub repo, DeepWiki, and documentation provide the raw materials to write
a precise spec.)

4) Concrete work items

Language spec: EBNF/PEG, directive semantics, version gates. (Docs + reserved words + source audit.)
Tokenizer with thorough tests (numbers, vectors, strings, comments, encodings).
(Start from scanner.cpp / parser_tokenizer.cpp).
AST for statements/expressions; evaluator for macros/conditions; IR scene graph.

Error/diagnostics with source mapping, recovery, and #version handling. (CLI/INI
version switches are already part of parsing options.)
Compatibility mode + test corpus (docs examples + repo tests + community
scenes).
Optional: JSON‑IR loader and exporter; YAML authoring; glTF mesh/material
import.

5) Recommended short list

If your goal is to update the SDL ecosystem with a clean data path, this is the most practical mix:

JSON‑IR as the canonical machine interface: unambiguous, schema‑friendly; ideal for tools, converters, and CI.

YAML for optional hand authoring: human‑friendly with comments/anchors; accept only a curated subset to avoid YAML gotchas.
TOML for render/parse configuration (what INI does today): library paths, version switches, and similar settings.
glTF as an import format for geometry/materials where PBR mapping is acceptable
(not a replacement for SDL materials).

None of these formats replaces SDL by itself; they are excellent complements, giving SDL a modern, toolable data path.


6) Suggested migration strategy

Phase 1: modern scanner/tokenizer + diagnostics; improve error UX without semantics changes. (Drop‑in for parser_tokenizer.cpp/scanner.cpp responsibilities.)
Phase 2: AST + evaluator; keep emitting the current scene structures.
Phase 3: Add JSON‑IR input path. Start documenting the IR as the official
interface.
Phase 4: Consider SDL‑2 (PEG‑specified) or a sandboxed VM for
compile‑time logic; provide an auto‑formatter and linter.
Closing thought
Because SDL mixes data with control, the most maintainable path is to separate
concerns: a modern parser/evaluator for SDL (compatibility first), and a
data‑only IR (JSON) that tools can generate/consume. That gives you a
clear, testable core and lowers the cost of future surface‑syntax changes. The source code and documentation already delineate the pieces you need to formalize and redesign.



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.