POV-Ray : Newsgroups : povray.programming : Why not generate parser with Bison & Flex? : Re: Why not generate parser with Bison & Flex? Server Time
29 Jul 2024 02:25:09 EDT (-0400)
  Re: Why not generate parser with Bison & Flex?  
From: Thorsten Froehlich
Date: 4 Jan 1999 16:55:55
Message: <369138eb.0@news.povray.org>
Sorry for taking so long to reply...

In article <368c0618.27900056@news.povray.org> , EPT### [at] aolcom (Evan Powers)
wrote:
>Sorry, you are mistaken. If you read up on the Bison grammar
>description language (http://www.gnu.org/manual/bison/index.html) you
>will realize that it is designed with a dynamic language definition in
>mind. Adding new language features and changing existing ones is
>simplistic.
>
>It is important to realize that, as defined by Bison, only the code
>within the POV parser specifying what language element should come next
>is part of the parser; the rest of the code, the part that builds the
>internal data structures, is *not*. If the POV parser were written
>with Bison, this code would be a Bison action (such as "{$$ = $1 +
>$3}" above) instead of being embedded within the parsing logic. Thus,
>a Bison POV parser would be just as stand-alone as PARSE.C.

Well, I have not worked with it, so I can only state on what I see. Your link was
very useful (I did not know that this pages conatins docs for most Unix tools...)
and after doing some reading, I think you are mostly right.
I am now very concerned about debugging the resulting code and platform support. I
did some experiments with an (not longer supported/not updated for a long time) old
plug in for my Mac compiler - first I had to seach for it for about half an hour
(most links to it were oudated...) and then remembered that I had some backup when I
downloaded it some time ago. As a matter of fact, I am currently not able to find any
Bison (which is maintained) for the Mac at all. Of course I found a few dozend Win
and Unix implementation links during my search: Currently the POV-Ray source code
compiles and allows changed with every (supported) C compiler, nobody needs to search
for hard to find 3rd party (also Flex and Bison are free) tools for its platform or
get the source code and compile it himself.
However, I managed to get an example file that came with it to work and generate a
compiling parser. After that I started to try do make a minor change and a typo (by
me) in some C interface code made me look into the parser file because Bison never
complained...
I realze that this is a beginners mistake, but I have been a C programmer for about 8
years now and never had to use Flex or Bison, so I assume many people who want to
make some changes to the public POV-Ray source code don't know Bison or Flex at all
(if programming is just a hobby for them), and having to *learn* the Flex and Bison
syntax before being able to extend POV-Ray does not make it easier for them.

>Note that this grammar file fully defines the operator precedence and
>associatively.

I assumed it would :-)

>I concede that this is not the best example; however, it tells enough
>about the Bison grammar language to allow the extrapolation of how
>modifications to the POV grammar could be made.

Yes, and I got it.

>I grabbed the following code snippet out of PARSE.C from the POV
>source archive. I have marked the lines that are actually part of the
>parser with a '%'.
>% CASE (SCALE_TOKEN)
>%  Parse_Scale_Vector (Local_Vector);
>  Compute_Scaling_Transform(&Local_Trans, Local_Vector);
>  for (Current=First; Current!=NULL;
>Current=Current->Sibling)
>  {
>   Scale_Object (Current, Local_Vector,
>&Local_Trans);
>  }
>% END_CASE
>
>Just as making a grammar file does not write the code building POV's
>internal data structures, writing the following using the existing
>parser macros doesn't either:
> CASE (SCALE_TOKEN)
>  Parse_Scale_Vector (Local_Vector);
> END_CASE
>
>Bearing my above statements in mind, I believe the argument is about
>which grammar description language is superior. I, for one, believe a
>language separating the grammar definition and the interpretation code
>is superior to one in which the two are intermixed. Furthermore, I
>believe that defining a language according to its overall structure,
>then the overall structure of its components, then the overall
>structure of the components' components, and so on, makes more
>intuitive sense and is easier to read and maintain than a definition
>of a language based upon what can follow each particular fundamental
>element.

Yes, you are right that the parsing part does not build anything, but please pay
attention to this little detail:
Look at the lines you did not mark, what do they do?  They reduce the memory usage!
There can possibly be thousands of "scale" *commands* (they force POV-Ray to scale
the object, so they are a sort of command). If the parser code would just be composed
of the lines you marked, you would have to store and execute the "scale" commands
later (after parsing) which creates the need for higher memory usage - and now assume
200000 objects (common for some users) with only one scale command each, and a scale
command requires a minimum of 24 bytes to store (if you use 8 byte floating point as
POV-Ray usually does): You just created the need for additional 4.8 MB of (temporary)
memory! What happens if e.g. an automatically generated scene (by a 3rd party
program) generates 200000 objects and five scale commands per object...? The only way
out of this would be to integrate the lines you did not mark into the parser code
again, and one of the advantages of the Bison parser would be lost again!
All these lines of code would have to be written inside the grammar file, wouldn't
they? And this would make the grammar file a total nightmare!!!  Or how would you
solve this problem, maybe I am totally wrong!?!


>Which of these makes more sense? (Which is more concise?)
>A) Bison-style.
> sentence:
>  subject action predicate
> subject:
>  [noun-modifiers] noun
> action:
>  [verb-modifiers] verb
> predicate:
>  [direct-object]
>  | indirect-object direct-object
>  | direct-object [indirect-object]

Yes, the Bison approach is superior in hidding the complexity created by the io and
"command" code, but you need temporary storage of the data and then modify it, store
it somewhere else (differnt data structure) and the free the temporary memory.

>B) Current POV parser style.
> sentence:
>  [noun-modifiers] noun-modifiers-followers
> noun-modifiers-followers:
>  noun noun-followers
> noun-followers:
>  [verb-modifiers] verb-modifiers-followers
> verb-modifiers-followers:
>  verb verb-followers
> verb-followers:
>  [direct-object]
>  | indirect-object io-followers
>  | direct-object do-followers
> io-followers:
>  direct-object
> do-followers:
>  [indirect-object]

Yes, the POV-Ray scene language contains not only data structures but also commands
(I (still) can't find a better word, it is what you called "do") how to modify this
data. This is a major difference to the common programming languages which allow easy
parsing from the top to the bottom, while POV-Ray is self modifying code of some kind
(or some kind of hidden preprocessor): If you parse C you can parse tokens like a
stream, nothing (OK, this is oversimplified) you parse later in the stream of tokens
will require you to *change* data you created earlier.
Take an expression tree generation parser as example (assuming we don't do the
parsing and code generation on the fly): You build the tree, node by node through
recursion which is no problem, you get code by just stepping through the tree later.
You write the expression code by parsing the tree in a different way (so you can get
reasonable assembler code perhaps).
Now look at this expression tree as some kind of more complex expression, a typical
POV-Ray scene object like a sphere: You parse the sphere and now you would usually
generate the output or store it to generate the output later. For this the Bison
parser would work *very* well and surely be faster than the POV-Ray parser, both just
generate the sphere object data structures and fill in the data. And now, there is
e.g. a scale command found!

If you compare the Bison and the POV-Ray parser code you will see the following:

* In Bison parser code (or better, in the grammar file) you intended to keep the data
modifying code out! Now you have to *store* the commands (or actions) that change the
data. You will then go through all this data later, lookup each object, check if
there are scale commands for it and execute them. Then you free the commands data
structures.
* The current POV-Ray parser never had to store these commands and never used this
additional memory. It applied the scale command "on the fly".

And here starts the next problem to surface: Error and warning handling.
If a user made a mistake and e.g. wrote "scale <0,0,0>" which makes no sense, the
POV-Ray parser will find this problem in place and have no problems to report it to
the user. This is especially useful because POV-Ray scene objects have no names like
functions in programming languages, there is no abstract location name, only a line
number which identifies the scene object!
Of course you will/should do this basic error checking in the parser code, but you
cannot find them all, assume some crazy user (or 3rd party program) would apply ten
"scale <0.00001,0.00001,0.00001>" commands to an object by accident. Each of these
scale commands is valid by itself, but doing them ten times you may get an underflow
and the scale would again be <0,0,0>!  If this happens in the first of 200000 objects
the POV-Ray parser will report this to the user immediatelly, but the Bison parser
cannot even know that something is wrong here!  In order to report the problem later
you would have to store the file and line each of these scale commands - even more
memory would be needed!

>You don't extend the Bison generated code; you extend the original
>grammar definition file. (I will grant you that one would not want to
>manually generate a parser from a Bison grammar definition file. But
>then, that's what Bison is for!)

You did not get my whole point: Even a small change to the grammar and the whole
parser has to be recompiled, I don't want to build a parser from the grammar file,
just extend the parser code by hand.

>GNU Bison generates the same code regardless of the host system type,
>and is the de facto standard for parser generators. For the sake of
>argument, however, I will assume that POV developers use several
>versions of Bison.

If some older Bison has a bug, it may not be a problem on one platform, but another
platform would have it...this is a debugging and bug report nightmare - then even the
core code would be *different* (even if its just a bit) for each platform. And don't
forget compiler bugs, etc! In an ideal situation, of course, this would not be a
problem, but unfortunately this isn't the case :-(

>With this assumption, your statement is true--but then, it doesn't
>really matter. If someone wanted to extend the language, they would
>modify the grammar definition file, *not* the generated parser. In
>fact, I see no conceivable reason for anyone to even *look* at the
>generated parser. (Not even someone debugging a Bison parser looks at
>the generated parser code.) The file distributed in the POV source
>archive would be the Bison grammar file, not the generated parser
>file.

This would require someone in the team to always generate one parser for everybody in
the team, the only way to make sure the bugs are out everywhere.  If this would not
be done, each platform developers would also have to make sure that on their platform
the *whole* parser works as expected - and at least on the Mac we have other things
to do... :-)

>As for the fact that different versions generate different code:
>All Bison-style parser generators use the same fundamental algorithm.

Hmm, this isn't a good argument, ray tracers all use the "same fundamental algorithm"
as well - and the differences are quite *visible*.  However, it would be unfair  if I
would say this is the same case for Bison, the differences are surely less important,
but *bugs* are not.  And even the POV-Ray C source code has to work around several
compiler bugs.  And fixing them in the Bison source code is no reasonable solution.

>Even Yacc, Bison's ancestor, uses the same algorithm. The only
>significant differences in the generated code are those caused by
>whether the generated code is optimized for speed or size. I contend
>that these differences are inconsequential, since no one actually
>looks at the generated code anyway and since different command line
>options can generate comparable code with different Bison versions.

Yes, this is true. 

(Command line options are another problem: E.g. Macs don't have a command line (but
of course I can work around this) and getting the same options would still be
difficult on different platforms.)

>Furthermore, the POV development team is unconcerned about the fact
>that Borland, Watcom, Microsoft, and GNU compilers generate different
>code from the same platform independent source file, and that, for
>example, a LIBPNG compiled by Borland C cannot be used with a version
>of POV compiled by GNU C. (I contend that there is no fundamental
>difference between these two situations, and therefore are
>comparable.)

Yes, I totally agree that the resulting compiled code is no concern. (Just a small
note: On the Mac libraries and PPC code is exchangeable to some degree, and the same
would be possible everywhere else, but POV-Ray does of course not depend on it.)

>This is not true. A given Bison grammar file is platform independent,
>since Bison generates platform independent ANSI C, provided the
>user-supplied C code within its actions is platform independent, just
>as a parser using the POV macros is platform independent provided the
>code generating the internal data structures is platform independent.

You missed my point: Not the Bison grammar file is the problem, I referred to the
platform specific Bisons. However, I understand what you wnated to say.

>Knowing what I do about the Bison algorithm, I very much doubt that a
>handwritten parser written in the style of PARSE.C could in any way
>compare in terms of speed. Read the chapters in the Bison manual about
>the Bison algorithm to see what I mean. (The URL is above.)

Well, it is no good idea to make any assumtions about speed without any experiments
with the actual POV-Ray grammar, I think now. However, please consider memory usage
and the time it requires later as well.   And I did not say that the handwritten
parser has to be in PARSE.C style.

>(Incidentally, I believe the C and C++ language standards are just as
>"open" as the POV-Ray scene language. $18 per electronic copy of the
>C++ language standard--price from www.nssn.org--doesn't sound like the
>price of a closed standard to me. Besides, that explains why AT&T
>isn't the only company who can legally sell a C or C++ compiler.)

Also this is an off-topic, I have to inform you that you are wrong:
Any reason why a standrad document has to be expensive? C++ is a standard, are you
sure you refer to the ISO 14882 standard document?  I am very convinced it is closed,
and the compiler companies (expect MS of course) work ard to implement this standard.
And about the AT&T issue, I quote Stroustrup, The C++ Programmin Language (3rd Ed.),
page 11: "AT&T Bell Laboratories made amajor contribution ... by allowing ... to
share drafts ... and the base document for the ANSI C++ standardization efford. ...
In June 1991 ... C++ became part of an ISO (international) standardization efford ...
A formally approved international C++ standard is expected in 1998" (And it *was*
accepted in 1998!!!!!)

>Haven't you heard of templates, exception handling, and RTTI (Run Time
>Type Identification)? All are *very* recent additions to the C++
>language standard.

No, they are not all recent addtions to C++, you will find the roots of RTTI and
templates in the now *ten* year old Annotated C++ Refernce Manual, and all features
are (nearly) complete since 1995, and three years are a long time in computer
history!   To your "defense" I have to add that most compiler developers are very
slow implementing all these features.

>Granted, the initial conversion of PARSE.C into a Bison grammar file
>will be considerable work. However, it only has to be done once, and
>is more easily modified to include additional language elements than
>the current macro-based system.

Yes, macros are no recommended programming practice today, but the modern
replacements in form of templates are far superior and currently there is no Bison
implementation that can use the C++ standard library (which includes the STL
(Standard Templaet Library)) features. And the STL offers a lot of abstraction, e.g.
the hash table templates or, of course, the very useful containers, no macro can
offer that.
Bison does not fit to well into this and I cannot find a switch for the Bison I have
used which allows C++ code output, but of course this can change in the future! And
until then the C code is still compatible with C++.

>I fail to understand how the work involved in modifying a Bison
>grammar description file is greater than that involved in modifying
>PARSE.C.

You need to parse the grammar and only then you can parse the C code, and most modern
IDEs need plug-in Bisons to to this automatically - there is no old fashioned (but
very flexible) makefile!

>Besides, your position is like arguing that AT&T should never have
>abandoned Lparse in favor of a direct C++ to assembler implementation
>because Lparse was perfectly usable, even though a direct
>implementation had significant advantages. (Yes, C++ was originally
>implemented with a C preprocessor.)

I know, but I don't see why my argument is comparable to the C++ by preprocessor vs.
direct C++ issue which has been resolved over ten years ago as well, as far as I
know. 


Conclusion: I think Bison is a very interesting, well implemented and useful tool,
but my position is that it does not fit the POV-Ray needs as well as a handwritten
parser implementation could (not limited to the current one). Your position is
different and thats OK for me, so if you find the time to write a POV-Ray grammar and
implement a sample parser (you don't need to support all objects) and can show me
that
I am wrong, then this would be great!


     Thorsten

PS: Some of my points are Macintosh related simply because I program on the Mac most
of the time and it is the system I know best.


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.