|
|
|
|
|
|
| |
| |
|
|
From: Thorsten Froehlich
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 14 Feb 2007 12:18:57
Message: <45d34481@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
> Switch-case blocks are not that much faster compared to dynamic binding.
They are not by code, but by design. As I outlined, you get an impenetrable
mess of dependent methods. That is slow. How slow? Well, it depends on the
object's details. Expect up to twice the time required for processing. Does
that translate to 20% or 50% or more slowdown? Implementing would tell, but
no need to implement something already known to be slower. As I said, I have
done it before. How fast was it to begin with - the VRML 97 I worked on was
faster than the equivalent generated by a standard C parser generator. For
some unfortunate VRML 97 object types abstraction on a method level would
cause needlessly complicated detached processing that ate most of the
parser's performance compared to the parser generator version.
> A well-implemented parser *abstracts* away these implementation details.
That is pure theory and completely inappropriate for maintainable code.
> I see no reason why adding a new token could not be done by simply
> adding the name of a new class in one .cpp file (besides implementing
> that new class, of course).
Well, from experience I know there are good reasons to make certain choices
for speed and maintainability. So I guess in the end it just comes down to
believing in my experience in building very fast and maintainable parsers,
including all the lessons learnt from problems caused by overdoing the
design abstraction. The best theory simply isn't always the best (in all
dimensions) implementation.
Thorsten
Post a reply to this message
|
|
| |
| |
|
|
From: Warp
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 14 Feb 2007 17:11:22
Message: <45d38909@news.povray.org>
|
|
|
| |
| |
|
|
Thorsten Froehlich <tho### [at] trfde> wrote:
> > A well-implemented parser *abstracts* away these implementation details.
> That is pure theory and completely inappropriate for maintainable code.
Are you basically saying that abstract code is unmaintainable?
Are you next going to say that modules make code hard to reuse? ;)
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
From: Thorsten Froehlich
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 14 Feb 2007 17:55:30
Message: <45d39362@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
> Thorsten Froehlich <tho### [at] trfde> wrote:
>>> A well-implemented parser *abstracts* away these implementation details.
>
>> That is pure theory and completely inappropriate for maintainable code.
>
> Are you basically saying that abstract code is unmaintainable?
In the context of a parser, yes. It is also why parser generators are
getting out of fashion for complex languages if said code has to be
maintained by large(r) groups of developers.
> Are you next going to say that modules make code hard to reuse? ;)
Hmm ;-)
Thorsten
Post a reply to this message
|
|
| |
| |
|
|
From: Warp
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 15 Feb 2007 05:55:45
Message: <45d43c30@news.povray.org>
|
|
|
| |
| |
|
|
Thorsten Froehlich <tho### [at] trfde> wrote:
> >>> A well-implemented parser *abstracts* away these implementation details.
> >
> >> That is pure theory and completely inappropriate for maintainable code.
> >
> > Are you basically saying that abstract code is unmaintainable?
> In the context of a parser, yes. It is also why parser generators are
> getting out of fashion for complex languages if said code has to be
> maintained by large(r) groups of developers.
I don't really understand why.
A typical program which has a complex input file format consists
basically of two separate parts: The parser, which reads the input
file and interprets what is in there and creates appropriate data
from it, and the rest of the program which handles that data.
There's a clear wall between these two things: The rest of the
program is only interested in the data. It's not interested how the
data was created. Usually (and in fact optimally) it's not even interested
in the fact that the data came from a file. The data generator is
*abstract* to the rest of the program. It's just a black box which
spits data out of it. It doesn't matter where that data is coming from.
(This kind of abstraction allows for this "black box" to generate data
from alternative sources which may be completely different from an
input file, or may be generated by different input file formats.)
Usually, although perhaps not as importantly, the same applies to
the other direction: The parser just generates the data and is not
interested in what that data is used for (even though there may be
rational exceptions to this).
As I said above, this kind of abstraction makes it easier to enhance
the parser. For example, support for a new type of input file format
could be added completely transparently. The rest of the code doesn't
even need to know this.
However, if the interface between the parser and the rest of the
program "leaks" implementation details, it immediately creates a
dependency on a specific implementation, making it a lot more difficult
to actually change that implementation to something else. For example
adding support for a new file format may be impossible if the rest of
the program makes assumptions about the input being in the current
format.
The parser itself may be divided into separate parts too: The parser's
main body may contain only a generic parser for files in a certain format
(for a quick&dirty example, let's say that the input file is in XML format):
It just parses any file in that structural format but doesn't specify (and
thus hardcode) what information that structure might contain.
If this is well-made, it allows for modules to be "plugged in" the
parser. These modules may eg. handle blocks of certain type (for example
XML blocks named in a certain way). The main parser doesn't know what
that block is for, but it knows that if a block with that name appears,
it should be given to the module which tells it that it can handle it.
Adding support for new blocks is easy: Just create a new module and
"plug it in" the parser. In practice this "plugging in" may be as simple
as adding the name of the module to some array or such.
This makes the parser flexible. Instead of having hard-coded values,
new blocks can be added "dynamically" (even if it happens at compile time).
The price to pay for this is some additional virtual function calls.
Given that a virtual function call is a double indirect function call
(ie. take a pointer, take another pointer at an offset from that, and
then call the function pointed by that) it's not all that slower than
a gigantic switch-case block (which might actually internally have a
very similar implementation).
You say that this kind of system would require a large amount of
chained virtual function calls. I don't see why this would be so.
I can't think of any reason why this would be so.
The parser can do some *preprocessing* using data specified by the
modules. For example, if each module says what is the name of the tag
it supports, the parser can build a state machine from all the names,
with a pointer to each corresponding module in the leaf nodes. This
shouldn't be any slower than having this kind of state machine hard-coded
in the parser.
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
From: Thorsten Froehlich
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 15 Feb 2007 07:32:37
Message: <45d452e5@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
>> In the context of a parser, yes. It is also why parser generators are
>> getting out of fashion for complex languages if said code has to be
>> maintained by large(r) groups of developers.
>
> I don't really understand why.
I did not expect parser implementations to be so annoyingly resisting an
abstract implementation either, but I had to learn that they are. I agree it
is surprising.
> There's a clear wall between these two things: The rest of the
> program is only interested in the data. It's not interested how the
> data was created. Usually (and in fact optimally) it's not even interested
> in the fact that the data came from a file. The data generator is
> *abstract* to the rest of the program. It's just a black box which
> spits data out of it. It doesn't matter where that data is coming from.
And this is exactly what will doom any implementation. If you follow through
with this for anything but a trivial language with trivial (read: no) rules
applied to statements, you set yourself up for getting unmaintainable code
that will also be much slower assuming you at least want to still have a
chance of comprehending it as author.
> The price to pay for this is some additional virtual function calls.
It is not about speed, it is about a maintainable implementation. The cost
of abstraction is much higher as I previously outlined, as you get a maze of
method calls, which don't eat time for the call, but for the operations they
need to perform and the order in which they need to perform them. You just
add complexity at a point where you don't want it assuming you want the same
efficiency as a much simpler switch-based implementation.
Once you have sorted out how to implement rules, you end up having to
distribute the rule logic between several methods. Inevitably you have to
consider all possible conditions in every method, add checks, and even do
additional processing because you have no control over when you get to
process the data. In trivial cases this adds no overhead, but you will soon
find that many rules are not of the trivial kind. Instead, they depend on a
lot more context, which might even be a context only computed on the fly. So
you have to compute that every time.
For example, consider the POV-ray camera statement. There are distinct rules
on what effect every keyword has, and how these keywords interact.
Intermediate values, like distances need to be computed, and applied. If you
allow setting values via a strictly external interface, you soon face the
fact that you can never know if more data will be set. Yet, behavior of the
camera changes. So each time you set a value, you have to compute a valid
set of all camera parameters, even if the user will set them next. The set
method simply cannot know what will come next. However, the current parser
does know all this, and keeps states between applying rules.
Of course, for a camera there is little data and hence not much time is
lost. But just consider a mesh. Suddenly you are dealing with potentially
gigabytes of data. Any operation on it will be costly.
In short, you can get some abstraction, but there is a serious price to pay
for too much abstraction. in an academic product you may be willing to pay
it, together with the added cost of maintaining it by a tiny group of highly
qualified experts that know everything in detail (even though I have to
admit that even that is impossible for an abstract implementation, even for
the author himself/herself if there is just one). That is a price you do not
want to pay for code developed by more than one or two people. It will just
be a waste of time, for designers, programmers, debuggers, and users.
Thorsten
Post a reply to this message
|
|
| |
| |
|
|
From: Thorsten Froehlich
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 15 Feb 2007 07:42:44
Message: <45d45544$1@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
> As I said above, this kind of abstraction makes it easier to enhance
> the parser. For example, support for a new type of input file format
> could be added completely transparently. The rest of the code doesn't
> even need to know this.
Abstraction on the syntactic analysis level says nothing about abstraction
on the lexical analysis level. As those two are distinct, passes, and hence
design choices of the parser do not influence the design of the lexical
analysis. I thought it was clear that when we talk about "parser" here, we
talk about it in the was it is used in POV-Ray, which means syntactic and
semantic analysis.
> Adding support for new blocks is easy: Just create a new module and
> "plug it in" the parser.
This already works with the current parser. Only token names are kept in a
central file, because they are needed external to the parser, i.e. to
support syntax coloring. This could be handled differently though, so it
does not influence the level of "abstraction".
> You say that this kind of system would require a large amount of
> chained virtual function calls. I don't see why this would be so.
> I can't think of any reason why this would be so.
I never said this at all. I said you need to divide each rule into methods,
which in and by itself is the problem. The overhead when they called is not
a problem I am talking about.
Thorsten
Post a reply to this message
|
|
| |
| |
|
|
From: Warp
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 15 Feb 2007 09:10:48
Message: <45d469e8@news.povray.org>
|
|
|
| |
| |
|
|
Thorsten Froehlich <tho### [at] trfde> wrote:
> > There's a clear wall between these two things: The rest of the
> > program is only interested in the data. It's not interested how the
> > data was created. Usually (and in fact optimally) it's not even interested
> > in the fact that the data came from a file. The data generator is
> > *abstract* to the rest of the program. It's just a black box which
> > spits data out of it. It doesn't matter where that data is coming from.
> And this is exactly what will doom any implementation. If you follow through
> with this for anything but a trivial language with trivial (read: no) rules
> applied to statements, you set yourself up for getting unmaintainable code
> that will also be much slower assuming you at least want to still have a
> chance of comprehending it as author.
If you do otherwise you doom yourself to hardcode the input format in
the program, making it painfully difficult to change the format, add support
for alternative formats and create alternative ways of generating the data.
You still insist that abstracting the format from the data makes the
program incomprehensible, yet you don't give any reasoning for this.
I have worked in a medium-size project which used a relatively complex
file format, and let me assure you that it was a blessing that I didn't
have to even know what the format looked like in order to implement the
programs in that project. (Of course I knew the format because I worked
so closely with it, and I had to create test input files and such, but
from the programming point of view I didn't have to care because the
parser was not my job.)
Abstracting the input file parser, and additionally the command-line
parser, gave many benefits. One good example was when the project
managers wanted a "history" section in all the result files: This history
section contained information about every single step (in this case
every single program and its parameters) that data had been run through,
so it would be possible to repeat the whole process again if necessary.
Thanks to the abstraction level of both parsers we could implement this
history section completely transparently by only modifying the command-line
parser and the file format parser. Not one single line of the program
themselves had to be modified, yet we got a perfectly-working history
section in all result files. After the modification and recompiling of
these parser libraries it was enough to relink all the programs (over
a dozen of them) and by magic every single one of them started supporting
automatic reading and writing of this new history section (adding the
program's own data to the output).
This was, in my opinion, rather illustrative. All programs started
supporting and creating a new block of data in the file format without
even one single modification in their source code. Only by modifying
two parsers.
Another example: At some point another idea came up: It would be nice
if part of the data could be in a separate file and a data block could
simply say "read the data from this file". Also it should be possible
to tell any program to write the data of any given block to a given
file instead of the regular output file.
Again: No modifications whatsoever were necessary in any of the
programs in order to achieve this. Again, just adding this feature
to the two parsers was enough. Now, completely transparently, all the
programs suddenly started supporting reading and writing of these
separate "include" files.
The teamwork between the two parsers was just awesome. Most things
could be done between them behind the scenes, transparently, without the
programs themselves even knowing, and it worked like a charm.
On the other hand, each time we wanted to add a new feature common
to all the programs such as these, if we would had to modify all the
over a dozen programs, I wouldn't call that "maintainable".
> Of course, for a camera there is little data and hence not much time is
> lost. But just consider a mesh. Suddenly you are dealing with potentially
> gigabytes of data. Any operation on it will be costly.
We had hundreds of megabytes of data to read/write at times in our
project. Didn't seem to cause any problem.
(The reading and writing was pretty fast, in fact. It was possible to
hacker-optimize it thanks to (or regardless of) the abstraction layer.)
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
From: Thorsten Froehlich
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 15 Feb 2007 10:04:45
Message: <45d4768d@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
> If you do otherwise you doom yourself to hardcode the input format in
> the program, making it painfully difficult to change the format, add support
> for alternative formats and create alternative ways of generating the data.
There is no sense in changing the format for POV-Ray. As I pointed out, it
is more than static data. I am not denying that with completely static data,
i.e. reading in a database format, you have little loss with abstraction.
However, as you are well aware, POV-Ray is way more than a static format. It
is much closer to a programming language, and it is a well demonstrated fact
that neither parser generators nor excessive abstraction are a gain for a
programming language. To the contrary, they seriously hurt maintainability.
> Abstracting the input file parser, and additionally the command-line
> parser, gave many benefits.
I am not denying that there are potential benefits at first sight, what I am
saying is they are completely outweight by the drawbacks.
> yet you don't give any reasoning for this.
As I said, I have substantial experience with exactly what we are talking
about, parsers for SDLs. Naturally I cannot disclose what I did at my
employer, hence, as I also said before, you either take my word for it or
you don't. -- But please consider this: Given that POV-Ray 3.7 scales
extremely well (according to Intel's measurements) thanks to the work of
Chris Cason and me (I did the design, he profiled and tweaked it), I think I
have successfully demonstrated I know really well how to deliver
high-performance code. So in the context of POV-Ray, I would say just wait
and see.
Thorsten
Post a reply to this message
|
|
| |
| |
|
|
From: Warp
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 15 Feb 2007 11:44:09
Message: <45d48dd9@news.povray.org>
|
|
|
| |
| |
|
|
Thorsten Froehlich <tho### [at] trfde> wrote:
> > Abstracting the input file parser, and additionally the command-line
> > parser, gave many benefits.
> I am not denying that there are potential benefits at first sight, what I am
> saying is they are completely outweight by the drawbacks.
That contradicts completely my experience. Our parser system was
flexible, enhanceable and fast.
A more unabstract hardcoded solution would have been rigid and very
laborious to enhance, most probably giving no measurable speed benefits.
> Chris Cason and me (I did the design, he profiled and tweaked it), I think I
> have successfully demonstrated I know really well how to deliver
> high-performance code.
You said elsewhere that the point was not the performance but the
manageability.
"I used this solution and it proved to be very fast" is not a proof
of anything relevant. It only proves that that specific solution is fast.
It doesn't prove that other more flexible solutions are not equally fast.
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
From: Darren New
Subject: Re: Announcement: Moray acquired by POV-Ray; to be released as OpenSource
Date: 15 Feb 2007 11:46:56
Message: <45d48e80$1@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
> If you do otherwise you doom yourself to hardcode the input format in
> the program, making it painfully difficult to change the format, add support
> for alternative formats and create alternative ways of generating the data.
If the data structures resulting from the parse are very similar to the
input language, and you're parsing from a "meaningless" stream of bytes
into essentially a native representation of the parse tree, this can work.
If you're doing something like generating code as you parse, the
intermediate level is not as useful. I've worked with OO parsers and
structured editors that generate data structures for interpreters. When
you have 300 mutually-recursive methods each maybe one to three lines
long, it gets very difficult to figure out what's going on when there's
a problem.
Plus, the more generic the parser, the more difficult it is to generate
clear error messages. For example, if your "union" refers to an object
that isn't allowed to be inside a union, generating the appropriate
error message can get very messy. You *can* make it work, mind, but it
can be harder than if you're using a less-data-driven type of parser.
> On the other hand, each time we wanted to add a new feature common
> to all the programs such as these, if we would had to modify all the
> over a dozen programs, I wouldn't call that "maintainable".
Questions like this always depend on which direction you see a program
growing over time. It's what makes software architecturing fun.
> Adding support for new blocks is easy: Just create a new module and
> "plug it in" the parser. In practice this "plugging in" may be as simple
> as adding the name of the module to some array or such.
Or not even that, if you're using a language with reflection. ;-) It's
exactly the kind of extensibility that Warp is talking about that
frustrates me so much trying to do big programs in C or C++, with no
programmatic access to the code I wrote myself. Why should I have to
tell the compiler *and* the application what the name of the class I
just added is? :-)
--
Darren New / San Diego, CA, USA (PST)
New horror show, now only in Las Vegas:
HANNIBAL LUXOR!
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
|
|