How come POV is so slow when it comes to parsing a complex script?
Specifically, loops seem to take a very long time to run (compared to
say C++ / BASIC etc.), even if the maths inside is simple.  Is there
some fundamental reason for this?

Andrew <ast### [at] hotmail com> wrote:
: How come POV is so slow when it comes to parsing a complex script?
: Specifically, loops seem to take a very long time to run (compared to
: say C++ / BASIC etc.), even if the maths inside is simple.  Is there
: some fundamental reason for this?
  There are many reasons for this.
  Firstly, the main reason is that the POV-Ray scripting language is an
_interpreted_ language. This makes comparison with _compiled_ languages
(eg. C++) very unfair.
  A compiled language is parsed and then compiled to the native machine code
of the system. The result of this is a stand-alone program which doesn't need
the compiler anymore in order to make it run. The program is, in practice,
converted to machine code which the CPU understands by itself. This, of course,
makes it very fast. The biggest disadvantage of this is that the code will
run on that processor alone. It will not run on any other processor (unless
the other processor can emulate it). (In practice these programs won't run
in another OS even if the computer is identical, but the reasons for this
are a bit different.)
  An interpreted language is "executed" while it is parsed. For this you
always need the interpreter in order to run the program. The interpreter
parses the program and executes the commands at the same time. It should be
needless to say that this is a LOT slower than having a pre-compiled machine
code program. The biggest advantage and the main reason why this is done at
all is that the code becomes system-independent (as long as there's an
interpreter for that system).
  There is also a third type of execution, which is a mix between the two:
The code is first parsed and compiled to a "machine code" or "bytecode"
(which isn't necessarily any machine code known by any processor) and then
this "machine code" is interpreted by an interpreter.
  I know of two languages which use this approach: Perl and Java. (Even they
do it a bit differently: Perl compiles the program on the fly to memory and
then interprets the compiled code, while Java compiles the program to a
separate file and then this file can be interpreted as if it was a compiled
program.)
  This approach takes advantage of the two types of execution: The program
is still system-independent and its execution is pretty fast. (It's not as
fast as native machine code, of course, but much faster than a regular
interpreted language.) It still has the disadvantage of needing the interpreter
in order to run the program.
  POV-Ray is an interpreter in the most basic definition. It just parses the
code and "executes" at the same time.
  A negative thing about this is that POV-Ray is an extremely slow interpreter.
Most interpreters are optimized to be fast (most BASIC interpreters, for
instance) even though they don't byte-compile the code but parse it directly.
Usually they load the program to memory and do all kinds of tricks to speed up
the execution.
  However, POV-Ray does it the really slow way. POV-Ray interprets the source
by reading the source file directly. Even #while-loops are done this way: A
loop will simply seek the file to the proper location. File I/O, even when
cached by the OS, is much slower than reading the source directly from memory.
  With macros residing in files other than the one they are called from, this
gets even worse: Each time the macro is called, the file is opened, positioned at
the right location, then the macro is read and interpreted from the file, and
then the file is closed. This is very slow!
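
To make the mechanism concrete, here is a minimal Python sketch of an interpreter that works the way described above: it reads statements straight from a stream and implements a loop by seeking back to the loop's first line. This is not POV-Ray's actual parser code. The four-statement mini-language (`set`, `add`, `while`, `end`) and the function names are made up for illustration, and `io.StringIO` stands in for the open scene file.

```python
import io


def skip_to_matching_end(stream):
    """Read forward past the 'end' that closes the current 'while'."""
    depth = 1
    while depth:
        line = stream.readline()
        if not line:
            raise SyntaxError("'while' without matching 'end'")
        words = line.split()
        if words and words[0] == "while":
            depth += 1
        elif words and words[0] == "end":
            depth -= 1


def run_seeking(source):
    """Interpret a toy script directly from a stream (hypothetical language).

    Statements, one per line:
      set NAME VALUE | add NAME VALUE | while NAME LIMIT | end
    'while NAME LIMIT' repeats its body as long as NAME < LIMIT.
    """
    stream = io.StringIO(source)  # stands in for the scene file on disk
    variables = {}
    loop_starts = []              # stream offsets of the active 'while' lines
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        words = line.split()      # the text is split again on every pass
        if not words:
            continue
        op = words[0]
        if op == "set":
            variables[words[1]] = int(words[2])
        elif op == "add":
            variables[words[1]] += int(words[2])
        elif op == "while":
            if variables[words[1]] < int(words[2]):
                loop_starts.append(offset)
            else:
                skip_to_matching_end(stream)
        elif op == "end":
            if not loop_starts:
                raise SyntaxError("'end' without 'while'")
            stream.seek(loop_starts.pop())  # jump back, re-read the 'while'
        else:
            raise SyntaxError("unknown statement: " + op)
    return variables


script = "set i 0\nset total 0\nwhile i 5\nadd total 10\nadd i 1\nend\n"
result = run_seeking(script)
```

Every pass through the loop re-reads and re-splits the same lines of text, which is the kind of repeated work the post is pointing at.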
  POV-Ray was never intended to be a scripting language interpreter. It
didn't even have loops before version 3! (Please correct me here if I remember
wrong.) The original purpose of the pov-file was to contain the object
descriptions; POV-Ray would just read the file once and create the objects
and that's it. It wasn't designed to be a scripting language containing
loops and macros.
  Now it is such a thing, however. Optimizing the parser for speed isn't,
however, a light task. There have been more important things to develop
instead of this "secondary" feature.
  But perhaps in the future... (That is, POV-Ray 4).
  The best way to go, in my opinion, is to do the same thing as Perl:
First parse the file and compile it to some type of bytecode to memory
(the format of the bytecode is completely free; it should be something that
best suits the needs of POV-Ray) and then this bytecode is interpreted with
a highly optimized interpreter (which isn't a hard thing to do, in fact).
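
As a rough illustration of the compile-then-interpret idea, the Python sketch below translates a toy script into a list of instructions once and then runs that list. It is only a sketch under stated assumptions: the mini-language (`set`, `add`, `while`, `end`), the tuple-based instruction format and the function names are all hypothetical, and nothing here reflects how POV-Ray or Perl actually represent their code.

```python
def compile_script(source):
    """Translate the toy script into a list of instruction tuples.

    Hypothetical instruction formats:
      ("set", name, value)   ("add", name, value)
      ("while", name, limit, exit_index)   ("jump", target_index)
    """
    code = []
    open_loops = []  # indices of 'while' instructions still waiting for 'end'
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue
        op = words[0]
        if op in ("set", "add"):
            code.append((op, words[1], int(words[2])))
        elif op == "while":
            open_loops.append(len(code))
            code.append(("while", words[1], int(words[2]), None))
        elif op == "end":
            if not open_loops:
                raise SyntaxError("'end' without 'while'")
            start = open_loops.pop()
            code.append(("jump", start))
            _, name, limit, _ = code[start]
            # Now the exit target is known: the slot right after the jump.
            code[start] = ("while", name, limit, len(code))
        else:
            raise SyntaxError("unknown statement: " + op)
    if open_loops:
        raise SyntaxError("'while' without matching 'end'")
    return code


def run_bytecode(code):
    """Execute the instruction list; no text is parsed at this stage."""
    variables = {}
    pc = 0
    while pc < len(code):
        instr = code[pc]
        op = instr[0]
        if op == "set":
            variables[instr[1]] = instr[2]
            pc += 1
        elif op == "add":
            variables[instr[1]] += instr[2]
            pc += 1
        elif op == "while":
            pc = pc + 1 if variables[instr[1]] < instr[2] else instr[3]
        else:  # "jump"
            pc = instr[1]
    return variables


script = "set i 0\nset total 0\nwhile i 5\nadd total 10\nadd i 1\nend\n"
code = compile_script(script)
result = run_bytecode(code)
```

The text is split into words exactly once, in `compile_script`. The loop in `run_bytecode` only follows precomputed indices, so a loop body that runs many times no longer costs any parsing.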
  This cuts both ways, however: Sources with no loops and macros will
parse slower than before! (Although not too much slower, I would say.) The
speedup only shows once the source has long loops and lots of macro
calls.
  With a source which makes huge amounts of loops and macro calls (which isn't
very unusual nowadays), I would say that the parsing speedup would be on the
order of 10 to 100 times. Even if the parsing of "straightforward"
files (ie. no loops nor macros) gets a bit slower, I think it's a very
reasonable price to pay.
  Of course one could go further and allow POV-Ray to dump the byte-coded
source to a file. Afterwards one could render just this file. POV-Ray would
just read it into memory and start interpreting right away, without any need
for parsing. This would speed up *everything* a lot.
  This has a drawback, though: If POV-Ray does the byte-coding only internally,
it can trust that it's always syntactically correct. That is, it doesn't have
to make sanity checks to the byte-code, which allows making a faster
interpreter for it. If, however, a user-made bytecode is given to POV-Ray,
it would have to make a sanity-check to it when it interprets it, which may
cause some overhead.
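
What such a sanity check might look like can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the tuple-based instruction format (`set`, `add`, `while`, `jump`) and the function name are hypothetical and are not taken from POV-Ray or any real bytecode format.

```python
def check_bytecode(code):
    """Return a list of problems; an empty list means the code looks well-formed.

    Hypothetical instruction formats:
      ("set", name, value)   ("add", name, value)
      ("while", name, limit, exit_index)   ("jump", target_index)
    """
    problems = []
    for pc, instr in enumerate(code):
        if not isinstance(instr, tuple) or not instr:
            problems.append("%d: not an instruction" % pc)
            continue
        op = instr[0]
        if op in ("set", "add"):
            ok = (len(instr) == 3
                  and isinstance(instr[1], str)
                  and isinstance(instr[2], int))
        elif op == "while":
            # The exit index may point one past the end (loop at end of file).
            ok = (len(instr) == 4
                  and isinstance(instr[1], str)
                  and isinstance(instr[2], int)
                  and isinstance(instr[3], int)
                  and 0 <= instr[3] <= len(code))
        elif op == "jump":
            ok = (len(instr) == 2
                  and isinstance(instr[1], int)
                  and 0 <= instr[1] < len(code))
        else:
            ok = False
        if not ok:
            problems.append("%d: malformed %r instruction" % (pc, op))
    return problems


good = [("set", "i", 0), ("while", "i", 3, 4), ("add", "i", 1), ("jump", 1)]
bad = [("jump", 99), ("frobnicate",)]
good_report = check_bytecode(good)
bad_report = check_bytecode(bad)
```

In this sketch the check is a single pass over the instructions before execution starts, so its cost grows with the size of the file rather than with the number of loop iterations. A real format would probably need more checks than these.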
-- 
#macro N(D,I)#if(I<6)cylinder{M()#local D[I]=div(D[I],104);M().5,2pigment{
rgb M()}}N(D,(D[I]>99?I:I+1))#end#end#macro M()<mod(D[I],13)-6,mod(div(D[I
],13),8)-3,10>#end blob{N(array[6]{11117333955,
7382340,3358,3900569407,970,4254934330},0)}//                     - Warp -

Thanks for that very educational read, Warp. :)
Hey, we should start making a short list of things that simply must be in
4.0, like this improved parser you're talking about. That way the Team won't
forget about these things when making 4.0.

Wow!  That was more of an answer than I was hoping for!  I had feared
that POV's scripting had simply never been designed for speed above all
other considerations.  I guess it's time I learn C++ properly and
generate some of my scenes that way...
I think I would have to agree - compiling to memory and then
interpreting the code would be a *very* welcome addition for v4.  Any
time lost in interpreting "normal" scenes would probably be negligible
on the vast majority of computers running POV by that time (I'm guessing
POV 4 is at least 2 years off?), and the benefits in terms of scripting
capability would be enormous.
As regards sanity-checking of a compiled POV file, a flag at the start
of the file identifying it as a user-generated file could be used to
perform sanity-checking only when necessary.  One would then have to
trust people to use this flag properly, though...

Theoretically then, a performance increase could be gained by unrolling
loops of fixed length?  For example, if this piece of code...
#declare loop=0;
#while (loop<10)
    #declare my_array [loop] = loop;
    #declare loop=loop+1;
#end
...appeared in the middle of a macro called hundreds and hundreds of
times, would the scene parse faster if the following were substituted?
#declare my_array [0] = 0;
#declare my_array [1] = 1;
#declare my_array [2] = 2;
#declare my_array [3] = 3;
#declare my_array [4] = 4;
#declare my_array [5] = 5;
#declare my_array [6] = 6;
#declare my_array [7] = 7;
#declare my_array [8] = 8;
#declare my_array [9] = 9;
Presumably it would as there would be fewer I/O calls to seek back to
the beginning of the loop each time.  Or am I simply wrong :-) ?
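
For anyone who wants to try this without typing the lines by hand, the unrolled form can be generated by a small script. The Python sketch below only produces the text of the ten `#declare` lines shown above. The function name `unroll` and its parameters are made up for illustration, and it says nothing about whether the unrolled version actually parses faster.

```python
def unroll(count, array_name="my_array"):
    """Return POV-Ray #declare lines equivalent to the fixed-length loop."""
    lines = []
    for k in range(count):
        lines.append("#declare %s [%d] = %d;" % (array_name, k, k))
    return "\n".join(lines)


unrolled = unroll(10)
print(unrolled)
```

The output can be pasted into the macro in place of the `#while` loop, or written to an include file.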

In article <3be69fc4@news.povray.org>, Warp <war### [at] tag povray  org> wrote:
>   A negative thing about this is that POV-Ray is an extremely slow
> interpreter.
Actually, there is nothing fundamentally slow about the interpreter itself,
just the design depends on disk reading.  In fact the parser is spending
most of the time doing some kind of disk reading, and the speed increases
significantly if disk reading is faster.  If you copy your files to a RAM
disk* you will see a difference!
    Thorsten
* One that actually fits into the RAM, not one in virtual memory!

in news:3be6b7fd$1@news.povray.org Andrew wrote:
> would the scene parse faster if the following were substituted?
> 
> #declare my_array [0] = 0;
Had to try
100000 x #declare I=I+1; in a script versus a loop:
       kernel user   total
loop  : 2.98  10.20  13.19
script: 0.90   4.19   5.09 sec.
Ingo
-- 
Photography: http://members.home.nl/ingoogni/
Pov-Ray    : http://members.home.nl/seed7/

How can I make a RAM disk to try that out? :)

So this isn't the same as letting Windows or whatever cache the file and
serve it from memory anyway?
> Actually, there is nothing fundamentally slow about the interpreter itself,
> just the design depends on disk reading.  In fact the parser is spending
> most of the time doing some kind of disk reading, and the speed increases
> significantly if disk reading is faster.  If you copy your files to a RAM
> disk* you will see a difference!
>
>     Thorsten
>
> * One that actually fits into the RAM, not one in virtual memory!

ingo <ing### [at] home nl> wrote:
: 100000 x #declare I=I+1; in a script versus a loop:
:        kernel user   total
: loop  : 2.98  10.20  13.19
: script: 0.90   4.19   5.09 sec.
  Loop-unrolling is indeed one of the oldest optimization tricks. The basic
idea is that if you unroll the loop, the interpreter doesn't have to parse
the #while and #end in each loop, but it just has to interpret the commands
inside the loop n times.
  The speedup seen above is most probably caused by the fact that POV-Ray
doesn't need to read and interpret the "#while" and "#end" strings anymore to
do the same thing. Also not having to seek the file might speed things up a bit, but
I don't think that's the main bottleneck here.
  (Note that when we are optimizing assembly code for a current processor,
loop-unrolling may not be a good optimization anymore; it may even be that
it slows down the program. Nowadays processors have extremely advanced
loop optimizations in themselves and usually we don't need to "guide" the
processor in doing it. On the other hand, loop unrolling means more code,
which means that the code cache will fill up faster, which means slower code.
Of course this is not the case in POV-Ray, which is, may I say, an extremely
primitive interpreter.)
-- 
#macro N(D,I)#if(I<6)cylinder{M()#local D[I]=div(D[I],104);M().5,2pigment{
rgb M()}}N(D,(D[I]>99?I:I+1))#end#end#macro M()<mod(D[I],13)-6,mod(div(D[I
],13),8)-3,10>#end blob{N(array[6]{11117333955,
7382340,3358,3900569407,970,4254934330},0)}//                     - Warp -