On 11 Apr 1998 12:18:55 GMT, m### [at] mcom (Fredrik) wrote:
>There is an PPro version, but where ?
>If anyone could put it up on their page/ftp account, and tell us, I
>and probably some others would be happier.
>Thanx in advance.
>
>/Fredrik
>
>--PII-300,64MB
I was watching this thread, unsure whether there is a P(OV)Pro
version other than the one I have. It now seems clear that he meant
a POV version optimized for the Pentium Pro, since another P2 user
would hardly know about it otherwise. Whether or not such a version
was ever released, here is the readme file that comes with
the POVPro I have:
-------------------
POV-Ray(tm) 3.02.proton (POVPro)
Win32 Console (x86) Unofficial Optimized Version of the
Persistence of Vision(tm) Ray Tracer
1. Introduction.
2. Speed improvements: POVPro as a performance tool.
3. Additional command line switches.
4. Histograms accurate to 1 CPU cycle.
5. Win32 goodies.
6. Floating point precision and image accuracy.
6.1. Rounding mode.
6.2. Reciprocals.
6.3. Compensation.
6.4. Result.
7. Conclusion.
1. Introduction
---------------------------------------------------------------
POVPro is a Win32 console port of the generic UNIX version of POV-Ray --
it's not a "stripped" Windows version.
The name POVPro is given to this version to distinguish it from other
existing versions, like POVWin. It has nothing to do with
"professional" or "Pentium Pro". It simply reflects the fact that it
is made with the Intel Reference C Compiler, a.k.a. Proton. Another
Intel tool used is the VTune 2.1 profiler.
POVPro requires Windows NT or Windows 95 -- it will *not* run on
Windows 3.x plus Win32s since it's a console application. POVPro does
*not* require a Pentium Pro or even a Pentium to run; it will run on an
80486, but a Pentium is strongly recommended. Additionally, you will be
unable to generate histograms on anything less than a Pentium.
The following is a brief list of improvements to the official
version of POV-Ray 3.02 ( see other sections for details ):
- 5% to 50% faster renderings,
- ability to examine internal representation of scenes,
- histograms accurate to 1 CPU cycle ( plus controllable accuracy ),
- long names support and other Win32 goodies.
POVPro is provided as a performance tool for experienced POV-Ray
users so that they can do their work faster. Its "examining internal
representation" feature may also be used by those willing to make
their own enhancements to POV-Ray. But in any case you must obtain the
full package elsewhere -- please see the file POVWHERE.GET for more info.
POVPro ( enhanced version of POV-Ray 3.02 ) was made by me, Vadim
Sytnikov, and is distributed under the POV-Ray 3 general license --
please see the file POVLEGAL.DOC for details.
Copyright (C) 1996-1997 POV-Team(tm).
POVPro is (C) 1997 Vadim V. Sytnikov.
2. Speed Improvements: POVPro As A Performance Tool
---------------------------------------------------------------
I started to work on POVPro in order to incorporate some animation
enhancements not found in POV-Ray. Ray-traced animation is extremely
time-consuming, so I:
- used the best optimizing compiler ( Intel's Proton ),
- used its most-aggressive optimization modes,
- re-wrote several functions to incorporate general-purpose
optimizations ( like the use of reciprocals -- see sect. 6 ),
- reorganized source code to get the most out of the compiler.
The results were so encouraging that I decided to make POVPro freely
available. Here is a brief illustration: after I ported the generic UNIX
version to Win32, I started to profile it with VTune, make
enhancements, profile again, etc., with the following results for an
*arbitrarily* chosen scene ( povray 3.02.wat.cwa is the MS-DOS version
made with Watcom C plus the CauseWay DOS Extender ):
- povray 3.02.wat.cwa ................................ 53 secs
- povpro [ just ported ] ............................. 42 secs
- povpro [ + experimental optimization options ] ..... 41 secs
- povpro [ + rounding control set to "truncate" ] .... 39 secs
- povpro [ + my source code modifications ] .......... 36 secs
- povpro [ + profile-guided optimization ] ........... 35 secs
As you can see, the net result for this scene is 53 / 35 = 1.514, i.e. a
51.4% speed increase. Though the results are not always that good, you
*will* see the difference.
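For reference, the percentages in this section appear to be computed as old time divided by new time, minus one. The helper below is only a sketch of that arithmetic; the function name is made up for illustration and is not part of POVPro.

```c
#include <assert.h>

/* Percentage speed increase in the sense used above:
   (old_time / new_time - 1) * 100. A negative result means the
   new build was slower. Hypothetical helper, not POVPro code. */
double speedup_percent(double old_secs, double new_secs)
{
    assert(old_secs > 0.0 && new_secs > 0.0);
    return (old_secs / new_secs - 1.0) * 100.0;
}
```

Calling speedup_percent(53.0, 35.0) gives roughly 51.4, matching the figure quoted for the profiled scene.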
In general, speed increase is about 10-20%. For example,
POV3DEMO\RADIOS\RAD2.POV (radiosity) rendering yields the following
results:
- povray 3.02.wat.cwa ............ 2215 secs
- povpro [ fully optimized ] ..... 1764 secs ==> 25.5% faster.
I tested hundreds of POV-Ray's sample scenes, with and without
antialiasing, with and without the use of bounding slabs and vista
buffer; 17 animations; several radiosity renderings. There was *ONLY*
one slower rendering, POVSCN\LEVEL3\SNACK.POV:
- povray 3.02.wat.cwa [ antialiasing OFF ] . 70 secs
- povpro [ antialiasing OFF ]............... 58 secs ==> 20% faster
- povray 3.02.wat.cwa [ antialiasing ON ] .. 262 secs
- povpro [ antialiasing ON ] ............... 276 secs ==> 5% slower
POVPro worked faster, as always :-), but with antialiasing on, it made
considerably more passes this time. See section 6 for "how and why".
3. Additional Command Line Switches
---------------------------------------------------------------
There are 3 new options; use:
- +H8 to view extra help page,
- +T to dump internal representation of objects to
<scene_name>.lst,
- +Ynn to set precision timer accuracy, 0 (most) to 32 (least).
The latter two have INI-file equivalents, Tree_Info={bool} and
Timer_Accuracy={num}, respectively. Also see section 4.
The +T option is primarily of interest for those of you tinkering
with POV-Ray source code. It might help you to learn a lot of
interesting things about POV-Ray, e.g. that internally:
- there is no such thing as "cylinder"; cylinders are represented as
cones with equal radii,
- there is no such thing as "difference"; differences are
intersections with inverted members ( all but the very first ),
- etc. etc.
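The "difference as intersection with inverted members" idea can be sketched in a few lines of C. Everything here (the Member struct, the sphere-only shapes, the function names) is hypothetical and chosen for illustration; it does not reproduce POV-Ray's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shape: a sphere plus an "inverted" flag, standing in
   for whatever POV-Ray really stores internally. */
typedef struct {
    double cx, cy, cz, radius;
    bool inverted;              /* true: the outside counts as inside */
} Member;

static bool member_contains(const Member *m, double x, double y, double z)
{
    double dx = x - m->cx, dy = y - m->cy, dz = z - m->cz;
    bool inside = dx * dx + dy * dy + dz * dz <= m->radius * m->radius;
    return m->inverted ? !inside : inside;
}

/* An intersection contains a point only if every member does. */
bool intersection_contains(const Member *members, size_t count,
                           double x, double y, double z)
{
    assert(members != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (!member_contains(&members[i], x, y, z))
            return false;
    return true;
}

/* Turn "A minus B minus C ..." into an intersection by inverting
   every member except the very first. */
void difference_to_intersection(Member *members, size_t count)
{
    for (size_t i = 1; i < count; i++)
        members[i].inverted = !members[i].inverted;
}
```

With a radius-2 sphere minus a radius-1 sphere at the same centre, a point at distance 1.5 is inside the result, while points at distance 0.5 or 3.0 are not.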
You will also see objects' internal flags, the bounding slabs hierarchy,
objects' pigments, etc. Note, however, that an object's texture is an
extremely complex thing, so I decided not to include in the dump its
fields other than the pigment ( which is quite a complex structure
itself, BTW ), just to maintain a better signal / noise ratio :-). See
SCENE.LST inside XSAMPLES.ZIP.
4. Histograms Accurate To 1 CPU Cycle
---------------------------------------------------------------
In short: I used the Pentium's RDTSC instruction to implement the
precision timer used while generating histograms. RDTSC reads an internal
64-bit counter which maintains the number of CPU clocks passed since you
switched your computer on. Besides its accuracy, this method has yet
another great feature: it takes virtually no time ( no overhead ).
The +Ynn option controls precision timer's accuracy, which is set to
1 << nn; default is 6 == accurate to 64 CPU cycles. More exactly, nn
controls not accuracy but rounding of each measurement ( sample ).
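A minimal sketch of what "rounding each sample to 1 << nn cycles" could look like. The readme does not say whether samples are rounded down, up or to nearest, so rounding down is an assumption here, and the function name is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round a cycle count down to a multiple of 1 << nn. Rounding DOWN
   is an assumption made for this sketch; the readme does not state
   the direction. Hypothetical helper, not POVPro code. */
uint64_t round_cycles(uint64_t cycles, unsigned nn)
{
    assert(nn <= 32);                 /* +Ynn accepts 0 to 32 */
    uint64_t granularity = (uint64_t)1 << nn;
    return cycles & ~(granularity - 1);
}
```

With the default nn of 6, a sample of 1000 cycles would be recorded as 960 (the nearest lower multiple of 64); with nn of 0 it stays 1000.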
Results are ... remarkable. On histograms generated with POVPro, you
can even see the areas where ray tracer spent extra time doing tests
before rejecting an object (!) -- just compare HIST_DOS.TGA ( made
with povray.wat.cwa ), HIST_Y12.TGA ( made with povpro +Y12 ), and
HIST_Y4 ( povpro +Y4 ) found inside XSAMPLES.ZIP.
5. Win32 Goodies
---------------------------------------------------------------
Long file name support, command line up to 1024 characters long
( artificial restriction ), etc.
Additionally, I replaced the 2 annoying speaker beeps at the end of
frame rendering with the Windows "ICONASTERISK" sound ( in Windows, it is
usually played when a message box with an exclamation mark appears ).
Enjoy your soundcard. Or turn it off. It was not that easy with the PC
speaker.
6. Floating Point Precision And Image Accuracy
---------------------------------------------------------------
If you render a scene with povray.wat, then with povpro, and then
compare the resulting .tga images with, say, FC/B, chances are that the
images will differ. The human eye is unable to catch the difference, but
a byte-by-byte comparison shows it. This section explains why that is so,
and why you can safely ignore it.
6.1. Rounding Mode
In section 2, I wrote that rendering time was reduced from 41 to 39
secs ( that is, by 5.1% ) by using the "set rounding control to
truncate" option. This was tricky, but acceptable.
The scene I used for profiling with VTune had a lot of complex
textures, so the profiler showed me quite a sharp peak at DNoise() --
about 34% of the whole rendering time. This was interesting, since
the function doesn't look too complicated, and, at first glance, the
number of its calls did not correspond to 34% of time. But closer
examination revealed the truth: a great deal of implicit double ==> int
conversions.
Such a conversion usually results in a function call. Intel's Proton
does a better job by inlining the corresponding code, but the problem
remains -- this code switches the CPU rounding mode. This hits Pentiums
very hard ( especially the P6 ). The thing is that the CPU's default fp
rounding mode is "round to nearest", but the C language requires floating
point values to be truncated when converting to integers. Proton
provides an option which sets the rounding mode to "truncate" for the
entire program, thus avoiding those time-consuming switches. I've used it.
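To make the two rounding behaviours concrete, here is a small C sketch contrasting the truncation that a C cast must perform with a round-to-nearest result. Both function names are invented for the example, and the round-to-nearest helper handles halfway cases differently from the FPU (away from zero instead of to even), which is noted in the comments.

```c
#include <assert.h>

/* A C cast from double to int discards the fractional part
   (truncation toward zero). Out-of-range values are undefined
   behaviour, hence the range check. */
int truncate_to_int(double x)
{
    assert(x > -2.0e9 && x < 2.0e9);
    return (int)x;
}

/* Round-to-nearest for comparison, built only from truncation.
   Halfway cases go away from zero here, which is NOT the FPU's
   round-half-to-even rule; good enough to show the difference. */
int nearest_to_int(double x)
{
    assert(x > -2.0e9 && x < 2.0e9);
    return x >= 0.0 ? (int)(x + 0.5) : (int)(x - 0.5);
}
```

For 2.7 the cast yields 2 while round-to-nearest yields 3, which is why code compiled for the default rounding mode has to switch modes around every such conversion.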
6.2. Reciprocals
Whenever Proton sees statements like
a /= d; b /= d; c /= d;
it replaces them with
t = 1.0 / d; a *= t; b *= t; c *= t;
which is roughly 3 times faster ( compared to FDIV, all other
instructions execute virtually for free; more than 30 other
instructions could have been executed instead of one FDIV ). The compiler
recognizes such a pattern when the statements are close to each other. I
pushed this a bit farther by making modifications to VECTOR.H and many
C modules. Needless to say, the results of "a /= d" and "t = 1.0 / d;
a *= t" will sometimes differ. BTW, this was my only source-level
modification that affects results. The others mainly have to do with
"for()" ==> "do while()" replacements ( when a human reader, but not
the compiler, can see that the loop does execute at least once ), etc.
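The reciprocal transformation applied to a three-component vector might look like the sketch below. The Vec3 type and both function names are hypothetical; the actual macros in VECTOR.H are not shown in this readme.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;   /* hypothetical vector type */

/* Straightforward version: three divisions. */
Vec3 vec_div(Vec3 v, double d)
{
    assert(d != 0.0);
    v.x /= d; v.y /= d; v.z /= d;
    return v;
}

/* Reciprocal version: one division, three multiplications. The
   result can differ from vec_div() in the last bit or so, because
   1.0 / d is rounded before it is multiplied. */
Vec3 vec_div_recip(Vec3 v, double d)
{
    assert(d != 0.0);
    double t = 1.0 / d;
    v.x *= t; v.y *= t; v.z *= t;
    return v;
}
```

When d is a power of two the reciprocal is exact and both versions agree bit for bit; for a divisor such as 3.0 they may differ in the last bit or so.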
6.3. Compensation
A good compensation for all the above-mentioned *potential* loss of
precision is Proton's power. I mean so-called register allocation.
In a few words: Proton is able to keep in fp registers considerably
more intermediate results than any other compiler ( I've tested
Symantec C 7.21, Watcom C 10.5, and Microsoft VC 5.00 ). What happens
when a compiler stores an 80-bit fp register into a 64-bit temporary
variable and later re-loads it ? 64 - 53 = 11 bits of mantissa precision
are gone ( the 80-bit format has a 64-bit mantissa; a double has 52
stored bits plus an implicit leading one ). And Proton often succeeds in
preserving them, while other compilers fail.
Does this compensate for the above-mentioned loss ? God knows.
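The effect of a store-and-reload can be observed from C where long double maps to the 80-bit x87 format (an assumption that holds for many x86 compilers but not all platforms). The function names below are invented for this sketch.

```c
#include <assert.h>
#include <float.h>

/* Significand bits a long double carries beyond a double. Where
   long double is the 80-bit x87 format this is 64 - 53 = 11;
   elsewhere it may be 0 or some other value. */
int extra_significand_bits(void)
{
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Store an extended-precision value into a 64-bit double and reload
   it; returns 1 if the round trip changed the value. */
int store_reload_loses_bits(void)
{
    long double wide = 1.0L + LDBL_EPSILON;  /* smallest step above 1 */
    volatile double narrow = (double)wide;   /* forced 64-bit store */
    return (long double)narrow != wide;
}
```

On a platform where long double is no wider than double, store_reload_loses_bits() returns 0, which corresponds to the case where nothing is kept in wider registers to begin with.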
6.4. Result
All C compilers I'm aware of have a "better IEEE 754 compliance" or
similar option which allows one to *reduce* fp precision to maintain
better compatibility. When this option is On, compilers ( among other
things ) force a store-load sequence for each "double" intermediate
result, thus ensuring 64-bit precision across each statement.
What does this mean ? That, without the above mentioned option, fp
results are "too precise". And even if we do use some tricks that
lower precision, it is not clear whether it is below or still above that
required by IEEE 754 standard for 64 bits, even for not-so-optimizing
compilers. I think it's still above. You decide.
7. Conclusion
---------------------------------------------------------------
First, I would like to sincerely thank all POV-Team members for
their extremely good work. And I hope that POVPro will be at least a
bit as helpful to you as POVRay is helpful to me.
I made each and every effort to ensure POVPro works correctly. I
made about a hundred manual and 859 batch-mode tests to ensure there are
no flaws. The only two things I found so far are:
- INI file scanner did not strip trailing spaces of Library_Path
paths. I've checked povray.wat and found that it processes such paths
just fine. I thought it was a bug (?) in generic UNIX sources, so I
added a few lines of code. Fixed.
- Scene files with a great deal of "declare"s without leading # might
fail to compile. POVSCN\LEVEL3\CHESS.POV is my only example. Cure
is to use hash marks, just as documentation requires :-). I regard
this as syntax error in CHESS.POV. I cannot be responsible for fossils.
Not fixed -- and will never be.
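For the first item, "a few lines of code" that strip trailing blanks from a path could look roughly like this. The readme does not show the actual fix, so the function below is a guess at the general shape, with an invented name.

```c
#include <assert.h>
#include <string.h>

/* Remove trailing spaces and tabs from a path string in place.
   Hypothetical helper; POVPro's actual fix is not shown here. */
void strip_trailing_spaces(char *path)
{
    assert(path != NULL);
    size_t len = strlen(path);
    while (len > 0 && (path[len - 1] == ' ' || path[len - 1] == '\t'))
        len--;
    path[len] = '\0';
}
```

Spaces inside the path are left alone, so a long name such as "My Scenes" would survive; only the tail is trimmed.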
The above two cases illustrate my position: if you happen to find
a bug, please drop me a line; I will, probably, try to fix it IFF:
- official version[s] of POVRay do not have it, and
- suspicious behavior is not by design.
I can be reached ( either directly or thru redirector ) at:
syt### [at] rucom
syt### [at] gamosru
syt### [at] gamosmsksu
vad### [at] sntmltzaporizhzheua
However, I assume no, absolutely no obligations. Whatever will or
will not happen as a result of using or misusing the POVPro, I'm not
responsible for that. The software is provided on an "as is" basis.
Happy tracing,
Vadim.