From: Chris Cason
Subject: An Open Letter to Joel Hruska from the developers of POV-Ray
Date: 16 Aug 2004 14:56:06
Message: <41210346@news.povray.org>
An Open Letter to Joel Hruska from the developers of POV-Ray
------------------------------------------------------------
Joel,
Our attention has been drawn to a review in which you use POV-Ray to compare
the performance of several processors, and in the process discover what you
claim is a "significant problem with POV-Ray as a benchmark" and then further
float the possibility that we - the makers of POV-Ray - may be so unprofessional
and dishonest as to deliberately tweak our source code to suit Intel Corporation.
The portion of the report we are referring to is here:
http://www.sudhian.com/showdocs.cfm?aid=556&pid=2095
After reading this we were very concerned at the conclusions you draw, and
feel it is necessary to publicly correct these, particularly in the light of
your astonishing suggestion that the benchmark may be rigged. We will also
show that our benchmark results line up with both manufacturers' own published
performance specs.
We will address a number of statements that you make which are clearly wrong:
o your statement that "at worst, this is an example of a program being
  hand-tweaked to favor one CPU over another, at best (and the explanation
  we favor) it's a benchmark scene that fails to demonstrate real world
  performance."
o your claim that POV-Ray isn't strongly affected by CPU cache or memory
  bandwidth.
o the conclusions caused by your confusion of the terms 'codebase' and
  'database'.
o your assumption that the demo files provided with our software are
  closer to 'real world' scenes than our benchmark scene.
o your assumption that a wide variation in CPU performance indicates a
  problem with the benchmark rather than with the CPU.
The 'tweaked source code' issue
-------------------------------
Our source code is openly available. In fact if you had cared to you could
have downloaded both the v3.5 and v3.6 source code from our FTP site and
compared them for any such tweaks - something that you did not, it appears, do
(hey, why let facts get in the way of a good conspiracy theory?). For
reference, the files in question are here:
ftp://ftp.povray.org/pub/povray/Old-Versions/Official-3.5/Linux/
ftp://ftp.povray.org/pub/povray/Official/Unix/
Of course this would have taken some time to do, but surely if you are going
to hint in public that we may have deliberately biased our software to suit
Intel, you should actually check the basis of this suggestion first ?
Does anyone honestly believe we could get away with such a tweak in the face of
the public availability of our source code, especially given that on the OS of
your choice - Linux - most distributions or testers build their own binaries ?
The benchmark file
------------------
Our benchmark file makes heavy use of 3d noise. 3d noise functions seem to
have a habit of bringing out issues with certain CPU's in some circumstances,
and despite your assertion that this isn't "real world", to the contrary, 3d
noise is extensively used in real scenes because it's one of the basic tools
to make realistic procedural textures. Few of the example scenes which come
with POV-Ray use 3d noise extensively because, largely, they are demonstrating
- in as simple a manner as possible - individual features of the program. This
is why we strongly suggest that people use benchmark.pov for benchmarking.
We'd like to be able to go into this in more depth but unfortunately as you
have chosen not to tell anyone what scenes you substituted for benchmark.pov -
or even what rendering parameters you used - it is impossible to replicate your
results in order to analyse them in any depth.
The CPU cache/memory bandwidth issue
------------------------------------
You claim that "POV-RAY tends to be almost-entirely CPU-dependent. Neither
cache nor memory bandwidth has much effect on the program".
While we cannot determine exactly how you come to this conclusion we must
assume that it is yet another example of the fact that you have been using the
wrong scenes to test with. "Real world" scenes, such as those created by our
artists, typically occupy much more memory than the very small and limited
test scenes that you have by your own admission been using, and thus require
access outside of the L1/L2 cache much more often. Unfortunately we cannot
critique your actual choice of scenes since, as mentioned above, you have
chosen not to tell anyone which ones you used.
The confusion between 'codebase' and 'database'
-----------------------------------------------
In your article you quote the statement from our website that benchmark.pov
uses many of POV-Ray's internal features and that using something else may rely
too heavily on one or another specific portion of our codebase. For the
reference of our readers, that quote is taken from here:
http://www.povray.org/download/benchmark.php
You then later state, and we quote, "If, after fifty renders, our tests are
still 'relying too heavily on one portion of the POV-RAY database', than we
strongly suggest POV-RAY update its database so as not to load so many similar
render scenes".
There is no such thing as a 'POV-Ray database' (at least, not one maintained
by us). Taking a guess based on the context of your suggestion, we are
assuming that you do not understand the difference between the terms
'codebase' and 'database' and that further you are somehow assuming that the
demo scenes provided with POV-Ray are some sort of 'database'.
'codebase' means exactly what it says - our 'code' base. The stuff we run
through the compiler to produce executable files. One of the purposes of
benchmark.pov is to exercise fairly wide coverage of our codebase, exactly as
we state. A 'database' is, well, a database. For an example, see
http://www.mysql.org/.
Additionally, going on the above assumption that you are referring to our demo
scenes, we will point out that we are not the ones who 'load so many similar
render scenes' as you are the person choosing the scenes to render (or 'load'),
not us.
The incorrect use of demo scenes
--------------------------------
You also say, and we quote, "We could find no other scene included with the
program that demonstrated performance levels similar to this one, and after
fifty renders, we should have."
We must ask why you 'should have' ? How many of the other scenes of those 50
were specifically designed to stress the renderer ? 50? 30? 10? 1?
Let me tell you. None. Zilch. Not one. Why "should you" have found one like
that then ? With the exception of benchmark.pov, the demo scenes provided with
POV-Ray are probably the worst choice for 'real world' tests, as they are (for
the most part) just that - demos. Demos of one feature or another meant for
the education of new users, not benchmarking. And of those scenes that are not
demos per se, very few were created in recent times, and thus tend not to use
many of the newer features of the program. It's this very problem - causing
many people to ask us for a formal benchmark scene - that initiated the
creation of benchmark.pov in the first place, and the reason that we strongly
recommend that folks use benchmark.pov for benchmarking.
Intel and AMD's own results
---------------------------
Further, note that it is expected that an AMD Athlon FX-53 is slightly
outperformed by an Intel Pentium 4 "E" processor when it comes to floating-
point performance, which is also what counts most for the ray-tracing
algorithm. This is supported by official benchmark results published by AMD
and Intel respectively, in particular results submitted to SPEC by AMD and
Intel:
Pentium 4:
http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040621-03126.html
Athlon:
http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040628-03168.html
The same test also verifies that integer performance of the Athlon FX-53 is
slightly better than that of the same Pentium 4:
Pentium 4:
http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040621-03127.html
Athlon:
http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040628-03181.html
Also note that both companies, unlike your benchmarks, provide all data
necessary to reproduce their measurements easily. We are certain AMD and
Intel are competent in getting maximum performance out of their own processors
and do not tweak any of the benchmarks they run to their competitors'
disadvantage. Yet, their results line up rather well with what "benchmark.pov"
shows.
Your conclusion as to the cause of the variation
------------------------------------------------
The fact is that numbers are numbers and if a CPU is suffering in a particular
test the proper thing to do - where possible - is find out WHY, not to discard
the test that causes it and say that you will explicitly avoid using it as a
performance metric. By doing so you have deliberately chosen to exclude from
your reports a clear indication that our benchmark is hitting on something
that is significantly different between the processors under test - an
indication that you, it seems, do not like to see or have in your own reports,
despite the fact that this is one of the very things that reviews are supposed
to find.
Given that the source to POV is available it certainly would have been
possible for someone such as yourself to perform more in-depth analysis of
this issue, had you chosen to do so.
Your methodology
----------------
We also take issue with the fact that you choose to not disclose either the
files you use as input or the settings you use to run our software. It is
generally considered a basic tenet of benchmarking that such tests are
reproducible; if you choose not to provide the means for anyone to reproduce
your tests, how can you expect anyone to trust your results ? It has certainly
made it a lot harder for us to comprehensively respond to your article.
Summary
-------
It's clear to us that benchmark.pov didn't give you the results you liked.
However we suggest that instead of hinting at unsubstantiated conspiracy theories
based on clear misunderstandings and poorly-documented tests you just get over
it and either accept the numbers or investigate the cause of them. If you
don't want to use our benchmark, fine, but how dare you suggest that we are
somehow dishonest enough to tweak our code to suit one vendor over another
when running the benchmark file?
------------------------------------------------------------------------------
Posted by Chris Cason on behalf of the POV-Team
------------------------------------------------------------------------------
From: Thorsten Froehlich
Subject: Re: An Open Letter to Joel Hruska from the developers of POV-Ray
Date: 16 Aug 2004 15:00:29
Message: <4121044d@news.povray.org>
And to add to this...
Recently your website used POV-Ray 3.6 for comparison of recent AMD and
Intel processors. You end your discussion of POV-Ray as a benchmark by
implying that our code has been "hand-tweaked to favor one CPU". The source
code of POV-Ray 3.6 is available on our website for download. As such, it
would have been easy to determine that such a claim is simply false. Neither
the Windows nor the Linux version of POV-Ray contains any code specific to
any CPU at all.
You further imply that the benchmark scene would be unsuitable as it "fails
to demonstrate real world performance". First of all, note that the
benchmark scene did not change from POV-Ray version 3.5 released over two
years ago to version 3.6 released a few weeks ago.
You also assume that demo scenes included with POV-Ray would demonstrate
such performance. As stated in the user manual, the demo scenes included
with POV-Ray are there for educational purposes only. As such, they
demonstrate only one or so very specific features of POV-Ray, consequently
using only a tiny portion of the POV-Ray codebase.
Unfortunately you decided to use custom and undisclosed settings for
rendering the benchmark scene as well as the other test scenes. You also
decided not to specify which eight of the fifty scenes you selected.
As such, nobody can easily reproduce your results. However, we did render
the scenes from the "advanced" directory as they are those taking longest to
render.
The scene from the "advanced" directory we found taking longest to
render was "mediasky.pov". This scene, as its name implies also shows only
one single feature, media. As such, it heavily depends on the performance
of the noise function code, which is very small and tends to favor systems
with a fast L1 cache because it is very tiny. The noise function is
certainly important to create a realistic scene, but it is never found
exclusively in any "real world" scene.
The scene taking second-longest to render is stackerday.pov; its "night"
version, stackernight.pov, takes about half the time to render. There are
some other "stacker*.pov" variations all sharing the same characteristics
and varying only in the coloring. All these scenes contain many boxes and
text, which are only two of the 24 geometric primitives supported by
POV-Ray. The algorithm to intersect a ray with a box requires many
unpredictable comparison operations and branches, and as such processors with
a very long pipeline will perform poorly when running this algorithm. As
such, the processor will be waiting to find out what to do next most of the
time while the pipeline stalls. Even worse, they make heavy use of our
experimental radiosity feature. As such, the stacker-scenes are a very poor
choice to use as a benchmark.
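To illustrate the point about comparisons and branches, here is a minimal
sketch of the widely used "slab" test for intersecting a ray with an
axis-aligned box. It is not taken from the POV-Ray source, and POV-Ray's own
box code may well be organised differently; the Ray and Box types and the
function name ray_hits_box are hypothetical, chosen only for this example.
Every axis goes through several comparisons whose outcome depends on the
particular ray, which is the kind of code that is hard for a branch predictor.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only (not from the POV-Ray source). */
typedef struct { double origin[3]; double dir[3]; } Ray;
typedef struct { double lo[3]; double hi[3]; } Box;

/* Slab test: returns true and stores the entry distance in *t_hit when the
   ray hits the axis-aligned box at a distance >= 0 along its direction. */
bool ray_hits_box(const Ray *ray, const Box *box, double *t_hit)
{
    double t_near = -1e300;
    double t_far = 1e300;

    for (int axis = 0; axis < 3; ++axis) {
        double o = ray->origin[axis];
        double d = ray->dir[axis];

        if (d == 0.0) {
            /* Ray is parallel to this slab: it misses unless the origin
               already lies between the two planes. */
            if (o < box->lo[axis] || o > box->hi[axis])
                return false;
            continue;
        }

        /* Distances to the two planes of this slab, ordered so t0 <= t1. */
        double t0 = (box->lo[axis] - o) / d;
        double t1 = (box->hi[axis] - o) / d;
        if (t0 > t1) { double tmp = t0; t0 = t1; t1 = tmp; }

        /* Shrink the interval in which the ray is inside all slabs so far. */
        if (t0 > t_near) t_near = t0;
        if (t1 < t_far) t_far = t1;
        if (t_near > t_far)
            return false;   /* interval is empty: the ray misses the box */
    }

    if (t_far < 0.0)
        return false;       /* the whole box lies behind the ray origin */

    *t_hit = (t_near > 0.0) ? t_near : 0.0;
    return true;
}
```

In this sketch a single ray can pass through up to seven data-dependent
branches per axis before the result is known, and which way each one goes
changes from ray to ray.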
We determined "balcony.pov" to take fourth-longest to render. This scene is
much closer to a suitable benchmark scene than all other scenes discussed
until now. It has a fair selection of geometric primitives and uses several
other features. As such, it does exercise many different parts of the
POV-Ray codebase. Yet, its major drawback is that it uses "radiosity", a
feature we explicitly marked as experimental (a warning is shown
automatically if radiosity is used in a scene).
Rounding off the top-five is "optics.pov". This scene demonstrates a single
POV-Ray feature, photon mapping. The photon mapping algorithm requires a
lot of searching and sorting and in general tends to be optimized well by
current compilers and raw clock speed matters the most for this scene. Most
of this simple scene should fit into the L2 cache because it only uses a
small amount of photons sufficient for such a specific demo. Obviously this
is not a suitable benchmark scene.
Sixth longest to render was "landscape.pov". It demonstrates a single
feature, the height-field object. It does not even contain advanced
lighting or anything else fancy. Only a tiny portion of the POV-Ray
codebase will be needed to render this scene.
Seventh longest took "glasschess.pov". This scene uses ten different
geometric primitives, but it does not use noteworthy texturing. Its goal is
to demonstrate refraction and reflections. It also tends to fit entirely
into the L2 cache. As such, it is not suitable to benchmark a system as a
whole, and its compactness certainly is not representative of the average
POV-Ray scene today.
Eighth was "stackertransp.pov", already discussed as part of the other
"stacker*.pov" scenes.
Ninth was "isocacti.pov", which demonstrates the isosurface geometric
primitive together with a huge amount of cones. Yet, the cones hardly
contribute anything to the overall computation, but they do push the scene
size well beyond the L2 cache size. Nevertheless, isosurfaces use a virtual
machine to evaluate user-supplied functions and as such this has little to
do with ray-tracing. As such, the scene can say almost nothing about other
POV-Ray ray-tracing performance and consequently is an unsuitable benchmark.
The tenth longest-tracing scene was "newdiffract.pov". This scene again
uses almost only photons. However, unlike "optics.pov", it uses many
more photons. Thus, random memory access dominates in this scene when
searching for photons. Still, the photons are only relevant in small
portions of the scene, and as such while there are many more photons, they
contribute much less to the scene. Thus, apart from a bit of photon use,
this scene only contains few primitives and renders very quickly because it
is fairly simple. It does certainly not represent any general scene, but is
only a demo to show diffraction. As such, it exercises only a small part of
the POV-Ray codebase and is unsuitable as a benchmark scene.
Now, it should be noted that the first four longest-rendering scenes from
the "advanced" directory take already about as long as your eight scenes
relative to "benchmark.pov" when using the same settings. Thus, unless you
specify the scenes you used to generate your results, they lack any
credibility.
Of course, there are many more demo scenes included with POV-Ray, but
without exception, all other demo scenes show only exactly a single POV-Ray
feature to help learning to use POV-Ray. Thus, if you used any of those
scenes for your benchmarking as part of the eight or fifty scenes you
mention, you can be certain you did not generate representative numbers at
all as you at most benchmarked performance of eight - which equals just one
third - of the 24 different geometric primitives supported by POV-Ray.
And even if you ran eight of the top ten longest-rendering "advanced"
scenes, you encountered all the problems outlined above. Consequently, your
benchmarking was simply flawed.
Of course, we do not claim "benchmark.pov" is the perfect benchmark, but it
does provide a balance between the many features used to create a great real
world POV-Ray scene. Any claim we created a benchmark scene that is
intentionally or due to incompetence biased towards Intel processors has no
merit. To the contrary, even two weeks prior to your article we had already
made available a beta version of POV-Ray 3.6 for Windows XP 64-bit Edition,
even adding support for AMD 64-bit processor extensions! Nevertheless, the
32-bit version of POV-Ray runs as well as possible on AMD processors.
Consequently, we would like to suggest you refrain from making baseless
accusations that the POV-Team favors Intel processors or is incompetent at
creating a suitable benchmark. Instead, we challenge you to provide the
actual scenes and settings used to generate the "benchmark" results you
claim more accurately reflect the performance apparently not even AMD and
Intel expect.
Sincerely,
Thorsten Froehlich, POV-Team
PS: The term "codebase" refers to the compiled program code of POV-Ray and
has absolutely nothing to do with a "database" for which you confused it in
your article. We are currently working to reword our website to no longer
use the technical term "codebase" as it does seem to confuse non-computer
scientists.
From: Slime
Subject: Re: An Open Letter to Joel Hruska from the developers of POV-Ray
Date: 16 Aug 2004 16:30:38
Message: <4121196e@news.povray.org>
> The scene from the "advanced" directory we found taking longest to
> render was "mediasky.pov". This scene, as its name implies also shows only
> one single feature, media. As such, it heavily depends on the performance
> of the noise function code, which is very small and tends to favor systems
> with a fast L1 cache because it is very tiny. The noise function is
> certainly important to create a realistic scene, but it is never found
> exclusively in any "real world" scene.
This seems contrary to what was in Chris Cason's message:
> Our benchmark file makes heavy use of 3d noise. 3d noise functions seem to
> have a habit of bringing out issues with certain CPU's in some circumstances,
> and despite your assertion that this isn't "real world", to the contrary, 3d
> noise is extensively used in real scenes because it's one of the basic tools
> to make realistic procedural textures.
I'm not sure if "the noise function" is referring to the same thing as "3d
noise", but it's likely that the guy who wrote the benchmark results page
will think so =)
- Slime
[ http://www.slimeland.com/ ]
From: Thorsten Froehlich
Subject: Re: An Open Letter to Joel Hruska from the developers of POV-Ray
Date: 16 Aug 2004 16:44:24
Message: <41211ca8@news.povray.org>
In article <4121196e@news.povray.org> , "Slime" <fak### [at] emailaddress> wrote:
> I'm not sure if "the noise function" is referring to the same thing as "3d
> noise", but it's likely that the guy who wrote the benchmark results page
> will think so =)
You apparently misunderstood what you read. We are both talking about the
same thing and there is no contradiction at all. I suggest you read again
what you quoted more carefully.
Thorsten
____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde
Visit POV-Ray on the web: http://mac.povray.org
From: Slime
Subject: Re: An Open Letter to Joel Hruska from the developers of POV-Ray
Date: 17 Aug 2004 01:51:35
Message: <41219ce7@news.povray.org>
> You apparently misunderstood what you read. We are both talking about the
> same thing and there is no contradiction at all. I suggest you read again
> what you quoted more carefully.
Hmm. It seems to me that the letter (Chris' posting) was saying "our
benchmark file makes heavy use of noise because it's a good measurement of
CPU performance," whereas you were saying "mediasky.pov makes heavy use of
noise and is therefore a bad measurement of CPU performance."
I guess the points were that mediasky.pov uses noise too much whereas
benchmark.pov uses it just the right amount, but the way it was said makes
it seem like noise is simultaneously a good and a bad thing to use in a
benchmark file.
- Slime
[ http://www.slimeland.com/ ]
From: Thorsten Froehlich
Subject: Re: An Open Letter to Joel Hruska from the developers of POV-Ray
Date: 17 Aug 2004 02:24:41
Message: <4121a4a9@news.povray.org>
In article <41219ce7@news.povray.org> , "Slime" <fak### [at] emailaddress> wrote:
> Hmm. It seems to me that the letter (Chris' posting) was saying "our
> benchmark file makes heavy use of noise because it's a good measurement of
> CPU performance," whereas you were saying "mediasky.pov makes heavy use of
> noise and is therefore a bad measurement of CPU performance."
But that is not what I said. I said that "mediasky.pov" uses it
*exclusively*, which says exactly nothing about the use of noise in the
benchmark file. Everything else is something you are reading into what we
say but that we do not say at all.
> I guess the points were that mediasky.pov uses noise too much whereas
No, I am not saying "too much", I am saying *exclusively*. "Too much" could
be 80%, 90% or 95% or whatever one thinks, the word exclusively means 100%
and it does not have anything in common with the term "too much"!
Either way, we wrote this as an open letter and we posted that letter here
for reading and discussion of what is said about the article we are
responding to. We are not asking anybody to review our open letter. We
reviewed it in the team and the version we posted is the final version we
*already* used.
Thorsten
____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde
Visit POV-Ray on the web: http://mac.povray.org
From: Ross
Subject: Re: An Open Letter to Joel Hruska from the developers of POV-Ray
Date: 17 Aug 2004 10:30:19
Message: <4122167b$1@news.povray.org>
"Thorsten Froehlich" <tho### [at] trfde> wrote in message
news:4121a4a9@news.povray.org...
> In article <41219ce7@news.povray.org> , "Slime" <fak### [at] emailaddress> wrote:
>
> > Hmm. It seems to me that the letter (Chris' posting) was saying "our
> > benchmark file makes heavy use of noise because it's a good measurement of
> > CPU performance," whereas you were saying "mediasky.pov makes heavy use of
> > noise and is therefore a bad measurement of CPU performance."
>
> But that is not what I said. I said that "mediasky.pov" uses it
> *exclusively*, which says exactly nothing about the use of noise in the
> benchmark file. Everything else is something you are reading into what we
> say but that we do not say at all.
>
Reading into it is exactly what the intended audience will do. In my
opinion, you are presenting too much information at one time to that
audience. If you don't like how Slime interpreted it, imagine how less
technically astute users would interpret it.
-r
"Thorsten Froehlich" <tho### [at] trfde> wrote:
>
> ... other "stacker*.pov" variations ...
> All these scenes contain many boxes and text, which are only two of the
> 24 geometric primitives supported by POV-Ray. The algorithm to intersect
> a ray with a box requires many unpredictable comparison operations and
> branches, and as such processors with a very long pipeline will perform
> poorly when running this algorithm. As such, the processor will be
> waiting to find out what to do next most of the time while the pipeline
> stalls. Even worse, they make heavy use of our experimental radiosity
> feature. As such, the stacker-scenes are a very poor
> choice to use as a benchmark.
>
Whoo-hoo!
I've entered the history of programming!
Greg M. Johnson
co-author, stacker*.pov files