  An Open Letter to Joel Hruska from the developers of POV-Ray  
From: Chris Cason
Date: 16 Aug 2004 14:56:06
Message: <41210346@news.povray.org>
An Open Letter to Joel Hruska from the developers of POV-Ray
------------------------------------------------------------

Joel,

Our attention has been drawn to a review in which you use POV-Ray to compare
the performance of several processors, and in the process discover what you
claim is a "significant problem with POV-Ray as a benchmark" and then further
float the possibility that we - the makers of POV-Ray - may be so unprofessional
and dishonest as to deliberately tweak our source code to suit Intel Corporation.
The portion of the report we are referring to is here:

  http://www.sudhian.com/showdocs.cfm?aid=556&pid=2095

After reading this we were very concerned by the conclusions you draw, and
feel it is necessary to correct these publicly, particularly in light of
your astonishing suggestion that the benchmark may be rigged. We will also
show that our benchmark results line up with both manufacturers' own published
performance specs.

We will address a number of statements that you make which are clearly wrong:

  o your statement that "at worst, this is an example of a program being
    hand-tweaked to favor one CPU over another, at best (and the explanation
    we favor) it's a benchmark scene that fails to demonstrate real world
    performance."

  o your claim that POV-Ray isn't strongly affected by CPU cache or memory
    bandwidth.

  o the conclusions caused by your confusion of the terms 'codebase' and
    'database'.

  o your assumption that the demo files provided with our software are
    closer to 'real world' scenes than our benchmark scene.

  o your assumption that a wide variation in CPU performance indicates a
    problem with the benchmark rather than with the CPU.

The 'tweaked source code' issue
-------------------------------

Our source code is openly available. In fact, if you had cared to, you could
have downloaded both the v3.5 and v3.6 source code from our FTP site and
compared them for any such tweaks - something that you did not, it appears, do
(hey, why let facts get in the way of a good conspiracy theory?). For
reference, the files in question are here:

  ftp://ftp.povray.org/pub/povray/Old-Versions/Official-3.5/Linux/
  ftp://ftp.povray.org/pub/povray/Official/Unix/

Of course this would have taken some time to do, but surely if you are going
to hint in public that we may have deliberately biased our software to suit
Intel, you should actually check the basis of this suggestion first?

Does anyone honestly believe we could get away with such a tweak in the face of
the public availability of our source code, especially given that on the OS of
your choice - Linux - most distributions or testers build their own binaries?
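
For anyone who wants to check, the comparison takes only a few minutes. A
rough sketch of one way to do it (the archive and directory names below are
examples only; the exact file names on the FTP site may differ):

  # fetch the 3.5 and 3.6 Unix source archives from the directories listed above
  wget ftp://ftp.povray.org/pub/povray/Old-Versions/Official-3.5/Linux/povuni_s.tgz
  wget ftp://ftp.povray.org/pub/povray/Official/Unix/povray-3.6.tar.bz2
  tar xzf povuni_s.tgz
  tar xjf povray-3.6.tar.bz2
  # compare the two source trees, ignoring whitespace-only differences
  diff -ruw povray-3.5/ povray-3.6/ | less

Any 'tweak to suit Intel' would have to sit in plain sight in that diff.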

The benchmark file
------------------

Our benchmark file makes heavy use of 3d noise. 3d noise functions seem to
have a habit of bringing out issues with certain CPUs in some circumstances,
and despite your assertion that this isn't "real world", the contrary is true:
3d noise is used extensively in real scenes because it is one of the basic
tools for making realistic procedural textures. Few of the example scenes which
come with POV-Ray use 3d noise extensively because, largely, they are
demonstrating - in as simple a manner as possible - individual features of the
program. This is why we strongly suggest that people use benchmark.pov for
benchmarking.
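
To show what we mean, here is a minimal sketch of a noise-driven procedural
texture (this is not an extract from benchmark.pov; the pattern names and
values are chosen purely for illustration):

  // a sphere textured with a 3d-noise-based procedural pattern
  camera { location <0, 1, -4> look_at <0, 0, 0> }
  light_source { <10, 20, -10> color rgb 1 }
  sphere {
    <0, 0, 0>, 1
    texture {
      pigment {
        bozo                        // a 3d noise pattern
        turbulence 0.8              // perturbs the pattern with further noise
        color_map {
          [ 0.0 color rgb <0.1, 0.1, 0.6> ]
          [ 1.0 color rgb <0.9, 0.9, 1.0> ]
        }
      }
      normal { granite 0.4 scale 0.25 }   // noise-based surface bumps
    }
  }

Textures like this evaluate the 3d noise function for every sample taken
through them, which is exactly the kind of work our benchmark scene
concentrates on.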

We'd like to be able to go into this in more depth but unfortunately as you
have chosen not to tell anyone what scenes you substituted for benchmark.pov -
or even what rendering parameters you used - it is impossible to replicate your
results in order to analyse them in any depth.

The CPU cache/memory bandwidth issue
------------------------------------

You claim that "POV-RAY tends to be almost-entirely CPU-dependent. Neither
cache nor memory bandwidth has much effect on the program".

While we cannot determine exactly how you came to this conclusion, we must
assume that it is yet another example of the fact that you have been using the
wrong scenes to test with. "Real world" scenes, such as those created by our
artists, typically occupy much more memory than the very small and limited
test scenes that you have, by your own admission, been using, and thus require
access outside the L1/L2 cache much more often. Unfortunately we cannot
critique your actual choice of scenes since, as mentioned above, you have
chosen not to tell anyone which ones you used.
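
For anyone who wants a rough sense of the difference, one way to see it (a
sketch only; the demo scene name below is a placeholder, and the exact
statistics output varies between builds and platforms) is to render a small
demo scene and benchmark.pov with identical settings and compare the
peak-memory figures POV-Ray reports in its end-of-render statistics:

  povray +Isome_demo.pov +W320 +H240 -D     # any small demo scene (placeholder name)
  povray +Ibenchmark.pov +W320 +H240 -D     # the official benchmark scene
  # compare the peak memory figures in the render statistics
  # (where your build reports them)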

The confusion between 'codebase' and 'database'
-----------------------------------------------

In your article you quote the statement from our website that benchmark.pov
uses many of POV-Ray's internal features and that using something else may rely
too heavily on one or another specific portion of our codebase. For the
reference of our readers, that quote is taken from here:

  http://www.povray.org/download/benchmark.php

You then later state, and we quote, "If, after fifty renders, our tests are
still 'relying too heavily on one portion of the POV-RAY database', than we
strongly suggest POV-RAY update its database so as not to load so many similar
render scenes".

There is no such thing as a 'POV-Ray database' (at least, not one maintained
by us). Taking a guess based on the context of your suggestion, we are
assuming that you do not understand the difference between the terms
'codebase' and 'database', and, further, that you are somehow assuming that
the demo scenes provided with POV-Ray are some sort of 'database'.

'codebase' means exactly what it says - our 'code' base. The stuff we run
through the compiler to produce executable files. One of the purposes of
benchmark.pov is to exercise fairly wide coverage of our codebase, exactly as
we state. A 'database' is, well, a database. For an example, see
http://www.mysql.org/.

Additionally, going on the above assumption that you are referring to our demo
scenes, we will point out that we are not the ones who 'load so many similar
render scenes': you are the person choosing which scenes to render (or 'load'),
not us.

The incorrect use of demo scenes
--------------------------------

You also say, and we quote, "We could find no other scene included with the
program that demonstrated performance levels similar to this one, and after
fifty renders, we should have."

We must ask why you 'should have'? How many of the other scenes among those 50
were specifically designed to stress the renderer? 50? 30? 10? 1?

Let me tell you. None. Zilch. Not one. Why "should you" have found one like
that, then? With the exception of benchmark.pov, the demo scenes provided with
POV-Ray are probably the worst choice for 'real world' tests, as they are (for
the most part) just that - demos. Demos of one feature or another meant for
the education of new users, not benchmarking. And of those scenes that are not
demos per se, very few were created in recent times, and thus tend not to use
many of the newer features of the program. It's this very problem - causing
many people to ask us for a formal benchmark scene - that initiated the
creation of benchmark.pov in the first place, and the reason that we strongly
recommend that folks use benchmark.pov for benchmarking.

Intel and AMD's own results
---------------------------

Further, note that an AMD Athlon FX-53 is expected to be slightly outperformed
by an Intel Pentium 4 "E" processor when it comes to floating-point
performance, which is also what counts most for the ray-tracing algorithm.
This is supported by official benchmark results published by AMD and Intel,
in particular results submitted to SPEC by each company:

Pentium 4:
  http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040621-03126.html
Athlon:
  http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040628-03168.html

The same tests also verify that the integer performance of the Athlon FX-53 is
slightly better than that of the same Pentium 4:

Pentium 4:
  http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040621-03127.html
Athlon:
  http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040628-03181.html

Also note that both companies, unlike you, provide all the data necessary to
reproduce their measurements easily. We are certain AMD and Intel are
competent in getting maximum performance out of their own processors, and that
they do not tweak any of the benchmarks they run to their competitors'
disadvantage. Yet their results line up rather well with what "benchmark.pov"
shows.

Your conclusion as to the cause of the variation
------------------------------------------------

The fact is that numbers are numbers, and if a CPU is suffering in a particular
test the proper thing to do - where possible - is find out WHY, not to discard
the test that causes it and announce that you will explicitly avoid using it as
a performance metric. By doing so you have deliberately chosen to exclude from
your reports a clear indication that our benchmark is hitting on something
that differs significantly between the processors under test - an indication
that you, it seems, would rather not have in your own reports, despite the
fact that finding such differences is one of the very things reviews are
supposed to do.

Given that the source to POV-Ray is available, it certainly would have been
possible for someone such as yourself to perform a more in-depth analysis of
this issue, had you chosen to do so.

Your methodology
----------------

We also take issue with the fact that you choose not to disclose either the
files you use as input or the settings you use to run our software. It is
generally considered a basic tenet of benchmarking that such tests be
reproducible; if you choose not to provide the means for anyone to reproduce
your tests, how can you expect anyone to trust your results? It has certainly
made it a lot harder for us to respond comprehensively to your article.
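
By way of contrast, a fully reproducible POV-Ray run can be specified in a
single line. For example (the resolution and antialiasing settings below are
illustrative, not the official benchmark settings, which are described on the
benchmark page mentioned earlier):

  povray +Ibenchmark.pov +W384 +H384 +A0.3 -D

Given just the scene file, the output size and the quality settings, anyone
can repeat the run and check the render time for themselves.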

Summary
-------

It's clear to us that benchmark.pov didn't give you the results you liked.
However, we suggest that instead of hinting at unsubstantiated conspiracy
theories based on clear misunderstandings and poorly-documented tests, you
just get over it and either accept the numbers or investigate their cause. If
you don't want to use our benchmark, fine, but how dare you suggest that we
are somehow dishonest enough to tweak our code to suit one vendor over another
when running the benchmark file?

------------------------------------------------------------------------------
             Posted by Chris Cason on behalf of the POV-Team
------------------------------------------------------------------------------

