I have compiled and installed povray (version 3.6.1, g++ 4.1.1 @
x86_64-unknown-linux-gnu) on an Opteron 880 2.4GHz box. Configure used the
optimizations -O3 -msse -mfpmath=sse -msse2 -march=k8 -mtune=k8
-malign-double -minline-all-stringops. A scene file I have takes 45 minutes
running on a single core. The same scene file takes 20 minutes under
Windows XP (standard binary distribution) on a Pentium M 1.86GHz laptop.
I'm assuming I need to build this differently on the AMD box. Any pointers?
From: Florian Brucker
Subject: Re: Performance Confusion with AMD Opteron 880
Date: 6 Nov 2006 18:00:51
Message: <454fbea3$1@news.povray.org>
Hi Steve!
> Any pointers?
Nicolas Calimet has written a pretty detailed article which compares
different compiler options for POV-Ray on different hardware
configurations running Linux. There's an Opteron in there, too, although
it's a 246, not an 880. But perhaps the information in the text applies to
your case, too. You can find the article here:
http://pov4grasp.free.fr/articles/fastpov1/
HTH,
Florian
From: Nicolas Calimet
Subject: Re: Performance Confusion with AMD Opteron 880
Date: 6 Nov 2006 20:02:43
Message: <454fdb33@news.povray.org>
> A scene file I have takes 45 minutes
> running on a single core. This same scene file takes 20 minutes under
> Windows XP (standard binary distribution) on a Pentium M 1.86GHz laptop.
Your observation is indeed quite surprising, especially given that
this Opteron processor should be faster than the Pentium M. Without your
scene at hand, at the moment I have no idea why it runs more than twice
as slow as with the other binary / OS / machine combination.
However here are some thoughts:
- try running the official benchmark with your binary, and compare the result
with those on the page linked by Florian. I suppose you should get a CPU time
on the order of 1200-1250 seconds (64-bit binary, single core). Otherwise,
it's likely that something is wrong with your binary, maybe a problem with
the 64-bit environment. You might also run the benchmark using the official
32-bit Linux binary just to find out whether your system is running stably.
Running the benchmark on the Pentium M under Windows might also be helpful
to get an idea of the expected speed ratio between the two machines.
- in case your k8 machine passes the test(s) above, try running your scene
with the official 32-bit Linux binary. In principle it should run even
slower. Otherwise, there might be something very specific in your scene
(some POV feature that is not used in the official benchmark) that reveals
an optimization problem in your binary (e.g. with SSE2 usage) or instability
in the system (e.g. again a possible 64-bit specific problem, a buggy math
or C library).
- try running with 3.7.0.beta.16.linux-x86-64. This binary was prepared using
the same compiler as yours and similar optimization flags. Due to the many
changes in 3.7, the binary might render your scene faster on a single core.
Of course, in such a case you might consider running on as many cores as are
available on this machine :-)
- NC
Nicolas Calimet <pov### [at] freefr> wrote:
> - try running the official benchmark with your binary, and compare the result
> with those on the page linked by Florian. I suppose you should get a CPU time
> in the order of 1200-1250 seconds (64-bit binary, single core).
I had seen the Florian article before. I've just gotten through running
benchmark.pov and the results are
1265 sec on the Opteron
1969 sec on the Pentium M laptop
These are comparable to the results in the Florian article (fudging for
different processor speeds) so I'm now convinced that there is not
something badly wrong with the build. Concerned that I might be imagining
things I ran my scene file again. The results are
2727 sec on the Opteron
907 sec on the Pentium M
This difference is actually a little worse than I reported above because I
had inadvertently made a change to the scene file on the laptop that made
it run a little slower. So clearly there is something in my scene file. The
file is large (53MB) but consists of 240K smooth triangles in a mesh, a
little CSG, two superellipsoids and a lot of transparency.
> try running your scene
> with the official 32-bit Linux binary. In principle you should run even
> more slower. Otherwise, there might be something very specific in your scene
> (some POV feature that is not used in the official benchmark) that reveals
> an optimization problem in your binary (e.g. with SSE2 usage) or instability
> in the system (e.g. again a possible 64-bit specific problem, a buggy math
> or C library).
>
> - try running with 3.7.0.beta.16.linux-x86-64. This binary was prepared using
> the same compiler as yours and similar optimization flags. Due to the many
> changes in 3.7, the binary might render your scene faster on a single core.
> Of course, in such a case you might consider running on as many cores as are
> available on this machine :-)
>
>
> - NC
I will try the official binary and the 3.7 beta next.
- steve
Nicolas, my apologies, it is your article, linked to by Florian, not
Florian's article. It is bad to type faster than you think :)
- steve
Well, the culprit seems to be the superellipsoid. I removed things from the
scene file one by one, always seeing a factor of 3 difference. Replace
the superellipsoid with other things and the Opteron is faster in the same
proportion as seen in benchmark.pov. Rendered at 800x600 with no AA
the example below gives times of 7 sec on the Opteron and 2 sec on the
laptop.
- steve
#include "colors.inc"
camera {
location <460, 15, 0>
look_at <0,0,0>
sky <0,0,1>
}
light_source { <0,1000,1000> color White }
superellipsoid { <1,0.1> scale <120,120,120>
texture { pigment { color Blue } finish { diffuse 0.9 phong 1 } }
}
From: Nicolas Calimet
Subject: Re: Performance Confusion with AMD Opteron 880
Date: 10 Nov 2006 16:50:57
Message: <4554f441@news.povray.org>
> Rendered at 800x600 with no AA
> the example below gives times of 7 sec on the Opteron and 2 sec on the
> laptop.
OK, I can confirm this slowdown with a binary prepared using gcc-4.1.1
in 64-bit mode (comparisons made with 32-bit povray on k7 and p4 machines).
This looks like a gcc problem when generating 64-bit code, as I can get
expected render times using the same compiler optimizations AND the -m32 gcc
flag to produce a 32-bit binary. Furthermore, it's likely this problem exists
only in the gcc-4.1.x series, since older versions (namely 3.4.2, 3.4.3, 4.0.0)
produce 64-bit binaries rendering at the expected speed. But that needs more
investigation before one files a bug report to the gcc devs.
- NC