Out of curiosity I wanted to see exactly how the different optimization
flags in gcc affect the speed of povray (3.6.1).
Because I didn't want to wait for 30-60 minutes for the standard benchmark,
I just used scenes/advanced/abyss.pov instead.
Here are the results:
System: Pentium4 3.4GHz, 1GB RAM, Suse Linux 9.3, gcc 3.3.5
Rendering options: abyss.pov -w800 -h300 +a +am2 -f -p -x +d
Compiler optimization options (compilation time, stripped binary size):
* No optimization (55 secs, 1 645 644):
Total Time: 0 hours 11 minutes 40 seconds (700 seconds)
* -Os (1 min 8 secs, 1 368 780):
Total Time: 0 hours 7 minutes 31 seconds (451 seconds)
* -O1 (1 min 2 secs, 1 436 716):
Total Time: 0 hours 7 minutes 8 seconds (428 seconds)
* -O2 (1 min 14 secs, 1 482 892):
Total Time: 0 hours 6 minutes 48 seconds (408 seconds)
* -O3 (1 min 22 secs, 1 665 612):
Total Time: 0 hours 6 minutes 43 seconds (403 seconds)
* -O3 -march=pentium4 (1 min 24 secs, 1 613 452):
Total Time: 0 hours 6 minutes 2 seconds (362 seconds)
* -O3 -march=pentium4 -ffast-math (1 min 23 secs, 1 577 068):
Total Time: 0 hours 5 minutes 24 seconds (324 seconds)
* -O3 -march=pentium4 -ffast-math -malign-double (1 min 22 secs, 1 576 780):
Total Time: 0 hours 5 minutes 25 seconds (325 seconds)
* -O3 -march=pentium4 -ffast-math -mfpmath=sse -msse2
(1 min 26 secs, 1 661 452):
Total Time: 0 hours 5 minutes 13 seconds (313 seconds)
* -O3 -march=pentium4 -ffast-math -mfpmath=sse -msse2 -minline-all-stringops
(1 min 23 secs, 1 662 284):
Total Time: 0 hours 5 minutes 18 seconds (318 seconds)
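For easier comparison, the render times above can be turned into speedup factors relative to the unoptimized build. This is only a minimal Python sketch: the names RESULTS, speedups and fastest are made up for illustration, and the numbers are simply copied from the list.

```python
# Render times in seconds, copied from the list above.
RESULTS = [
    ("(no optimization)", 700),
    ("-Os", 451),
    ("-O1", 428),
    ("-O2", 408),
    ("-O3", 403),
    ("-O3 -march=pentium4", 362),
    ("-O3 -march=pentium4 -ffast-math", 324),
    ("-O3 -march=pentium4 -ffast-math -malign-double", 325),
    ("-O3 -march=pentium4 -ffast-math -mfpmath=sse -msse2", 313),
    ("-O3 -march=pentium4 -ffast-math -mfpmath=sse -msse2 "
     "-minline-all-stringops", 318),
]


def speedups(results, baseline_seconds):
    """Return (flags, seconds, speedup factor) for every entry."""
    return [(flags, secs, baseline_seconds / secs) for flags, secs in results]


def fastest(results):
    """Return the (flags, seconds) entry with the shortest render time."""
    return min(results, key=lambda entry: entry[1])


if __name__ == "__main__":
    baseline = RESULTS[0][1]  # the unoptimized build
    for flags, secs, factor in speedups(RESULTS, baseline):
        print(f"{secs:4d} s  {factor:4.2f}x  {flags}")
    print("fastest:", fastest(RESULTS)[0])
```

Keep in mind that these are single runs of one scene, so differences of a few seconds may well be within measurement noise.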
So the winner, on a Pentium 4, seems to be:
-O3 -march=pentium4 -ffast-math -mfpmath=sse -msse2
I noticed that the configure script of unix-pov3.6.1 did not add
the "-ffast-math" option to the Makefiles. This is worth noting,
since that option alone cut the render time from 362 to 324 seconds.
Note also how -minline-all-stringops (which the configure script adds)
actually *slows* down the rendering a tiny bit (318 vs. 313 seconds).
--
- Warp
The results are interesting, especially the -minline-all-stringops one.
Perhaps you will find this useful:
http://www.coyotegulch.com/products/acovea/
The author of the page uses a genetic algorithm to automatically find the
best set of compiler flags for some benchmarks. It would be interesting
(though probably overkill) to run this genetic algorithm with a povray
benchmark script. But even if you don't, the 'Acovea 5.0, GCC 4.0, Opteron'
and 'Acovea 5.0, GCC 4.0, Pentium 4' sections might give you some ideas about
which other potentially useful compiler flags you could test.
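To give a rough idea of what such a search looks like, here is a minimal Python sketch of a genetic algorithm over on/off flag choices. It is not Acovea's actual implementation. The function evolve, its parameters and synthetic_cost are all hypothetical names, and synthetic_cost is a made-up stand-in for "compile with these flags, render the scene, return the seconds", not a measurement.

```python
import random

# Optional flags to toggle, taken from the options tested in this thread.
FLAGS = ["-ffast-math", "-malign-double", "-mfpmath=sse",
         "-msse2", "-minline-all-stringops"]


def evolve(flags, measure, pop_size=12, generations=20,
           mutation_rate=0.1, seed=0):
    """Search for the subset of flags with the lowest measure(subset).

    measure takes a tuple of flag strings and returns a cost (e.g. seconds).
    """
    rng = random.Random(seed)
    n = len(flags)
    cache = {}  # a real measurement is slow, so never repeat one

    def cost(genome):
        key = tuple(genome)
        if key not in cache:
            enabled = tuple(f for f, on in zip(flags, genome) if on)
            cache[key] = measure(enabled)
        return cache[key]

    # Start from the "no extra flags" baseline plus random flag sets.
    population = [[False] * n]
    population += [[rng.random() < 0.5 for _ in range(n)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        ranked = sorted(population, key=cost)
        parents = ranked[:max(2, pop_size // 2)]  # truncation selection
        children = [list(ranked[0])]              # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]            # random bit flips
            children.append(child)
        population = children

    best = min(population, key=cost)
    return tuple(f for f, on in zip(flags, best) if on)


def synthetic_cost(enabled):
    """NOT a measurement: an arbitrary toy cost for trying out the search."""
    target = {"-ffast-math", "-mfpmath=sse"}
    return len(target.symmetric_difference(enabled))


if __name__ == "__main__":
    print(evolve(FLAGS, synthetic_cost))
```

For a real run, measure would have to rebuild povray with the given flags and time a render, which is why the cache matters: every new flag combination costs a full compile plus a render.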
I'm curious how much room for improvement there still is.
Thies
> Out of curiosity I wanted to see exactly how the different optimization
> flags in gcc affect the speed of povray (3.6.1).
Thanks, that is right in line with what I've been doing for the past
14 months now :-)
> Because I didn't want to wait for 30-60 minutes for the standard benchmark,
> I just used scenes/advanced/abyss.pov instead.
I also used this scene (for the fastpov2 study I hope to publish before
the end of the year), though with different settings. I seem to get results
consistent with yours.
> * -O3 -march=pentium4 -ffast-math -mfpmath=sse -msse2
> (1 min 26 secs, 1 661 452):
> Total Time: 0 hours 5 minutes 13 seconds (313 seconds)
Would be nice for completeness if you could also report results for
a binary without -msse2 (likely no difference with the above).
> I noticed that the configure script of unix-pov3.6.1 did not add
> the "-ffast-math" option to the Makefiles. This is worthy of notice.
Yes, it was already added for 3.6.2 quite some time ago.
> Note also how -minline-all-stringops (which the configure script adds)
> actually *slows* down the rendering a tiny bit.
This is interesting. It also seems to agree with what has been
observed on the AMD64 recently (Christoph). Looks like I have to try
this out too, though I don't really get how this can possibly make such
a difference...
- NC