POV-Ray : Newsgroups : povray.unix : Compile for given CPU : Re: Compile for given CPU
  Re: Compile for given CPU  
From: mf
Date: 15 Jun 2004 17:19:35
Message: <40cf67e7@news.povray.org>
>         I was told recently that using profile-guided optimization could
> also somewhat increase performance on Athlons.  To do so, you have to
> compile an intermediate binary using "-fprofile-arcs", then run a
> (representative) set of povray scenes, and recompile again, this time
> with "-fbranch-probabilities" instead of the option above.
>         Personally I had some oddities doing so with gcc-3.4.0 on the
> current POV-Ray 3.6 sources, and the speedup was also negligible.

You have the 3.6 sources??

About profiling, the gcc team says that profiling may speed up the gcc
compiler itself by about 7% (the profile-compiled compiler is 7% faster :-)
Personally I never got any conclusive evidence that profiled binaries run
faster, but it probably depends on the application and on how
representative the data gathered in the intermediate run is.

Anyway, here are my optimize flags for gcc:

-O3 -fno-exceptions -fno-rtti -fno-check-new -fomit-frame-pointer 
(does -O3 imply any of them?)

-funroll-loops (whether this slows things down depends on the available cache)
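To show what -funroll-loops trades, here is a hand-written version of the transformation. It is only an illustration: gcc unrolls on its internal representation and picks its own factor, and the function names are made up. The unrolled loop does a quarter as many loop tests and branches, but its body is larger, which is where the cache remark comes in: bigger loops take more room in the instruction cache.

```c
#include <stddef.h>

/* Plain loop: one add and one loop test per element. */
long sum_plain(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Hand-unrolled by 4, roughly the kind of transformation -funroll-loops
 * applies: fewer loop tests and branches, but a larger loop body. */
long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)          /* leftover 0 to 3 elements */
        s += v[i];
    return s;
}
```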

-march=pentium4 -mfpmath=sse -msse -msse2 -mmmx
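One way to check which of these flags are redundant is to look at the macros gcc predefines. According to the GCC documentation, -march=pentium4 already enables MMX, SSE and SSE2, so -mmmx -msse -msse2 add nothing; -mfpmath=sse is not implied on 32-bit x86 and still has to be given. 'echo | gcc -march=pentium4 -dM -E -' lists every predefined macro; the sketch below (function name made up) reports a few of them from inside a program, including __OPTIMIZE__ (defined at -O and above) and __FAST_MATH__ (defined by -ffast-math, so it stays undefined under plain -O3).

```c
/* Returns a string listing some macros gcc predefines for the current
 * flags. The string literals inside each #ifdef are concatenated at
 * compile time, so the result is fixed for a given build. */
const char *build_feature_macros(void)
{
    return ""
#ifdef __OPTIMIZE__
        " __OPTIMIZE__"   /* any -O level above -O0 */
#endif
#ifdef __FAST_MATH__
        " __FAST_MATH__"  /* -ffast-math; not set by -O3 */
#endif
#ifdef __MMX__
        " __MMX__"
#endif
#ifdef __SSE__
        " __SSE__"
#endif
#ifdef __SSE2__
        " __SSE2__"       /* set by -march=pentium4 without -msse2 */
#endif
        ;
}
```

On a 32-bit x86 build, printing this string for -O3 alone and for -O3 -march=pentium4 shows the SSE macros appearing from -march alone; on x86-64 they are defined by default.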

-ffast-math -fno-math-errno -funsafe-math-optimizations -fno-trapping-math 
(may be unsafe, and certainly not IEEE floating-point compliant)
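Why these flags are not IEEE compliant can be seen with three doubles. Per the current GCC documentation, -ffast-math already turns on -fno-math-errno and -funsafe-math-optimizations, and the latter includes -fno-trapping-math and -fassociative-math, which lets the compiler regroup floating-point sums. The two functions below (names made up) spell out by hand a regrouping that reassociation permits; with strict IEEE double arithmetic they give different answers for the same inputs.

```c
/* Strict IEEE evaluation: the compiler must add a and b first. */
double sum_as_written(double a, double b, double c)
{
    return (a + b) + c;
}

/* A regrouping that -fassociative-math would allow the compiler to
 * choose; written out explicitly here to show its effect. */
double sum_regrouped(double a, double b, double c)
{
    return (a + c) + b;
}
```

With a = 1e20, b = 1.0, c = -1e20, sum_as_written returns 0.0 (the 1.0 is lost when added to 1e20) while sum_regrouped returns 1.0. For a renderer such differences are usually invisible, but they are exactly what "not IEEE compliant" means.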

-foptimize-sibling-calls -malign-double -minline-all-stringops
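-foptimize-sibling-calls is, according to the GCC documentation, already enabled at -O2 and -O3, so listing it next to -O3 is redundant. What it does: when a function ends by calling another function (or itself) and returning that result, gcc can replace the call with a jump, so no new stack frame is pushed. A minimal example (function name made up):

```c
/* Tail-recursive gcd: the recursive call is the last thing the
 * function does, so with sibling-call optimization gcc can turn the
 * recursion into a loop instead of growing the stack. */
unsigned gcd_tail(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd_tail(b, a % b);   /* sibling (tail) call */
}
```

At -O0 each recursion step uses its own stack frame; with the optimization the call typically becomes a jump. The result is the same either way, e.g. gcd_tail(48, 18) is 6.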

All in all, I am not sure I get better results with any combination of them
than with the plain old -O3 alone.

Using the Intel C Compiler (icc, the free Linux version for non-commercial
use) does not gain anything. Using 'icc -O3 -Ob2 -mcpu=pentiumpro -xK
-march=pentiumiii -tpp6 -unroll -ipo -pch -fno-rtti -align' I obtain the
same rendering times as with the gcc binary...

In my opinion -O3 is the safe bet. More options may gain 3% in speed, but
to what purpose? 8 or 8.5 seconds makes no difference for a short render,
and 72 hours is no different from 73 hours for a long one.
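To put the 3% figure in numbers, a worked example, assuming "3% speed" means the binary does 3% more work per unit time (speed gain g = 0.03):

```latex
t_{\text{new}} = \frac{t_{\text{old}}}{1+g}, \qquad g = 0.03

8.5\ \text{s} \;\to\; \frac{8.5}{1.03} \approx 8.25\ \text{s}, \qquad
73\ \text{h} \;\to\; \frac{73}{1.03} \approx 70.9\ \text{h}
```

That is about a quarter of a second on the short render and about 2.1 hours on the long one.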

But there are significant speed differences between -O/-O2 and -O3.

-- 
-----------------------------------------------------------------------------
Reply-To address is fake. Please replace it 
with mferecat (at) numericable (dot) fr.
-----------------------------------------------------------------------------



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.