> The test with povray compiled without optimisations on my PC has just
> ended: 45 minutes 13.0 seconds (2713 seconds)
>
> 3 seconds longer: this is not significant.
So if I understand you correctly, you obtained:
gcc 3.2.2 -march=i586     : 45 minutes 13 seconds (2713 seconds)
gcc 3.2.2 -march=pentium4 : 36 minutes 43 seconds (2203 seconds)
icc 8.0 -tpp7             : 33 minutes 01 seconds (1981 seconds)
Is this right?
I'd be interested if you could post the full gcc/icc command line
that you actually used to compile the binaries :-)
- NC
In article <40b250b5$1@news.povray.org> , "Ross" <rli### [at] everestkcnet>
wrote:
> They might have no validity from a true benchmarking perspective. However
> they illustrate that compiling the source with a recent compiler can improve
> performance when compared to using the official binary distribution. While
> it is helpful to understand where exactly the performance gain is coming
> from, to me that is secondary to the fact that there is a performance gain.
This is not the point. The point is that you cannot take some old version
of a compiler that was set to optimise for one processor, compare it with
a newer version of the same compiler set to optimise for another processor,
and conclude anything valid about the effect of the optimisations. It is
like taking a 1949 Porsche 356 with a top speed of about 140 km/h,
comparing it to a 2004 Golf with a top speed of about 160 km/h (smallest
engine), and concluding that a Golf is faster than a Porsche.
As such, only a comparison using the same compiler version can say anything
about optimisation efficiency. So, looking at the example and taking any
2004 Porsche, one will quickly find that all models have a top speed of over
200 km/h. Now only one variable is left (the car type), and as such the
comparison is valid.
Thorsten
____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde
Visit POV-Ray on the web: http://mac.povray.org
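The rule argued for above (change only one variable between the two builds being compared) can be sketched as a small check. This is only an illustration: `BuildConfig`, `differing_fields` and `is_controlled_comparison` are hypothetical names, and the values are the ones summarised earlier in the thread (gcc 3.2.2 with -march=i586 or -march=pentium4, icc 8.0 with -tpp7, which is treated here as a Pentium 4 target).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BuildConfig:
    """Hypothetical description of how a benchmark binary was built."""
    compiler: str  # compiler name and version, treated as one variable
    target: str    # processor the build was optimised for


def differing_fields(a: BuildConfig, b: BuildConfig) -> set:
    """Return the names of the fields whose values differ between a and b."""
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}


def is_controlled_comparison(a: BuildConfig, b: BuildConfig) -> bool:
    """True when exactly one variable changes between the two builds."""
    return len(differing_fields(a, b)) == 1


# Values taken from the timing summary in this thread.
gcc_i586 = BuildConfig(compiler="gcc 3.2.2", target="i586")
gcc_p4 = BuildConfig(compiler="gcc 3.2.2", target="pentium4")
icc_p4 = BuildConfig(compiler="icc 8.0", target="pentium4")

for left, right in [(gcc_i586, gcc_p4), (gcc_p4, icc_p4), (gcc_i586, icc_p4)]:
    changed = sorted(differing_fields(left, right))
    print(left, "vs", right, "-> changed:", changed,
          "controlled:", is_controlled_comparison(left, right))
```

In this sketch the first two pairs change a single variable each, while the last pair changes both the compiler and the target, which is the kind of comparison the post objects to.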
Nicolas Calimet wrote:
>> The test with povray compiled without optimisations on my pc has just
>> ended : 45 minutes 13.0 seconds (2713 seconds)
>>
>> 3 seconds longer: this is not significant.
>
>
> So if I understand you correctly, you do obtain:
>
> gcc 3.2.2 -march=i586 : 45 minutes 13 seconds ( 2713 seconds )
> gcc 3.2.2 -march=pentium4 : 36 minutes 43 seconds ( 2203 seconds )
> icc 8.0 -tpp7 : 33 minutes 01 seconds ( 1981 seconds )
>
> Is this right ?
>
> I'd be interested if you could post the full gcc/icc command-line
> that you actually used to compile the binaries :-)
>
> - NC
>
The GCC CFLAGS variable was the one from the original build, with only
'i586' replaced by 'pentium4'.
The icc CFLAGS variable was:
CFLAGS = -O3 -mcpu=pentium4 -march=pentium4 -ip
(-tpp7 is like -mcpu=pentium4; -march=pentium4 could be replaced by -xP)
-ip adds Interprocedural Optimizations (IPO).
Further optimisation could be achieved with Profile-Guided Optimizations
(PGO) and static linking.
Nicolas Calimet <pov### [at] freefr> wrote:
> > #define DISTRIBUTION_MESSAGE_1 "This is an unofficial version compiled by:"
> > Jong
> > #define DISTRIBUTION_MESSAGE_2 "FILL IN NAME HERE...."
> > #define DISTRIBUTION_MESSAGE_3 "The POV-Ray Team(tm) is not responsible for
> > supporting this version."
>
> Given what you did above, I'd suggest you do not try compiling povray,
> but rather use the official Linux binary that should work on your RedHat distro.
>
> - NC
Yes. I had used the official Linux binary, and it works pretty well on RedHat.
But I am trying to install the parallel patch as described at
http://news.povray.org/povray.binaries.programming/thread/%3C3E40469B.3080402@web.de%3E/
The configuration needs a Makefile.am in every POVRAY folder, which can only be
found in the POVRAY source code.
So that's why I am trying to compile the povray source now.
Maybe I have to ask for a binary of parapov...
Is there another way to configure parapov without compiling the source
code?
gRRosminet wrote:
>
> I have downloaded the official binary for linux and ran the benchmark.
>
> Official : 45 minutes 08 seconds ( 2708 seconds )
> gcc pentium4 optimized : 36 minutes 43 seconds ( 2203 seconds )
> ICC pentium4 optimized : 33 minutes 01 seconds ( 1981 seconds )
>
> ratios (official / optimized) :
> Official : 1
> gcc pentium4 optimized : 1.229
> icc pentium4 optimized : 1.367
My test results look quite different:
official 3.5:              2394s
3.6 RC1 (gcc 3.4):         2292s
3.6, optimized* (gcc 3.4): 2274s
*) -march=athlon-xp -mfpmath=sse -mmmx -msse -m3dnow
And before someone asks: profile-based optimization also does not
change much about this (I tested this some time ago).
To me, the most likely explanation for your results is that the Pentium 4
is particularly bad at running code not specifically optimized for
its design.
Christoph
--
POV-Ray tutorials, include files, Sim-POV,
HCR-Edit and more: http://www.tu-bs.de/~y0013390/
Last updated 01 May. 2004 _____./\/^>_*_<^\/\.______
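For reference, the ratios quoted in this post are simply the baseline time divided by the optimized time, and the same arithmetic can be applied to the two 3.6 timings. A minimal sketch, using only the timings posted in this thread; the function and variable names are made up for the illustration.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate build is than the baseline."""
    return baseline_seconds / candidate_seconds


# Timings in seconds, as posted in this thread.
pentium4_runs = {
    "official": 2708,
    "gcc pentium4 optimized": 2203,
    "icc pentium4 optimized": 1981,
}
athlon_36_runs = {
    "3.6 RC1 (gcc 3.4)": 2292,
    "3.6, optimized (gcc 3.4)": 2274,
}

for name, seconds in pentium4_runs.items():
    print(f"{name}: {speedup(pentium4_runs['official'], seconds):.3f}")

for name, seconds in athlon_36_runs.items():
    print(f"{name}: {speedup(athlon_36_runs['3.6 RC1 (gcc 3.4)'], seconds):.3f}")
```

This reproduces the quoted 1.229 and 1.367 for the Pentium 4 builds, against roughly 1.008 for the Athlon XP flags on 3.6.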
In article <c8uv2j$rvd$1@chho.imagico.de> , Christoph Hormann
<chr### [at] gmxde> wrote:
> official 3.5: 2394s
> 3.6 RC1 (gcc 3.4): 2292s
> 3.6, optimized* (gcc 3.4): 2274s
>
> *) -march=athlon-xp -mfpmath=sse -mmmx -msse -m3dnow
>
> And before someone asks: profiling based optimization also does not
> change much about this (i tested this some time ago).
>
> The most likely explanation for your results to me seems the pentium 4
> being particularly bad at running code not specifically optimized for
> its design.
Note: You cannot compare 3.5 and 3.6 benchmark results, as anti-aliasing
changed a bit and a change to photons makes them much faster. As such, only
the two 3.6 results you posted can be compared.
Thorsten
____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde
Visit POV-Ray on the web: http://mac.povray.org
Christoph Hormann wrote:
> gRRosminet wrote:
>
>>
>> I have downloaded the official binary for linux and ran the benchmark.
>>
>> Official : 45 minutes 08 seconds ( 2708 seconds )
>> gcc pentium4 optimized : 36 minutes 43 seconds ( 2203 seconds )
>> ICC pentium4 optimized : 33 minutes 01 seconds ( 1981 seconds )
>>
>> ratios (official / optimized) :
>> Official : 1
>> gcc pentium4 optimized : 1.229
>> icc pentium4 optimized : 1.367
>
>
> My test results look quite different:
>
> official 3.5: 2394s
> 3.6 RC1 (gcc 3.4): 2292s
> 3.6, optimized* (gcc 3.4): 2274s
>
> *) -march=athlon-xp -mfpmath=sse -mmmx -msse -m3dnow
>
> And before someone asks: profiling based optimization also does not
> change much about this (i tested this some time ago).
>
> The most likely explanation for your results to me seems the pentium 4
> being particularly bad at running code not specifically optimized for
> its design.
>
> Christoph
>
Could you try it with an optimized version of 3.5?
The first time I tried to compile an optimized version of povray, it was
on an Athlon Thunderbird, and I got very good results on a personal scene
(which didn't use as many features as the benchmark one).
For my Pentium 4, another explanation could be that the SSE2
instructions are really better at floating point than SSE.
In your case, you limit yourself to SSE instructions, whereas Athlon
processors have an excellent FPU. You should replace -mfpmath=sse with
-mfpmath=sse,387
Here is what the (French) GCC man page says about this option:
sse,387
This effectively doubles the number of available registers and
Thierry
P.S.: please don't tell me that the P4 is a bad processor: I know it, but I
don't want to hear about it! ;-P
Thorsten Froehlich wrote:
>
> Note: You cannot compare 3.5 and 3.6 benchmark results as anti-aliasing
> changed a bit and a photons change makes those much faster. As such, only
> the two 3.6 results you posted can be compared.
That obviously depends on the purpose of your comparison. If you want
to know whether your scenes render faster with the new version, comparing
3.5 and 3.6 benchmark results does make sense. Of course you will
have to keep in mind that the speed difference might mostly be due to
the use of photons.
Christoph
--
POV-Ray tutorials, include files, Sim-POV,
HCR-Edit and more: http://www.tu-bs.de/~y0013390/
Last updated 01 May. 2004 _____./\/^>_*_<^\/\.______
gRRosminet wrote:
>
> Could you try it with an optimized version of 3.5 ?
No time for that at the moment, but I don't see any reason why
processor-specific optimization should work so much better there.
> for my pentium 4, an other explanation could be that the sse2
> instruction are really better on floating point than sse.
> In your case, you limit yourself at using SSE instruction whereas Athlon
> processors have an excelent fpu unit. you should replace -mfpmath=sse by
> -mfpmath=sse,387
Does not change that much: 2197s
Christoph
--
POV-Ray tutorials, include files, Sim-POV,
HCR-Edit and more: http://www.tu-bs.de/~y0013390/
Last updated 01 May. 2004 _____./\/^>_*_<^\/\.______
> sse,387
> This effectively doubles the number of available registers and
>
And the full entry from the English man page:
sse,387
Attempt to utilize both instruction sets at once. This effectively
double the amount of available registers and on chips with separate
execution units for 387 and SSE the execution resources too. Use this
option with care, as it is still experimental, because the gcc register
allocator does not model separate functional units well.
:-)
BTW, I realized that my statement about povray not being able
to use SSE/SSE2/similar instructions seems plain wrong, as I thought
those were (still) limited to single-precision arithmetic.
My bad.
So it seems they could actually be quite valuable, and
I will consider them in the new configure script for 3.6. Thanks
for pointing this out.
- NC
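The distinction behind this correction is precision: the original SSE instructions operate on single-precision floats, while SSE2 added double-precision operations. A minimal sketch of what is lost when a double is squeezed into single precision; the helper name `to_float32` is only for illustration, and nothing here is taken from the POV-Ray source.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = 0.1
as_single = to_float32(value)

# The single-precision copy keeps only about 7 significant decimal digits.
print(f"double: {value:.17f}")
print(f"single: {as_single:.17f}")
print(f"error : {abs(value - as_single):.3e}")
```

Values that are exactly representable in both formats, such as 0.5, survive the round trip unchanged; most others, such as 0.1, pick up a small error.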