POV-Ray : Newsgroups : povray.unix : 3.50c released (see caveats)
  3.50c released (see caveats) (Message 25 to 34 of 34)  
From: Micha Riser
Subject: Re: 3.50c released (see caveats)
Date: 31 Oct 2002 05:05:36
Message: <3dc10070@news.povray.org>
Steve wrote:

> On Wed, 30 Oct 2002 21:07:23 +0100, Felix Wiemann wrote:
>> Steve wrote:
> 
> You'll no doubt be very surprised to find that the Athlon optimized binary
> works perfectly on the PII.
> 

I have explicitly written on the download page that the Athlon binary -
while being optimized for Athlons - is compiled to work on all
Pentium-compatible machines.

- Micha

-- 
objects.povworld.org - The POV-Ray Objects Collection
book.povworld.org    - The POV-Ray Book Project



From: Christopher James Huff
Subject: Re: 3.50c released (see caveats)
Date: 1 Nov 2002 10:41:23
Message: <chrishuff-FB03BE.10335801112002@netplex.aussie.org>
In article <3dc0f069$1@news.povray.org>,
 "Thorsten Froehlich" <tho### [at] trfde> wrote:

> Yes, but as the Pentium II does not support the SSE instructions it is
> surprising that it works at all!  Apparently the compiler used doesn't add a
> lot of SSE instructions...

Right, but it does work: Steve was saying it ran faster than the
original 3.5. The post I replied to ("Why are you surprised that it
doesn't work?") only made sense if it didn't work.

-- 
Christopher James Huff <cja### [at] earthlinknet>
http://home.earthlink.net/~cjameshuff/
POV-Ray TAG: chr### [at] tagpovrayorg
http://tag.povray.org/



From: bstone
Subject: Re: 3.50c released (see caveats)
Date: 2 Nov 2002 15:37:26
Message: <3dc43786$1@news.povray.org>
about the SSE comment:
gcc 3.2 (and 3.1) work with SSE/SSE2, but there are still a ton of x87
instructions because of the way the code is laid out.
unfortunately, in my testing the Windows binary is still at least 10%
faster than the best binary I was able to make with gcc 3.2.
I think the code needs to be restructured to get a big speed boost.


"Micha Riser" <mri### [at] gmxnet> wrote in message
news:3dc0ffc1@news.povray.org...
Thorsten Froehlich wrote:

> In article <chr### [at] netplexaussieorg> ,
> Christopher James Huff <chr### [at] maccom>  wrote:
>
>>> You are running a compile for PIII on a PII? Why are you surprised that
>>> it doesn't work?
>>
>> Read it again...the PIII optimized compile is the faster one on his
>> machine.
>
> Yes, but as the Pentium II does not support the SSE instructions it is
> surprising that it works at all!  Apparently the compiler used doesn't add
> a lot of SSE instructions...

gcc 3.1 does not use SSE instructions by itself; you would have to use
inline assembler to make it use them. So there are probably only a few
PIII-specific instructions left that it uses at all, which explains why it
sometimes works on a PII as well. I had even turned off the SSE
optimization when I compiled with the Intel compiler (only some of the
5-component colour calculations could have been vectorized anyway)
because it did not result in a speed-up, but rather a slow-down.

- Micha

--
objects.povworld.org - The POV-Ray Objects Collection
book.povworld.org    - The POV-Ray Book Project



From: Roz
Subject: Re: 3.50c released (see caveats)
Date: 3 Nov 2002 15:01:36
Message: <3DC580BB.7080306@netscape.net>
bstone wrote:
> about the SSE comment:
> gcc 3.2 (and 3.1) work with SSE/SSE2, but there are still a ton of x87
> instructions because of the way the code is laid out.
> unfortunately, in my testing the Windows binary is still at least 10%
> faster than the best binary I was able to make with gcc 3.2.
> I think the code needs to be restructured to get a big speed boost.

There must be some stuff in the Windows official binary that takes advantage
of the Pentium specifically. Either that or gcc 3.1 likes Athlons better :P

On my Athlon XP 1900+ I get consistently faster renders with my custom Athlon
build of POV-Ray 3.5 than with the official Windows binary. The difference is
very small and nothing to get excited about, but it certainly isn't 10% slower.
Here are a couple of render times from some tests I did this morning:

benchmark.pov
-------------
Windows official binary   = 26m 24s (1584s)
Athlon targeted gcc build = 25m  7s (1507s)

balcony.pov
-----------
Windows official binary   = 8m 14s (494s)
Athlon targeted gcc build = 7m  3s (423s)

The compiler flags used to make the Athlon-targeted build with gcc 3.1 were:

-O3 -s -mcpu=athlon -march=athlon -finline-functions -ffast-math \
-fomit-frame-pointer -Wall -funroll-loops -fexpensive-optimizations \
-malign-double -foptimize-sibling-calls -minline-all-stringops \
$(NOMULTICHAR)

I don't know if this sheds any light on anything and I'm certainly no
expert! I just found it kind of interesting.

-Roz



From: Warp
Subject: Re: 3.50c released (see caveats)
Date: 3 Nov 2002 16:47:56
Message: <3dc5998c@news.povray.org>
Roz <Rzl### [at] netscapenet> wrote:
> There must be some stuff in the Windows official binary that takes advantage
> of the Pentium specifically.

  I think that it's simply that Intel's compiler can compile better for
Intel processors than gcc.

-- 
#macro M(A,N,D,L)plane{-z,-9pigment{mandel L*9translate N color_map{[0rgb x]
[1rgb 9]}scale<D,D*3D>*1e3}rotate y*A*8}#end M(-3<1.206434.28623>70,7)M(
-1<.7438.1795>1,20)M(1<.77595.13699>30,20)M(3<.75923.07145>80,99)// - Warp -



From: Roz
Subject: Re: 3.50c released (see caveats)
Date: 3 Nov 2002 17:33:42
Message: <3DC5A463.3090603@netscape.net>
Warp wrote:
> Roz <Rzl### [at] netscapenet> wrote:
> 
>>There must be some stuff in the Windows official binary that takes advantage
>>of the Pentium specifically.
> 
> 
>   I think that it's simply that Intel's compiler can compile better for
> Intel processors than gcc.
> 

Yes, that makes sense because they (Intel) should know all the ins and outs of
their own processor ;)  Looking back through old posts to this newsgroup, there's
some indication that icc custom builds are just a tiny bit faster than gcc
builds. They're all so close to each other now that I'm not worried about it.
I just didn't want people to get the impression that the 10% slower thing
bstone mentioned was universal. There are many factors involved beyond the
compiler used, including the type of processor.

-Roz



From: Safari
Subject: Re: 3.50c released (see caveats)
Date: 3 Nov 2002 18:21:11
Message: <slrnasbbr3.2b8.y7pt9001@safari.homelinux.net>
On Sun, 03 Nov 2002 14:34:11 -0800, Roz <Rzl### [at] netscapenet> wrote:
> Warp wrote:
>> Roz <Rzl### [at] netscapenet> wrote:
>> 
>>>There must be some stuff in the Windows official binary that takes advantage
>>>of the Pentium specifically.
>> 
>> 
>>   I think that it's simply that Intel's compiler can compile better for
>> Intel processors than gcc.
>> 
> 
> Yes, that makes sense because they (Intel) should know all the ins and outs of
> their own processor ;)  Looking back through old posts to this newsgroup, there's
> some indication that icc custom builds are just a tiny bit faster than gcc

that tiny bit turns out to be 15-45% back here in Finland.
ICC 6.0.1 on Linux[1] versus Official POV-Ray 3.50c:

skyvase.pov
ICC: 5943374424 cycles
GCC: 8737719077 cycles

brilliant.pov (many diamonds, max trace level 109)
ICC:  7264656795 cycles
GCC: 13371101675 cycles

fish13.pov
ICC: 15667900293 cycles
GCC: 19866506320 cycles

was profiling (-prof_genx & -prof_use) used in the Windows version?
it can speed up by 25%.
option -pc80 seemed to speed up at least some scenes by 8%.
also, remember to use -limf instead of -lm.
-wp_ipo and -rcd seemed to break the compile at least with ICC 6.0.0,
haven't tried with 6.0.1.

BTW, who was optimizing usage of some variable in POV-Ray?
it involved making some variable a global variable, sqrt and
optics.pov.  I'd like to see the patch.

> builds. They're all so close to each other now that I'm not worried about it.
> I just didn't want people to get the impression that the 10% slower thing
> bstone mentioned was universal. There are many factors involved beyond the
> compiler used, including the type of processor.

like what scene files were used when generating profiling data
with binary generated by icc -prof_genx ... ;)
 
> -Roz

[1] includes some patches/hacks not in the Official version,
    including using SHA-1 as PRNG.

-- 
Safari - y7p### [at] sneakemailcom
"Talk is cheap. Show me the code." - Linus Torvalds



From: Roz
Subject: Re: 3.50c released (see caveats)
Date: 3 Nov 2002 19:27:55
Message: <3DC5BF27.5080804@netscape.net>
Safari wrote:
> On Sun, 03 Nov 2002 14:34:11 -0800, Roz <Rzl### [at] netscapenet> wrote:
>>Yes, that makes sense because they (Intel) should know all the ins and outs of
>>their own processor ;)  Looking back through old posts to this newsgroup, there's
>>some indication that icc custom builds are just a tiny bit faster than gcc
> 
> 
> that tiny bit turns out to be 15-45% back here in Finland.
> ICC 6.0.1 on Linux[1] versus Official POV-Ray 3.50c:

Well now you've done it. After reading earlier posts I figured I could
skip the huge download of ICC (if it's still available for download). Now
you've tempted me into clobbering my poor modem connection ;)

Was this the official GCC compile of POV-Ray 3.50c for Linux you were
comparing speeds on or a more optimized one you compiled yourself?
The GCC compile I did is still a good bit faster than the official one.
Mark Gordon can't put too many special optimizations in the official
binary or he'd sacrifice compatibility.

My main thinking is that if I can make a compile using GCC that will
be very close in speed to whatever ICC can produce, then it's not
worth the hassle of downloading ICC and making a compile with it.

> was profiling (-prof_genx & -prof_use) used in the Windows version?
> it can speed up by 25%.
> option -pc80 seemed to speed up at least some scenes by 8%.
> also, remember to use -limf instead of -lm.
> -wp_ipo and -rcd seemed to break the compile at least with ICC 6.0.0,
> haven't tried with 6.0.1.

I've never tried the Intel compiler. Do you have a list of all the options
you recommend to use? I'll have to do some studying in between large
work related projects. *sigh*

>>builds. They're all so close to each other now that I'm not worried about it.
>>I just didn't want people to get the impression that the 10% slower thing
>>bstone mentioned was universal. There are many factors involved beyond the
>>compiler used, including the type of processor.
> 
> 
> like what scene files were used when generating profiling data
> with binary generated by icc -prof_genx ... ;)
>  

Exactly! Well I don't know the details of the option you're describing but
there was much left out of that 10% slower comment and I couldn't
just let it sit there without a response.

-Roz



From: Safari
Subject: Re: 3.50c released (see caveats)
Date: 3 Nov 2002 19:51:54
Message: <slrnasbh5b.2b8.y7pt9001@safari.homelinux.net>
On Sun, 03 Nov 2002 16:28:23 -0800, Roz <Rzl### [at] netscapenet> wrote:
> Safari wrote:
>> On Sun, 03 Nov 2002 14:34:11 -0800, Roz <Rzl### [at] netscapenet> wrote:
>>>Yes, that makes sense because they (Intel) should know all the ins and outs of
>>>their own processor ;)  Looking back through old posts to this newsgroup, there's
>>>some indication that icc custom builds are just a tiny bit faster than gcc
>> 
>> 
>> that tiny bit turns out to be 15-45% back here in Finland.
>> ICC 6.0.1 on Linux[1] versus Official POV-Ray 3.50c:
                                ^^^^^^^^^^^^^^^^^^^^^^
> 
> Well now you've done it. After reading earlier posts I figured I could
> skip the huge download of ICC (if it's still available for download). Now
> you've tempted me into clobbering my poor modem connection ;)

work, library, school + CD-RW, ... ;)
it also contains the IA-64 versions, so if you can extract the tarball
on some *NIX shell etc., you only need to download the IA-32 stuff
with your modem...
 
> Was this the official GCC compile of POV-Ray 3.50c for Linux you were

it was the Official POV-Ray 3.50c, as stated a couple of lines above.

> comparing speeds on or a more optimized one you compiled yourself?
> The GCC compile I did is still a good bit faster than the official one.
> Mark Gordon can't put too many special optimizations in the official

he can put in any optimizations, as long as the binary produces the
same output for the same input ;)

> binary or he'd sacrifice compatibility.
> 
> My main thinking is that if I can make a compile using GCC that will
> be very close in speed to whatever ICC can produce, then it's not

not with GCC v3.2 I fear, but you can try.

> worth the hassle of downloading ICC and making a compile with it.
> 
>> was profiling (-prof_genx & -prof_use) used in the Windows version?
>> it can speed up by 25%.
>> option -pc80 seemed to speed up at least some scenes by 8%.
>> also, remember to use -limf instead of -lm.
>> -wp_ipo and -rcd seemed to break the compile at least with ICC 6.0.0,
>> haven't tried with 6.0.1.
> 
> I've never tried the Intel compiler. Do you have a list of all the options
> you recommend to use? I'll have to do some studying in between large
> work related projects. *sigh*

-O3 -tpp6 -xi -restrict -align -ipo -ipo_obj -unroll -pc80
(re-experiments with options -wp_ipo and -rcd are scheduled for the far future;
IIRC, -rcd did not make any difference, YMMV).
remember to also use -prof_dir /tmp/some/dir,
and don't bother with -xM -xK -xW ...

for me -tpp6 is faster because I have a Celeron; P4 users might also want
to try -tpp7... although -tpp7 sometimes produces faster binaries
for me too (but not with POV-Ray).
I don't know which option is better for an Athlon.

>>>builds. They're all so close to each other now that I'm not worried about it.
>>>I just didn't want people to get the impression that the 10% slower thing
>>>bstone mentioned was universal. There are many factors involved beyond the
>>>compiler used, including the type of processor.
>> 
>> 
>> like what scene files were used when generating profiling data
>> with binary generated by icc -prof_genx ... ;)
>>  
> 
> Exactly! Well I don't know the details of the option you're describing but

it makes ICC generate statistics about the branches etc. taken while rendering;
with that profiling data it can generate faster code. I didn't find info on
what is taken into account when profiling; surely other things besides
branches are recorded too...

for example, if the scene files do not use the radiosity feature, no
profiling data can be generated for POV-Ray's radiosity code, and the
radiosity code in the final executable built by ICC with -prof_use will be
sub-optimal... so render many different scene files with the binary
compiled with -prof_genx.
I hope you understood something from that.

> there was much left out of that 10% slower comment and I couldn't
> just let it sit there without a response.
> 
> -Roz

-- 
Safari - y7p### [at] sneakemailcom
"Talk is cheap. Show me the code." - Linus Torvalds



From: Roz
Subject: Re: 3.50c released (see caveats)
Date: 3 Nov 2002 20:18:23
Message: <3DC5CAFC.9020804@netscape.net>
Safari wrote:
> work, library, school + CD-RW, ... ;)
> it also contains the IA-64 versions, so if you can extract the tarball
> on some *NIX shell etc., you only need to download the IA-32 stuff
> with your modem...

Good to know and I think I can take advantage of that.

>>My main thinking is that if I can make a compile using GCC that will
>>be very close in speed to whatever ICC can produce, then it's not
> 
> 
> not with GCC v3.2 I fear, but you can try.

Actually I was trying to find out if *you* had tried that, rather than just comparing
it to the official binary. It didn't seem fair to GCC to compare a
specially optimized ICC compile against a more generically optimized
GCC compile. If I can make a GCC compile that's faster than the official
binary I'm sure you can too and that's probably what should be compared
against the ICC compile you've done. Basically try to compare both with
all the optimizations you can throw at it. I'll give ICC a go when I
get the chance. It'll be interesting to see how it'll fare on an Athlon.

> -O3 -tpp6 -xi -restrict -align -ipo -ipo_obj -unroll -pc80
> (re-experiments with options -wp_ipo and -rcd are scheduled for the far future;
> IIRC, -rcd did not make any difference, YMMV).
> remember to also use -prof_dir /tmp/some/dir,
> and don't bother with -xM -xK -xW ...
> 
> for me -tpp6 is faster because I have a Celeron; P4 users might also want
> to try -tpp7... although -tpp7 sometimes produces faster binaries
> for me too (but not with POV-Ray).
> I don't know which option is better for an Athlon.

This information is awesome, thanks!

> it makes ICC generate statistics about the branches etc. taken while rendering;
> with that profiling data it can generate faster code. I didn't find info on
> what is taken into account when profiling; surely other things besides
> branches are recorded too...
> 
> for example, if the scene files do not use the radiosity feature, no
> profiling data can be generated for POV-Ray's radiosity code, and the
> radiosity code in the final executable built by ICC with -prof_use will be
> sub-optimal... so render many different scene files with the binary
> compiled with -prof_genx.
> I hope you understood something from that.

Yes, actually you are explaining that well :)

-Roz




Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.