Micha Riser wrote:
> gk wrote:
>
>
>>Micha Riser wrote:
>>
>>>I have made icc binaries, you can get them from
>>>http://www.povworld.org/povray/binaries.html
>>>
>>>Pentium4 optimized
>>>icc, no sse2
>>>povray.p4.nosse2.bz2 (1538779 bytes)
>>
>>SSE2 seems to be the most powerful hardware feature
>>for speed optimisation with P4,
>>so why is it off? Does it run faster without
>>this option?
>
>
> I've tried it, but it got slower. I don't *exactly* know why, but there are
> some things to consider:
> - while you can use SSE2 for double precision, you can only do 2 operations
> on doubles at once with it
> - this means e.g. for the 3D-vector calculations: you save max. 1 out of 3
> operations
> - memory misalignment: if the memory is not 128-bit aligned it can get slow
> (don't know the details here)
>
> Conclusion: The loops that occur in POV-Ray are probably too small to gain
> anything from SSE/SSE2 (there may be some exceptions), or icc doesn't do a
> good enough job of generating code for it (actually I think it was icc 6.0
> when I tried it) and you would need to optimize it "by hand" (as Intel has
> done for the Windows binary with the noise calculations (which I failed to
> "port" to Intel, that is, I used Intel's special source code for the Linux
> binary but it got slower)). Also, I did not have a Pentium 4 to experiment
> on myself at that time. Maybe I'll give it another try.
>
> - Micha
>
I see. Thank you for this binary,
it really runs faster than the gcc version.
My quick test shows that it is even
a bit faster than the standard icl build under W2k.
I'll try to run more tests soon (I hope).
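
For what it's worth, the "save max. 1 out of 3 operations" point can be checked with a little arithmetic. This is only a rough sketch: `packed_ops` is a made-up helper name, and it assumes one instruction per group of lanes while ignoring shuffles, loads, and alignment cost entirely.

```python
import math

def packed_ops(components: int, lanes: int) -> int:
    """Instructions needed to process `components` values
    when each instruction handles `lanes` values at once (idealized)."""
    return math.ceil(components / lanes)

# A 3D vector has 3 double-precision components.
scalar_ops = packed_ops(3, 1)  # one double per instruction
sse2_ops = packed_ops(3, 2)    # two doubles per 128-bit SSE2 register

# Best-case saving for one vector operation, before any overhead.
saved = scalar_ops - sse2_ops
print(scalar_ops, sse2_ops, saved)
```

So even in the best case the third component still needs an instruction of its own, with half of the register unused.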
Gleb