gk wrote:
> Micha Riser wrote:
>> I have made icc binaries, you can get them from
>> http://www.povworld.org/povray/binaries.html
>>
>>Pentium4 optimized
>>icc, no sse2
>>povray.p4.nosse2.bz2 (1538779 bytes)
>
> SSE2 seems to be the most powerful hardware feature
> for speed optimisation with P4,
> so why is it off? Does it run faster without
> this option?
I've tried it, but it got slower. I don't know *exactly* why, but there are
some things to consider:
- while you can use SSE2 for double precision, it only handles 2 doubles
per operation
- this means, e.g. for the 3D-vector calculations, you save at most 1 out of
3 operations
- memory misalignment: if the data is not 128-bit (16-byte) aligned, it can
get slow (I don't know the details here)
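
To make the "1 out of 3" point concrete, here is a small sketch using SSE2 intrinsics. The Vec3 type and vec3_add function are hypothetical illustrations, not POV-Ray's actual VECTOR code. x and y share one 128-bit register and take one packed add; z has no partner and still needs its own scalar add, so three adds become two instructions, before counting the extra loads and stores. The aligned _mm_load_pd requires a 16-byte aligned address (hence the _Alignas); _mm_loadu_pd accepts any address but was generally slower on CPUs of the Pentium 4 era.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_load_pd, _mm_add_pd */
#endif

/* Hypothetical 3D vector (not POV-Ray's own type). _Alignas(16) puts v[0]
   on a 16-byte boundary, so the aligned load/store on the x/y pair is legal. */
typedef struct {
    _Alignas(16) double v[3];  /* x, y, z */
} Vec3;

static Vec3 vec3_add(const Vec3 *a, const Vec3 *b)
{
    Vec3 r;
#if defined(__SSE2__)
    /* x and y: one packed add handles both doubles at once. */
    __m128d axy = _mm_load_pd(&a->v[0]);
    __m128d bxy = _mm_load_pd(&b->v[0]);
    _mm_store_pd(&r.v[0], _mm_add_pd(axy, bxy));
    /* z: no second lane to pair it with, so it stays a scalar add. */
    r.v[2] = a->v[2] + b->v[2];
#else
    /* Scalar fallback when SSE2 is not available: three separate adds. */
    r.v[0] = a->v[0] + b->v[0];
    r.v[1] = a->v[1] + b->v[1];
    r.v[2] = a->v[2] + b->v[2];
#endif
    return r;
}
```

With such small per-call work, the load/store shuffling can easily eat the one saved add.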
Conclusion: the loops that occur in POV-Ray are probably too small to gain
anything from SSE/SSE2 (there may be some exceptions), or icc doesn't do a
good enough job of generating code for it (I think it was icc 6.0 when I
tried it). In that case you would need to optimize it "by hand", as Intel has
done for the Windows binary with the noise calculations. I failed to "port"
that: I used Intel's special source code for the Linux binary, but it got
slower. Also, I did not have a Pentium 4 to experiment with myself at that
time. Maybe I'll give it another try.
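
By contrast, the case where SSE2 tends to pay off is a long loop over contiguous doubles, where per-iteration overhead and the one leftover odd element are amortized over many pairs. The sketch below is a hypothetical helper (axpy_array is not from POV-Ray) computing out[i] = a[i] + s * b[i]; it uses the unaligned _mm_loadu_pd/_mm_storeu_pd so it makes no assumption about the arrays' alignment.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + s * b[i] for i in [0, n). */
static void axpy_array(double *out, const double *a, const double *b,
                       double s, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    __m128d vs = _mm_set1_pd(s);            /* s broadcast into both lanes */
    for (; i + 2 <= n; i += 2) {            /* main loop: 2 doubles per step */
        __m128d va = _mm_loadu_pd(&a[i]);   /* unaligned loads: any address */
        __m128d vb = _mm_loadu_pd(&b[i]);
        _mm_storeu_pd(&out[i], _mm_add_pd(va, _mm_mul_pd(vs, vb)));
    }
#endif
    /* Scalar tail for an odd element, or the whole loop without SSE2. */
    for (; i < n; ++i)
        out[i] = a[i] + s * b[i];
}
```

For n in the thousands the scalar tail is negligible; for n = 3 it is a third of the work, which is the 3D-vector situation above.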
- Micha
--
POV-Ray Objects Collection: http://objects.povworld.org