I have put 3 new binaries on my webpage.
- a pentium compatible, optimized for athlon
- a pentium3 optimized (I also used profiling information this time)
- a pentium3 optimized with intel compiler
A 320x240 render of benchmark.pov on my pentium3 showed the following results:
old binary: 21m32s
pentium binary: 21m30s
pentium3 gcc binary: 21m06s
pentium3 icc binary: 17m25s
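For anyone who prefers percentages, here is a small C sketch that converts the times above to seconds and computes the render time saved. The function names are my own for illustration and have nothing to do with the POV-Ray source.

```c
#include <assert.h>

/* Convert a wall-clock time given as minutes and seconds to seconds. */
int to_seconds(int minutes, int seconds)
{
    assert(minutes >= 0 && seconds >= 0 && seconds < 60);
    return minutes * 60 + seconds;
}

/* Percentage of render time saved by a candidate relative to a baseline.
   Positive means the candidate is faster. */
double percent_saved(int baseline_s, int candidate_s)
{
    assert(baseline_s > 0);
    return 100.0 * (double)(baseline_s - candidate_s) / (double)baseline_s;
}
```

With the numbers above (21m32s = 1292 s, 21m06s = 1266 s, 17m25s = 1045 s), the icc binary needs about 19% less render time than the old binary and about 17% less than the pentium3 gcc binary.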
Without profiling information the icc binary was about as fast as the gcc
ones, though that depended on the scene. But it seems that the intel
compiler made great use of the profiling information.
The binaries are compiled with '-s', which made them smaller. If you compile
them yourself I advise you to remove the '-static' flag.
I have also tried to modify the colour operations (which are done in single
precision) so that sse simd instructions can be used for them. The media
code seemed particularly well suited for that. Icc also reported many loops
as vectorized after my modifications. But unfortunately the speed-up was
about zero. This can have several reasons:
- the way that povray allocates memory leaves the stack boundary not nicely
  aligned
- the colour calculations take only a fraction of the total time
- I missed some way to make good use of the simd instructions (the code DID
  use the instructions for sure)
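To give an idea of the kind of loop I mean, here is a minimal sketch in C11. It is not the actual POV-Ray code: the Colour4 type and the colour_add_scaled function are hypothetical, and the 16-byte alignment is my assumption about what a compiler wants to see before it vectorizes with sse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical colour type: four single-precision components, aligned to
   16 bytes so that one colour fills exactly one 128-bit sse register.
   This is NOT POV-Ray's own colour type. */
typedef struct {
    _Alignas(16) float c[4];
} Colour4;

/* dst[i] += scale * src[i] for n colours. 'restrict' promises the compiler
   that the two arrays do not overlap, which makes vectorizing easier. */
void colour_add_scaled(Colour4 *restrict dst, const Colour4 *restrict src,
                       float scale, size_t n)
{
    /* Check the alignment assumption stated above. */
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    for (size_t i = 0; i < n; i++)
        for (int k = 0; k < 4; k++)
            dst[i].c[k] += scale * src[i].c[k];
}
```

Whether a given compiler really vectorizes this depends on its version and flags, so check its vectorization report. And as said above, even when it does, that alone does not have to make the render faster.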
On pentium4 / 64-bit athlon, sse2 instructions could be used. This needs a
change to vector.h to make icc use them, though. I have no idea if this
would help, but if any pentium4 owner wants to try, they can write to me.
No guarantee that my binaries run stably and produce correct results.
Happy ray-tracing.
- Micha
--
http://objects.povworld.org - the POV-Ray Objects Collection