POV-Ray: Newsgroups: povray.windows: Intel SSE2 noise optimizations in POV 3.6


From: Micha Riser
Subject: Intel SSE2 noise optimizations in POV 3.6
Date: 28 Apr 2006 18:35:00
Message: <web.4452986df9a1193f95cba54c0@news.povray.org>

The Windows source code of POV-3.5 came with a file called intelsse2.h which
provided SSE2-optimized noise computation for fasternoise.h. I noticed that
the version 3.6 source code no longer provides these files. I wonder why...

Does this mean that the official Windows binaries are not built using
SSE2-optimized noise files?

Sincerely,
- Micha


From: Nicolas Calimet
Subject: Re: Intel SSE2 noise optimizations in POV 3.6
Date: 29 Apr 2006 09:29:44
Message: <44536a48$1@news.povray.org>

> The Windows source code of POV-3.5 came with a file called intelsse2.h which
> provided SSE2-optimized noise computation for fasternoise.h. I noticed that
> the version 3.6 source code no longer provides these files. I wonder why...

	AFAIK at the time 3.5 was released the available compilers were not
able to automatically generate SSE2-optimized code for that particular piece
of code, and Intel contributed the hand-optimized code you are referring to.
Nowadays the situation is much different since, in particular, the Intel C++
compiler greatly improved its optimization framework (the GCC compiler did
also improve a lot, though it is still not as good as ICPC at optimizing on
the P4 architecture; K8 might be slightly different though).  Therefore,
the 3.6 codebase didn't need this hand-optimization any longer.

> Does this mean that the official Windows binaries are not built using
> SSE2-optimized noise files?

	I'm not sure for the latest 3.6.1c Windows binary -- but if it is
indeed optimized for SSE2-capable CPUs, not only the noise code will benefit
from the optimizations.  At least the 3.7 beta offers a fully optimized SSE2
build as well as a non-SSE2 optimized binary.

	Chris Cason might give you a more precise answer about the matter.

	- NC


From: Micha Riser
Subject: Re: Intel SSE2 noise optimizations in POV 3.6
Date: 29 Apr 2006 10:25:01
Message: <web.445376b5866a85d795cba54c0@news.povray.org>

Nicolas Calimet <pov### [at] freefr> wrote:
>  AFAIK at the time 3.5 was released the available compilers were not
> able to automatically generate SSE2-optimized code for that particular piece
> of code, and Intel contributed the hand-optimized code you are referring to.
> Nowadays the situation is much different since, in particular, the Intel C++
> compiler greatly improved its optimization framework (the GCC compiler did
> also improve a lot, though it is still not as good as ICPC at optimizing on
> the P4 architecture; K8 might be slightly different though).  Therefore,
> the 3.6 codebase didn't need this hand-optimization any longer.

That is not true. I have recently studied the SSE support of icc and gcc in
detail. Neither will do that kind of optimization (performing the noise
calculation with SIMD parallel instructions) with the unmodified POV-Ray
noise code. The code still needs a lot of care before the compiler can
optimize it automatically.

What the compilers currently do well is using the SSE2 registers (xmm)
instead of the floating point stack to do the floating point computations.
But when you look at the assembly output you will see that the computations
are not done in parallel.
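
To make the distinction concrete, here is a minimal sketch of the same
multiply-add written once as plain scalar C and once with packed SSE2
intrinsics. The function names madd_scalar and madd_packed are made up for
this illustration and this is not the POV-Ray noise code. If the compiler
does not auto-vectorize the first loop, it would typically use xmm registers
with one double per instruction, while the second handles two doubles per
instruction.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar version: one double per operation. Without auto-vectorization
   an SSE2 build still keeps the values in xmm registers, but only one
   lane of each register does useful work. */
void madd_scalar(const double *a, const double *b, const double *c,
                 double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}

/* Packed version: two doubles per operation. Hypothetical helper,
   sketch only: n must be even, there is no tail handling. */
void madd_packed(const double *a, const double *b, const double *c,
                 double *out, size_t n)
{
    assert(n % 2 == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        __m128d vc = _mm_loadu_pd(c + i);
        _mm_storeu_pd(out + i, _mm_add_pd(_mm_mul_pd(va, vb), vc));
    }
#else
    madd_scalar(a, b, c, out, n);  /* fallback when SSE2 is unavailable */
#endif
}
```

Both functions should produce the same values for the same input; the
difference only shows up in the generated instructions and, possibly, in
the run time.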


From: Thorsten Froehlich
Subject: Re: Intel SSE2 noise optimizations in POV 3.6
Date: 29 Apr 2006 10:49:53
Message: <44537d11$1@news.povray.org>

Micha Riser wrote:
>> AFAIK at the time 3.5 was released the available compilers were not
>> able to automatically generate SSE2-optimized code for that particular piece
>> of code, and Intel contributed the hand-optimized code you are referring to.
>> Nowadays the situation is much different since, in particular, the Intel C++
>> compiler greatly improved its optimization framework (the GCC compiler did
>> also improve a lot, though it is still not as good as ICPC at optimizing on
>> the P4 architecture; K8 might be slightly different though).  Therefore,
>> the 3.6 codebase didn't need this hand-optimization any longer.
> 
> That is not true. I have recently studied the SSE support of icc and gcc in
> detail. Neither will do that kind of optimization (performing the noise
> calculation with SIMD parallel instructions) with the unmodified POV-Ray
> noise code. The code still needs a lot of care before the compiler can
> optimize it automatically.

Nicolas never said the compiler would turn out the same noise code as the 
hand-optimised version. You constructed this assumption on your own. It does 
not change the fact that the compiler optimizations for P4 are a whole lot 
better than they were many years ago, which is the relevant point. What 
compilers do to be faster is not relevant for Nicolas' statement to be true!

	Thorsten


From: Micha Riser
Subject: Re: Intel SSE2 noise optimizations in POV 3.6
Date: 29 Apr 2006 15:40:01
Message: <web.4453bfb5866a85d795cba54c0@news.povray.org>

Thorsten Froehlich <tho### [at] trfde> wrote:
> Nicolas never said the compiler would turn out the same noise code as the
> hand-optimised version. You constructed this assumption on your own. It does
> not change the fact that the compiler optimizations for P4 are a whole lot
> better than they were many years ago, which is the relevant point. What
> compilers do to be faster is not relevant for Nicolas' statement to be true!

Nicolas concluded:

> >> Therefore,
> >>the 3.6 codebase didn't need this hand-optimization any longer.

I assume that we want the binaries as fast as possible, so why should the
hand-optimizations no longer be used if they are still faster?

Another thing that comes to my mind: The noise calculation is done in double
precision. If it were done in single precision only it could be better
parallelized with SIMD. Do you know if this has been tried yet? Is it known
that single precision is not precise enough?
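
For context: a 128-bit SSE register holds four floats but only two doubles,
which is why single precision would parallelize better. The sketch below
illustrates the precision side of the trade-off. It assumes, purely for
illustration, that a noise function takes the fractional part of a
coordinate at some point; the helper names are hypothetical and the code is
not taken from POV-Ray.

```c
#include <assert.h>

/* Fractional part in double precision. For non-negative x, truncating
   to an integer is the same as taking the floor. */
double frac_double(double x)
{
    assert(x >= 0.0 && x < 1e9);
    return x - (double)(long)x;
}

/* Same computation carried out in single precision. */
float frac_float(float x)
{
    assert(x >= 0.0f && x < 1e9f);
    return x - (float)(long)x;
}

/* Absolute difference between the double and float results. */
double frac_error(double x)
{
    double d = frac_double(x) - (double)frac_float((float)x);
    return d < 0.0 ? -d : d;
}
```

Near x = 100000 adjacent floats are 1/128 apart, so frac_error(100000.1)
comes out at roughly 0.0016, while the double result is accurate to many
more digits. Whether an error of that size matters depends on the scene
scale and on how the noise value is used.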

- Micha


From: Thorsten Froehlich
Subject: Re: Intel SSE2 noise optimizations in POV 3.6
Date: 29 Apr 2006 15:57:17
Message: <4453c51d$1@news.povray.org>

Micha Riser wrote:
> Another thing that comes to my mind: The noise calculation is done in double
> precision. If it were done in single precision only it could be better
> parallelized with SIMD. Do you know if this has been tried yet? Is it known
> that single precision is not precise enough?

Oh, the code of course continues to work. But the differences are very visible.

	Thorsten


From: Nicolas Calimet
Subject: Re: Intel SSE2 noise optimizations in POV 3.6
Date: 3 May 2006 10:03:17
Message: <4458b825$1@news.povray.org>

> Nicolas concluded:
> 
>>>> Therefore,
>>>> the 3.6 codebase didn't need this hand-optimization any longer.
> 
> I assume that we want the binaries as fast as possible, so why should the
> hand-optimizations no longer be used if they are still faster?

	The whole point lies exactly in your very last few words: are
those hand-optimizations *still* faster than the compiler-optimized code
on current processors?  According to our tests with both POV-Ray 3.6 for
Windows and for Linux -- they are not.
	To the contrary these hand-optimizations nowadays tend to slow down
the program execution.  For instance you can end up with at least a 25%
speed loss in rendering the official benchmark (that uses Noise quite a bit)
on current Pentium 4 processors, as compared to compiler-optimized binaries.
Most likely this disappointing result has to do with the micro-architecture
of those processors, in particular lengthy pipelines (think P4 Northwood /
Prescott and Pentium D in particular) that probably void the benefit of
using SIMD parallel instructions.  However I'm not qualified enough to argue
on that matter...

	- NC

PS: Note that I didn't try the hand-optimizations on the AMD K8, but I don't
expect them to give anything better than on the P4; I couldn't test on a
Pentium-M either.  Also the latest architectures that Intel is currently
(Intel Core) or will soon (Conroe and Merom) be introducing might not even
change the deal, since the current and future compilers will most likely
produce much better output for them.
