POV-Ray: Newsgroups: povray.programming: SIMD implementation of dot-product in POV-Ray???: Re: SIMD implementation of dot-product in POV-Ray???

From: Thorsten Froehlich
Date: 25 Nov 1999 10:44:30
Message: <383d595e@news.povray.org>

In article <383D4B2A.B376001E@tidax.se> , Goran Begicevic <gor### [at] tidaxse>
wrote:

> Hey, anybody considered using SIMD instructions embedded in
> new-generations of processors??
> As far as i know, MMX is worthless, but there are some neat SIMD
> features in new P-III processors that could help.

This issue comes up every month or so; search back a bit through the
newsgroups and you will find your question answered.

> Now, one of the things that costs time to compute is dot-product. And dot
> product is something that is being used *a lot* in raytracing, to say
> the least.

I doubt that just improving the dot product will give any noticeable
speedup at all.

> As far as i remember from peering into POV-Ray's source code, it's using
> "double" floating-point numbers. That's something like ~90 bits of
> precision.

By default, double uses 64 bits on x86, and there are good reasons to have
this precision.
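
To check the sizes on a given compiler, a minimal C sketch along these lines
should do. The helper names are made up for illustration and are not part of
POV-Ray; the values in the comments assume IEEE 754 types as found on x86.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only, not POV-Ray code. */

/* Storage size of a double in bits (64 on x86). */
size_t double_storage_bits(void)
{
    return sizeof(double) * CHAR_BIT;
}

/* Significand bits of a double, counting the implicit leading bit
   (53 for an IEEE 754 double). */
int double_mantissa_bits(void)
{
    return DBL_MANT_DIG;
}

/* Significand bits of a float (24 for an IEEE 754 single). */
int float_mantissa_bits(void)
{
    return FLT_MANT_DIG;
}
```

So a double carries 53 bits of significand against 24 for a float, which is
the gap that matters for the rest of this discussion.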

> As soon as i get some time , i'll try to convert POV-Ray dot-product
> algorithm to SIMD and take a look at the results.

This is taken from the AMD 3DNow SDK matrix code (so it uses AMD's SIMD FPU
extension, not Intel's), but for this purpose it will be enough:

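ALIGN   32
PUBLIC  _a_dot_vect
_a_dot_vect PROC                    ; eax, edx -> vectors of three 32-bit floats
        movq        mm0,[eax]       ; mm0 = (a.x, a.y)
        movq        mm3,[edx]       ; mm3 = (b.x, b.y)
        movd        mm1,[eax+8]     ; mm1 = (a.z, 0)
        movd        mm2,[edx+8]     ; mm2 = (b.z, 0)
        pfmul       mm0,mm3         ; mm0 = (a.x*b.x, a.y*b.y)
        pfmul       mm1,mm2         ; mm1 = (a.z*b.z, 0)
        pfacc       mm0,mm0         ; both halves of mm0 = a.x*b.x + a.y*b.y
        pfadd       mm0,mm1         ; low half of mm0 = dot product
        ret
_a_dot_vect ENDP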

As you can see, making this change is rather trivial.  The problem is that
you will need two versions of POV-Ray, one for AMD's extension and one for
Intel's. Besides that, in order to use single precision, you will likely
have to change the definition of DBL in the POV-Ray source from double to
float. Be aware that this is not as simple as it might seem...
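
For comparison, the scalar operation such a routine would replace can be
sketched in C roughly as follows. DBL is the type name from the POV-Ray
source; dot3 and the plain array layout are assumptions made for this
example, not POV-Ray's actual code.

```c
/* DBL is the floating-point type name used in the POV-Ray source.
   Changing this typedef to float is the switch discussed above. */
typedef double DBL;

/* Hypothetical helper: dot product of two 3-component vectors. */
DBL dot3(const DBL a[3], const DBL b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

Note that the 3DNow routine reads packed 32-bit floats, so it cannot be
handed these arrays as long as DBL is double.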

> It's hard to say what
> it will look like, but my guess is that we don't need additional
> precision of "double" variable too often.

You do.  Define DBL as float and watch POV-Ray "hang" in several functions
because of the missing precision.
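
One way such a "hang" can come about, shown as a minimal C sketch: an
iteration that advances a value by a small step stops making progress once
the step falls below the precision of the type. This is only an illustration
of the general mechanism; the helper names are hypothetical and this is not
a claim about which POV-Ray functions are affected or why.

```c
/* Hypothetical helpers for illustration only, not POV-Ray code.
   Each returns 1 if adding step to t actually changes t, 0 otherwise. */

int step_advances_float(float t, float step)
{
    /* volatile forces the sum to be rounded to float before comparing */
    volatile float next = t + step;
    return next > t;
}

int step_advances_double(double t, double step)
{
    /* volatile forces the sum to be rounded to double before comparing */
    volatile double next = t + step;
    return next > t;
}
```

With t = 16777216 (2^24) and step = 1, the float version reports no progress
because the sum rounds back to t, while the double version still advances.
A loop that waits for t to pass some bound would then never terminate in
single precision.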

     Thorsten
