POV-Ray: Newsgroups: povray.programming: SIMD implementation of dot-product in POV-Ray???: Re: SIMD implementation of dot-product in POV-Ray???


From: Goran Begicevic
Date: 27 Nov 1999 07:16:28
Message: <383FCB6A.B6ABEC10@tidax.se>

> 
> This issue comes up every month or so. Search back a bit through the
> newsgroups and you will find your question answered.


What was the conclusion on this issue in older threads?

> I doubt that just improving the dot product will speed things up in any
> noticeable range at all.

Well, run POV-Ray in a profiler and take a look at where it's spending
most of its time.
 
> 
> By default double uses 64 bits on x86. And there are good reasons to have
> this precision.

Yes, I'm sorry, I mixed it up with 'long double'. It has been a long time
since I programmed.
 
> This is taken from the AMD 3DNow SDK matrix (thus it is AMD's SIMD FPU
> extension, not Intel's), but for this purpose it will be enough:
> 
> ALIGN   32
> PUBLIC  _a_dot_vect
> _a_dot_vect PROC
>         movq        mm0,[eax]
>         movq        mm3,[edx]
>         movd        mm1,[eax+8]
>         movd        mm2,[edx+8]
>         pfmul       mm0,mm3
>         pfmul       mm1,mm2
>         pfacc       mm0,mm0
>         pfadd       mm0,mm1
>         ret
> _a_dot_vect ENDP

Neat. Thanks. Unfortunately, I don't own an AMD processor. I'll try to get
one of those Athlons, though.

Now, I'm not very skilled in assembler. How wide are the mm0, mm1, mm2 and
mm3 registers? Is this done on 32-bit 'float' variables?
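
For reference, here is a plain C sketch of what the quoted routine appears to compute. It assumes that eax and edx each point to three consecutive 32-bit floats, and that each 64-bit mm register holds two packed 32-bit floats, which is the 3DNow! layout. The name dot3_float is hypothetical and is not taken from the AMD SDK or from POV-Ray.

```c
/* Hypothetical scalar equivalent of the quoted 3DNow! routine.
   Assumes a and b each point to three consecutive 32-bit floats. */
float dot3_float(const float a[3], const float b[3])
{
    /* movq + pfmul: the x and y products are formed as one packed pair */
    float xy0 = a[0] * b[0];
    float xy1 = a[1] * b[1];

    /* movd + pfmul: the z product is formed on its own */
    float z = a[2] * b[2];

    /* pfacc sums the two lanes of the pair, pfadd then adds the z product */
    return (xy0 + xy1) + z;
}
```

Called with a = {1, 2, 3} and b = {4, 5, 6}, this returns 32. Note that everything here is single precision; the routine never touches a 64-bit double.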

As far as I have heard, Intel's implementation of the dot product is even
more 'automated', so you don't need to multiply registers 'by hand'. It's
all done in one instruction.

> As you can see, making this change is rather trivial. The problem is that
> you will need two versions of POV-Ray, one for AMD's extension and one for
> Intel's.
Ahh... that's the smallest problem.

> You do.  Define DBL as float and watch POV-Ray "hang" in several functions
> because of the missing precision.

Note that this is not my idea of how this should be done. I would keep
all calculations as they are, and just rewrite the dot-product function.

'double' would be converted into float prior to calculations and then
converted back.
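
A minimal C sketch of that idea, under stated assumptions: DBL stays a double everywhere else, and only the dot product narrows its inputs to float and widens the result again. The name dot3_via_float is hypothetical, the typedef stands in for however POV-Ray actually defines DBL, and the float arithmetic in the middle is where a SIMD routine would be substituted.

```c
/* Assumption: DBL is POV-Ray's scalar type and stays a 64-bit double. */
typedef double DBL;

/* Hypothetical wrapper: narrow to float, compute, widen the result. */
DBL dot3_via_float(const DBL a[3], const DBL b[3])
{
    float fa[3], fb[3];
    int i;

    /* Convert each double component to single precision (may round). */
    for (i = 0; i < 3; i++) {
        fa[i] = (float)a[i];
        fb[i] = (float)b[i];
    }

    /* Single-precision dot product; a SIMD routine would replace this. */
    float sum = fa[0] * fb[0] + fa[1] * fb[1] + fa[2] * fb[2];

    /* Convert back so callers keep working with DBL. */
    return (DBL)sum;
}
```

For small values such as (1, 2, 3) and (4, 5, 6) the result is exactly 32. The rounding on the way in is not free, though: an input component of 100000001.0 becomes 100000000.0 as a float, so the conversions themselves lose precision before any multiplication happens.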

Well, we'll never know if we never try, right?
