From: Goran Begicevic
Subject: SIMD implementation of dot-product in POV-Ray???
Date: 25 Nov 1999 09:44:21
Message: <383D4B2A.B376001E@tidax.se>
Hey, has anybody considered using the SIMD instructions embedded in
the new generation of processors?
As far as I know, MMX is worthless, but there are some neat SIMD
features in the new P-III processors that could help.
Now, one of the things that costs time to compute is the dot product. And the
dot product is used *a lot* in raytracing, to say the least.
As far as I remember from peering into POV-Ray's source code, it's using
"double" floating-point numbers. That's something like ~90 bits of
precision.
If we could get by with ordinary 32-bit "float" precision, we could
take advantage of the SIMD capabilities of the P-III processor and speed up
dot-product calculations substantially.
Has anyone tried this trick before?
As soon as I get some time, I'll try to convert POV-Ray's dot-product
algorithm to SIMD and take a look at the results. It's hard to say how
it will turn out, but my guess is that we don't need the additional
precision of a "double" variable too often.
Any input is welcome!
From: mr art
Subject: Re: SIMD implementation of dot-product in POV-Ray???
Date: 25 Nov 1999 10:30:21
Message: <383D55F4.82979A9D@gci.net>
Wouldn't that make the program processor dependent? I thought that
the teams wanted their work to be portable.
Goran Begicevic wrote:
>
> As far as i know, MMX is worthless, but there are some neat SIMD
> features in new P-III processors that could help.
>
From: Thorsten Froehlich
Subject: Re: SIMD implementation of dot-product in POV-Ray???
Date: 25 Nov 1999 10:44:30
Message: <383d595e@news.povray.org>
In article <383D4B2A.B376001E@tidax.se> , Goran Begicevic <gor### [at] tidaxse>
wrote:
> Hey, has anybody considered using the SIMD instructions embedded in
> the new generation of processors?
> As far as i know, MMX is worthless, but there are some neat SIMD
> features in new P-III processors that could help.
This issue comes up every month or so; search back a bit through the
newsgroups and you will find your question answered.
> Now, one of the things that costs time to compute is the dot product. And the
> dot product is used *a lot* in raytracing, to say the least.
I doubt that just improving the dot product will speed things up in any
noticeable range at all.
> As far as i remember from peering into POV-Ray's source code, it's using
> "double" floating-point numbers. That's something like ~90 bits of
> precision.
By default double uses 64 bits on x86. And there are good reasons to have
this precision.
> As soon as i get some time , i'll try to convert POV-Ray dot-product
> algorithm to SIMD and take a look at the results.
This is taken from the AMD 3DNow SDK matrix code (thus it is AMD's SIMD FPU
extension, not Intel's), but for this purpose it will be enough:
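        ALIGN 32
        PUBLIC _a_dot_vect
; Dot product of two 3-float vectors; eax and edx point to the inputs,
; and the result is left in the low half of mm0.
_a_dot_vect PROC
        movq    mm0,[eax]       ; mm0 = a.x, a.y
        movq    mm3,[edx]       ; mm3 = b.x, b.y
        movd    mm1,[eax+8]     ; mm1 = a.z (upper half zeroed)
        movd    mm2,[edx+8]     ; mm2 = b.z (upper half zeroed)
        pfmul   mm0,mm3         ; mm0 = a.x*b.x, a.y*b.y
        pfmul   mm1,mm2         ; mm1 = a.z*b.z
        pfacc   mm0,mm0         ; mm0 = a.x*b.x + a.y*b.y (in both halves)
        pfadd   mm0,mm1         ; low half of mm0 += a.z*b.z
        ret
_a_dot_vect ENDP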
As you can see, making this change is rather trivial. The problem is that you
will need two versions of POV-Ray, one for AMD's extension and one for Intel's.
Besides that, in order to use single precision, you will likely have to change
the definition of DBL in the POV-Ray source from double to float. Be aware that
this is not as simple as it might seem...
> It's hard to say how
> it will look like , but my guess is that we don't need additional
> precision of "double" variable too often.
You do. Define DBL as float and watch POV-Ray "hang" in several functions
because of the missing precision.
Thorsten
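The precision point above can be made concrete with a small sketch. This is
not POV-Ray code: dot3_float and dot3_double are hypothetical helpers that
compute the same three-component dot product, once accumulating in float and
once in double.

```c
/* Hypothetical helpers for illustration only, not taken from POV-Ray. */

/* Dot product accumulated in single precision (about 24 significant bits). */
float dot3_float(const float a[3], const float b[3])
{
    float sum = a[0] * b[0];
    sum += a[1] * b[1];
    sum += a[2] * b[2];
    return sum;
}

/* The same dot product accumulated in double precision (53 significant bits). */
double dot3_double(const double a[3], const double b[3])
{
    double sum = a[0] * b[0];
    sum += a[1] * b[1];
    sum += a[2] * b[2];
    return sum;
}
```

For a = b = (10000, 1, 0) the exact result is 100000001. On a typical IEEE 754
platform the double version returns that value, while the float version
returns 100000000, because floats near 1e8 are spaced 8 apart. An iterative
routine waiting for such a small difference to show up would never see it,
which is one plausible way a float build could appear to hang.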
"mr.art" wrote:
>
> Wouldn't that make the program processor dependent? I thought that
> the teams wanted their work to be portable.
>
There are so many different compiles, why bother about another one?
I think it is an advantage of POV to make it suit your requirements.
;-}
From: Goran Begicevic
Subject: Re: SIMD implementation of dot-product in POV-Ray???
Date: 27 Nov 1999 07:06:07
Message: <383FC8FF.5C5AC270@tidax.se>
> Wouldn't that make the program processor dependent? I thought that
> the teams wanted their work to be portable.
Who cares, as long as it's faster? Most of the new POV patches are processor
dependent anyway. It would be a really 'political' decision not to
speed it up just because it might put Intel in the spotlight.
In article <383FC8FF.5C5AC270@tidax.se>, Goran Begicevic
<gor### [at] tidaxse> wrote:
> > Wouldn't that make the program processor dependent? I thought that
> > the teams wanted their work to be portable.
>
> Who cares as long as it's faster.
A lot of people, actually. And the precision loss would create other
problems.
> Most of the new POV patches are processor dependent anyway.
Which ones? I can only think of the #system patch, which is actually OS
dependent, not processor dependent. I don't think there has been a single
processor-dependent patch (except maybe PVMPOV).
--
Chris Huff
e-mail: chr### [at] yahoocom
Web page: http://chrishuff.dhs.org/
From: Goran Begicevic
Subject: Re: SIMD implementation of dot-product in POV-Ray???
Date: 27 Nov 1999 07:16:28
Message: <383FCB6A.B6ABEC10@tidax.se>
>
> This issue comes up every month or so; search back a bit through the
> newsgroups and you will find your question answered.
What was the conclusion on this issue in older threads?
> I doubt that just improving the dot product will speed things up in any
> noticeable range at all.
Well, run POV in a profiler and take a look at where it's spending most of
its time.
>
> By default double uses 64 bits on x86. And there are good reasons to have
> this precision.
Yes, I'm sorry, I mixed it up with 'long double'. It has been a long time
since I programmed.
> This is taken from the AMD 3DNow SDK matrix (thus it is AMDs SIMD FPU
> extension, not Intels), but for this purpose it will be enough:
>
> ALIGN 32
> PUBLIC _a_dot_vect
> _a_dot_vect PROC
> movq mm0,[eax]
> movq mm3,[edx]
> movd mm1,[eax+8]
> movd mm2,[edx+8]
> pfmul mm0,mm3
> pfmul mm1,mm2
> pfacc mm0,mm0
> pfadd mm0,mm1
> ret
> _a_dot_vect ENDP
Neat. Thanks. Unfortunately, I don't own an AMD processor. I'll try to get
one of those Athlons though.
Now, I'm not so assembler-skilled. How wide are the mm0, mm1, mm2 and mm3
registers? Is this done on 32-bit 'float' variables?
As far as I've heard, Intel's implementation of the dot product is even more
'automated', so you don't need to multiply registers 'by hand'. It's all
done in one command.
> As you can see, making this change is rather trivial. The problem is that you
> will need two versions of POV-Ray, one for AMD's extension and one for Intel's.
Ahh... that's the smallest problem.
> You do. Define DBL as float and watch POV-Ray "hang" in several functions
> because of the missing precision.
Note that this is not my idea of how this should be done. I would keep
all calculations as they are and just rewrite the dot-product function.
The 'double' values would be converted to float prior to the calculation and
the result converted back.
Well, we'll never know if we never try, right?
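A minimal sketch of the convert, compute, convert-back idea described above,
in plain C rather than SIMD assembler. The name dot3_via_float is
hypothetical, and the scalar arithmetic only stands in for whatever SIMD
instructions would actually be used.

```c
/* Hypothetical wrapper for illustration only, not taken from POV-Ray.
   The callers keep working with double; only this routine uses float. */
double dot3_via_float(const double a[3], const double b[3])
{
    float fa[3], fb[3];
    int i;

    /* Narrow both inputs to single precision (this is where bits are lost). */
    for (i = 0; i < 3; i++) {
        fa[i] = (float)a[i];
        fb[i] = (float)b[i];
    }

    /* The part a SIMD routine would replace: a float dot product. */
    float sum = fa[0] * fb[0] + fa[1] * fb[1] + fa[2] * fb[2];

    /* Widen the result back to double for the caller. */
    return (double)sum;
}
```

The narrowing loop is extra work that a real SIMD version would also have to
pay for on every call, and the result carries only float precision even though
it is returned as a double.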
From: Thorsten Froehlich
Subject: Re: SIMD implementation of dot-product in POV-Ray???
Date: 27 Nov 1999 11:24:50
Message: <384005d2@news.povray.org>
In article <383FCB6A.B6ABEC10@tidax.se> , Goran Begicevic <gor### [at] tidaxse>
wrote:
>>
>> This issue comes up every month or so; search back a bit through the
>> newsgroups and you will find your question answered.
>
> What was the conclusion on this issue in older threads?
In short, that the precision is not good enough. In addition, improving high
level algorithms usually gives a more significant speedup without having to
use assembler.
>> I doubt that just improving the dot product will speed things up in any
>> noticeable range at all.
>
> Well, run POV in a profiler and take a look at where it's spending most of
> its time.
Hmm, did you ever do that? A profiler will show you in which functions the
time is spent, but all vector operations in POV-Ray are macros.
Whenever I profiled, I found that POV-Ray spends a lot of time doing memory
allocations...
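To illustrate why a function-level profiler does not show vector operations
that are macros, here is a sketch with a hypothetical DOT3 macro (the name and
exact form are assumptions, not POV-Ray's actual definition).

```c
/* Hypothetical macro for illustration only. The preprocessor pastes the
   arithmetic into every place it is used, so no dot-product function exists
   in the compiled program. */
#define DOT3(result, a, b) \
    do { (result) = (a)[0] * (b)[0] + (a)[1] * (b)[1] + (a)[2] * (b)[2]; } while (0)

/* Hypothetical caller: in a function-level profile, the time spent on the
   multiplications and additions is charged to squared_length itself. */
double squared_length(const double v[3])
{
    double d;
    DOT3(d, v, v);
    return d;
}
```

A profile of such a program lists squared_length but has no separate entry for
the dot product, so the cost of the vector arithmetic cannot be read off
directly.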
> Now, i'm not so assembler-skilled. How wide is mm0,1,2,3 register? Is
> this done on 32-bit 'float' variables?
Yes, all the SIMD FPU instructions are on 32 bit floats, there are no 64 bit
float SIMD instructions.
> As far as i heard, Intels implementation of dot-product is even more
> 'automated' so you don't need to multiply registers 'by hand'. It's all
> being done in one command.
I am not very familiar with x86 assembler.
>> You do. Define DBL as float and watch POV-Ray "hang" in several functions
>> because of the missing precision.
>
> Note that this is not my idea of how this should be done. I would keep
> all calculations as they are and just rewrite the dot-product function.
>
> 'double' would be converted into float prior to calculations and then
> converted back.
I am not sure if you can easily move data from the SISD FPU to the SIMD FPU
registers, that might take up more time than the actual SISD calculation.
> Well, we'll never know if we never try, right?
Well, of course there is nothing keeping you from trying it. Just don't be
too disappointed if you don't see any speedup.
Thorsten
Chris Huff wrote:
>
> Which ones? I can only think of the #system patch, which is actually OS
> dependent, not processor dependent. I don't think there has been a single
> processor-dependent patch (except maybe PVMPOV).
I'm pretty sure PVMPOV is more OS-dependent (needs some flavor of Unix)
than CPU-dependent. I've only used it on x86-Linux myself, but I'd be
surprised if it didn't work on SPARC-Solaris, for instance, assuming you
can get PVM for SPARC-Solaris.
-Mark Gordon
Mark Gordon <mtg### [at] mailbagcom> writes:
> Chris Huff wrote:
> >
> > Which ones? I can only think of the #system patch, which is actually OS
> > dependent, not processor dependent. I don't think there has been a single
> > processor-dependent patch (except maybe PVMPOV).
>
> I'm pretty sure PVMPOV is more OS-dependent (needs some flavor of Unix)
> than CPU-dependent. I've only used it on x86-Linux myself, but I'd be
> surprised if it didn't work on SPARC-Solaris, for instance, assuming you
> can get PVM for SPARC-Solaris.
You can get it from ftp://netlib2.cs.utk.edu/pvm3/ and it works with PVMPOV.
(I tried it myself some time ago.)
Thomas
--
http://thomas.willhalm.de/ (includes pgp key)