POV-Ray: Newsgroups: povray.programming: CUDA - NVIDIA's massively parallel programming architecture: Re: CUDA - NVIDIA's massively parallel programming architecture


From: Patrick Elliott
Date: 20 Apr 2007 16:22:30
Message: <MPG.2092cebe2e3c94f598a003@news.povray.org>

In article <web.4625ceddefc6bb7494cf37ec0@news.povray.org>, 
_the### [at] yahoocom says...
> Ben Chambers <ben### [at] pacificwebguycom> wrote:
> > PaulSh wrote:
> > > POV-Ray is finally making its transition to SMP systems, which is great news
> > > for those of us who can afford them. However, in terms of raw CPU power the
> > > latest NVIDIA graphics cards would seem to completely blow away anything in
> > > the way of SMP solutions this side of a research lab. Their CUDA GPU
> > > architecture is claimed to allow up to 128 independent processing units
> > > each running at 1.35GHz to be thrown at computationally-intensive problems.
> > > So, my first thought was not SETI or protein folding, but POV-Ray. Given
> > > that V3.7 is going to be fully threaded, what would be the possibility of a
> > > CUDA version? I guess that will depend on the time and abilities of someone
> > > with a lot more time and a lot more ability than myself...
> > >
> > >
> >
> > No good, they're only single precision.  Plus, each shader unit would
> > need access to the entire scene file, which would be a pain in the a**
> > to code.
> >
> > ...Chambers
> 
> According to the CUDA programming guide published by NVidia the GPGPU
> architecture is as competent as any 32 bit processor.  More precise
> computations can be simulated using multiple registers for a computation
> instead of a single register if my memory serves - so I seriously doubt
> this is a serious obstacle.
> 
> The device code is written in a simple extension of C, with host code either
> written to match or working through the device drivers directly from any
> language capable of such.
> 
> The major difficulty involved would be preparing the pov-ray source to run
> efficiently on a SIMD architecture - native code may run out of the box
> through the provided compiler, but the results would be poor at best
> without optimization to take advantage of the particular memory hierarchies
> involved.
> 
> Sharing parse trees used in the ray tracing is actually highly efficient on
> this architecture - it would be stored in memory read-only from the device
> accessible from any thread (without locking).  The host processor is
> typically responsible for loading the parse tree, since its preparation
> is not likely to be efficient in parallel.
> 
> The biggest hurdle I am aware of is load-balancing threads to make efficient
> use of the processing power available - simply subdividing the image into
> unrelated render-blocks is obviously bounded by the worst-case running time
> of the entire image, which may be unacceptably slow for any sufficiently
> complex scene.  Perhaps someone with more in-depth knowledge of the
> algorithms can determine what the limiting factors of per-ray threading
> would be or other techniques.  (This may or may not have been addressed for
> a multi-core implementation - since at worst for the naive implementation
> you still have 1/n CPU utilization for a n-core system.)
> 
> Justin
> 
> see:
> http://developer.nvidia.com/object/cuda.html#documentation
> 
Umm. Sorry, but: A) emulating higher precision by spreading each value
across several registers adds more overhead than you gain by using the
card in the first place, I think, and that presumes you can even do it
effectively. B) Most cards are not 32-bit internally, or at least do not
provide 32-bit precision for every register or operation they perform.
C) POV-Ray is now often compiled for 64-bit, so... and D) it's hardly
trivial to write code that uses multi-core processing when dealing with
things that are often quite linear in implementation, a fact that
*still* leaves radiosity and some other features in the "not yet
working" state in the new multi-core CPU version of POV-Ray already
being developed. Adding a graphics card that would only be directly
compatible with the 32-bit compile just adds more headaches.
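
To put a rough shape on point A: below is a minimal sketch, in plain C
rather than CUDA, of the usual "two floats per value" trick for
stretching single precision. None of it comes from the POV-Ray source
or from NVIDIA's guide; the names ff, two_sum, ff_add and ff_to_double
are made up for illustration, and the addition is the simple variant
that does not bother with every corner case.

```c
/* A "float-float" value: hi + lo together stand for one number with
   roughly twice the significand of a single float (hypothetical type). */
typedef struct { float hi, lo; } ff;

/* Knuth's TwoSum: returns s = fl(a + b) in hi and the rounding error
   of that addition in lo. The volatile qualifiers stop the compiler
   from keeping intermediates in wider registers or simplifying the
   algebra, either of which would throw the error term away. */
static ff two_sum(float a, float b)
{
    volatile float s  = a + b;
    volatile float bb = s - a;
    volatile float e  = (a - (s - bb)) + (b - bb);
    ff r = { s, e };
    return r;
}

/* Add two float-float values: sum the high parts exactly, fold in the
   low parts, then renormalise so that hi carries the leading bits. */
static ff ff_add(ff x, ff y)
{
    ff s = two_sum(x.hi, y.hi);
    volatile float lo = s.lo + (x.lo + y.lo);
    volatile float hi = s.hi + lo;
    lo = lo - (hi - s.hi);
    ff r = { hi, lo };
    return r;
}

/* Collapse to a double, only so the result can be inspected. */
static double ff_to_double(ff x)
{
    return (double)x.hi + (double)x.lo;
}
```

Adding 1.0f and 1e-10f as ordinary floats just gives 1.0f back, while
ff_add keeps the small term in lo. The price is visible in the code:
one ff_add is eleven single-precision operations where native hardware
would do one, and the result still carries only about 48 significand
bits against the 53 of a real double.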

And that is my fairly inexpert opinion. I suspect that the detailed 
explanation would go well past what I said and be more specific as to 
why it just won't work currently. In fact, I am certain of it, since you 
are like the third person to bring it up in the last year, so the 
explanation for why it won't work is someplace in the archives already.

-- 
void main () {

    call functional_code()
  else
    call crash_windows();
}
