  Re: CUDA - NVIDIA's massively parallel programming architecture  
From:  theCardinal
Date: 21 Apr 2007 21:35:01
Message: <web.462abb54efc6bb7494cf37ec0@news.povray.org>
Chambers <ben### [at] pacificwebguycom> wrote:
> _theCardinal wrote:
> > Ben Chambers <ben### [at] pacificwebguycom> wrote:
> >> No good, they're only single precision.  Plus, each shader unit would
> >> need access to the entire scene file, which would be a pain in the a**
> >> to code.
> >>
> >> ...Chambers
> >
> > According to the CUDA programming guide published by NVidia the GPGPU
> > architecture is as competent as any 32 bit processor.  More precise
> > computations can be simulated using multiple registers for a computation
> > instead of a single register if my memory serves - so I seriously doubt
> > this is a serious obstacle.
>
> But double precision is actually 64bit.  Until recently (I don't
> remember exactly which model), NVidia didn't even do full 32bit FP (that
> is, single precision), but only 24.  POV-Ray, for the last dozen years,
> has done 64bit FP (double precision), as the extra accuracy is necessary
> for the types of computations it does.
>
> Sure, you can simulate it in the same way you can use two integer units
> to simulate a fixed point number, but the result is slow.  Perhaps if
> Intel surprises everyone, and releases their next graphics chip as a
> double precision FP monster, we'd be able to take advantage of that,
> but the current ATI / NVidia cards aren't up to the task of dealing with
> POV-Ray.
>
> > The major difficulty involved would be preparing the pov-ray source to run
> > efficiently on a SIMD architecture - native code may run out of the box
> > through the provided compiler, but the results would be poor at best
> > without optimization to take advantage of the particular memory hierarchies
> > involved.
>
> Once the 3.7 source is out, it should be much easier, as the major task
> is simply fitting it to a parallel paradigm.  Having already done that,
> porting to different parallel architectures should be trivial (relative
> to the original threading support, that is).
>
> --
> ...Ben Chambers
> www.pacificwebguy.com

A few things:

"But double precision is actually 64bit."
To be technical the number of bits used for a double is implementation
dependent.  The requirement is simply that a float <= double.  It is up to
compiler to decide how to interpret that.  Using double in lieu of float
simply indicates the desire for additional precision - not the requirement
(in C and C++).  Hence it is impossible in general to say povray is using
64 bits.  See: The C++ Programming Language (TCPL) 74-75.
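
For what it's worth, here is a quick way to see what a particular compiler
actually gives you (a minimal sketch in standard C++, nothing POV-Ray
specific - both printed values are implementation-defined):

    #include <iostream>
    #include <limits>

    int main()
    {
        // 8 bytes / 53 mantissa bits is merely what most current desktop
        // compilers happen to choose for double - the language doesn't
        // require it.
        std::cout << "sizeof(double) = " << sizeof(double) << " bytes\n";
        std::cout << "mantissa bits  = "
                  << std::numeric_limits<double>::digits << "\n";
        return 0;
    }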

Compilers may have more than a few techniques to simulate 64 bit computation
on a 32 bit architecture, but I am not experienced enough in compiler design
to describe them with any confidence.  It's worth noting that the time lost
in doing 2 ops instead of 1 is easily regained in shifting from 1-2
processors to an array of processors, so this is not a concern provided the
utilization of the array is sufficiently high.
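
As a rough illustration of the "multiple registers" idea (this is the
classic float-float trick in the style of Dekker/Knuth, not anything taken
from the POV-Ray source, and it yields roughly twice the mantissa of a float
rather than a true IEEE double):

    // Assumes strict single-precision arithmetic for intermediates
    // (no fused or extended-precision evaluation).
    struct dfloat { float hi, lo; };   // value represented as hi + lo

    // Error-free sum of two floats (Knuth's two-sum): s + e == a + b exactly.
    static void two_sum(float a, float b, float &s, float &e)
    {
        s = a + b;
        float v = s - a;
        e = (a - (s - v)) + (b - v);
    }

    // Add two float-float values; several single-precision ops instead of
    // one - the "2 ops instead of 1" trade-off mentioned above.
    static dfloat df_add(dfloat a, dfloat b)
    {
        float s, e;
        two_sum(a.hi, b.hi, s, e);
        e += a.lo + b.lo;
        dfloat r;
        r.hi = s + e;                  // renormalize so that |lo| stays small
        r.lo = e - (r.hi - s);
        return r;
    }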

CUDA is a new beast - designed for NVidia's 8800 or later generation of
cards.  This means that the vast majority of cards in use today do not
support it - and probably won't for the next 2-3 years.  According to the
specification I read, the registers are full 32 bit, not 24 as in earlier
cards.  For more detailed information, google CUDA and browse the
documentation provided along with the SDK.
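
To give a flavour of the programming model (a minimal sketch against the
CUDA runtime API as documented in that SDK; the kernel name and the
"shading" it does are invented placeholders, not ray-tracing code):

    #include <cuda_runtime.h>

    // Every thread runs the same kernel; each one picks its own pixel from
    // its block/thread indices - one thread per pixel.
    __global__ void shade_pixel(float *out, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height)
            return;
        out[y * width + x] = (x + y) / (float)(width + height);  // placeholder
    }

    int main()
    {
        const int w = 800, h = 600;
        float *d_out = 0;
        cudaMalloc((void **)&d_out, w * h * sizeof(float));

        dim3 block(16, 16);
        dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
        shade_pixel<<<grid, block>>>(d_out, w, h);
        cudaThreadSynchronize();       // wait for the kernel to finish

        cudaFree(d_out);
        return 0;
    }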

I would like to clarify that a dual-processor design is not likely to share
much in common with a SIMD (single instruction, multiple data) architecture.
In particular, reaching optimal utilization when sets of processors are
required to execute the same instruction stream is not terribly easy; that
requirement doesn't exist at all for a dual-core architecture (and is one of
the reasons dual-core moved into consumer systems).  General ray-tracing does
show great potential though, since it is intuitively rare to have an image
where the number of rays to evaluate varies rapidly across the image.
(Refracting through a 'bag of marbles' would be a good counterexample: a
small shift in direction would wildly vary the number of rays to compute.
I doubt there is any tractable deterministic method to check whether this is
the case though.)
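
To put a rough number on that risk (the group size of 32 is just the warp
size NVidia quotes for this hardware, and the ray counts are invented): if
one pixel in a group of 32 needs 100 rays while the other 31 need only 2
each, the whole group stays busy for roughly 100 rays' worth of time, so
utilization is about (31*2 + 100) / (32*100), i.e. around 5%.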

Personally, I am less concerned with the usefulness of the implementation
than with its experimental value.

Thanks,

Justin

