Invisible wrote:
> Chambers wrote:
>
>> 1) Support for sophisticated branching
>
> When this happens, the GPU will be exactly the same speed as the CPU.
> The GPU is fast *because* it doesn't support sophisticated branching.
That's too bad, because POV requires sophisticated branching.
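To put a rough number on the branching cost, here is a toy model. It is only a sketch: the 32-lane block size and the cycle counts are assumptions made for illustration, not figures for any particular card. The idea is that lanes running in lockstep have to sit through both sides of a branch whenever they disagree on it.

```python
# Toy model (an assumption for illustration, not a real GPU simulator):
# a block of lanes executes in lockstep, so if the lanes disagree on a
# branch, the block runs both sides and masks out the inactive lanes.

def warp_cost(conditions, cost_then, cost_else):
    """Return the cycles a lockstep block spends on one if/else."""
    cost = 0
    if any(conditions):        # at least one lane takes the 'then' side
        cost += cost_then
    if not all(conditions):    # at least one lane takes the 'else' side
        cost += cost_else
    return cost

coherent = [True] * 32                       # every lane agrees
divergent = [i % 2 == 0 for i in range(32)]  # lanes alternate

print(warp_cost(coherent, 10, 10))   # 10
print(warp_cost(divergent, 10, 10))  # 20
```

In this model a single disagreeing lane is enough to make the whole block pay for both paths, which is why heavily branching code loses most of the advantage.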
>> 2) Full double-precision accuracy
>
> This already exists apparently. (E.g., my GPU supports double-precision
> math.)
Yes, but there are still relatively few cards in consumer machines that
fully support double precision.
>> 3) Large memory sets (other than textures)
>
> My GPU has access to just under 1GB of RAM. How much do you want?
Last I checked, each shader could access only a very small amount of
shared memory (less than 1MB, I think, though this may no longer be
accurate for current top-of-the-line cards), plus whatever was stored
in textures.
If you could figure out a way to store your data as a texture (i.e., an
array of data) then you didn't have a problem. Of course, textures
aren't designed to hold distinct values the way a plain array does, so
you have to tell the card to disable optimizations like blending &
filtering... you know, all the things that were designed on the
assumption that textures are actually images.
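A small sketch of why filtering has to be switched off. The function names and the 1D "texture" are hypothetical, and the sampling convention (texel centres at i + 0.5) is an assumption made for the example. With linear filtering, a lookup that lands between two texels returns a blend of them, which is fine for colours and meaningless for distinct values.

```python
import math

def fetch_nearest(texels, u):
    """Point sampling: return the texel whose cell contains coordinate u."""
    index = min(max(int(math.floor(u)), 0), len(texels) - 1)
    return texels[index]

def fetch_linear(texels, u):
    """Linear filtering: blend the two texels nearest to coordinate u."""
    x = u - 0.5                      # texel centres sit at i + 0.5
    i0 = math.floor(x)
    t = x - i0                       # blend weight toward the next texel
    last = len(texels) - 1
    a = texels[min(max(i0, 0), last)]
    b = texels[min(max(i0 + 1, 0), last)]
    return (1.0 - t) * a + t * b

# Hypothetical data: distinct values (say, object IDs), not colours.
texels = [7, 42, 3]

print(fetch_nearest(texels, 1.0))  # 42
print(fetch_linear(texels, 1.0))   # 24.5
```

The value 24.5 was never stored anywhere; it is just the average of two unrelated entries, so plain array data is only safe with point sampling.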
>> 4) Independent shaders running on distinct units.
>
> What exactly do you mean by that?
POV often needs to find the intersection of a single ray with a single
object.
GPUs still work by running blocks of shaders that all execute the same
program on very similar data (this is how they get their speed; even
though each individual shader is relatively slow, the whole block
together is fast).
Now, POV could hold onto pending intersection tests until there are
enough to fill a buffer... but the data wouldn't be distributed the way
that GPUs want it.
That is, POV would still have a group of independent intersection tests,
each one with different parameters.
GPUs work by saying, "Run this shader, with the first parameter
interpolated between these two values, and the second parameter
interpolated between these two other values, and the third parameter
interpolated between yet another set of values..."
The random data access of POV would render that unworkable.
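Here is a minimal sketch of that mismatch. The helper names and the sample numbers are made up for illustration. An interpolated launch describes every invocation's parameter with just a start value and an end value, so a batch only fits if its parameters happen to be evenly spaced between its endpoints.

```python
def interpolated_params(start, end, count):
    """Parameters an interpolating launch would hand to each shader."""
    if count == 1:
        return [start]
    step = (end - start) / (count - 1)
    return [start + i * step for i in range(count)]

def fits_interpolation(values, tolerance=1e-9):
    """True if values could have come from one start/end interpolation."""
    expected = interpolated_params(values[0], values[-1], len(values))
    return all(abs(v - e) <= tolerance for v, e in zip(values, expected))

# Neighbouring pixels along a scanline: evenly spaced parameters.
scanline = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical queued intersection tests, each with unrelated parameters.
queued = [0.9, -0.3, 0.42, 0.0, 0.77]

print(fits_interpolation(scanline))  # True
print(fits_interpolation(queued))    # False
```

A buffer of pending intersection tests looks like the second list, so it cannot be described as "interpolate between these two values" no matter how full the buffer is.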
Of course, I fully admit to not having read the specs for OpenCL, only
CUDA, so I can't say how much more useful it is. However, given that I
believe these to be hardware limitations rather than software, I'd be
surprised if OpenCL is really that much more powerful (though I've heard
it's easier to work with).
...Chambers