Chambers wrote:
> Of course, modern GPUs now allow double precision, so we can get to the
> other objections now. Specifically:
>
> 1) Recursion. As clipka (Christian?) wrote, it is absolutely essential
> for POV.
>
> 2) Data parallelization versus code parallelization (this is related to
> the first, but is not strictly the same).
>
> The ray tracing algorithm follows drastically different code branches on
> a single set of data, based on recursion (reflections & refractions), as
> well as the other various computations needed (texture calculation,
> light source occlusion, etc) which almost all need access to the entire
> scene.
There are two problems: recursion and divergence.
When a ray hits something, zero or more secondary rays are spawned. On
the CPU, this is usually just a recursive function call, but GPU code
has traditionally not been allowed to recurse.
Also, a GPU consists of *hundreds* of cores, but the cores in a group
must all execute the same code path (each with different data). You can
set the GPU up to process multiple rays, but as soon as some of the rays
hit object A and others hit object B, the code paths that need to be
taken diverge from each other. The GPU either does not permit that or
handles it by running the paths one after another while the other cores
wait.
The solution in both cases is to put rays into "queues", such that all
the rays in a given queue take the same code path [for a while]. When
you need to spawn a secondary ray, you add it to a queue rather than
recursively tracing it. When some rays hit an object and others don't,
you add them to different queues. The rays in each queue can then be
processed in batches later.
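
To make the queue idea concrete, here is a rough sketch in Python. It is only an illustration of the control flow, not how POV-Ray or any real GPU tracer does it. The names `trace_with_queues`, `toy_intersect` and `toy_shade` are made up for this example, rays are plain integers, and the "scene" is a toy rule: even rays hit a mirror "A" that spawns one secondary ray, odd multiples of 3 hit a diffuse object "B", everything else misses.

```python
from collections import defaultdict, deque

def trace_with_queues(primary_rays, intersect, shade, max_depth=5):
    """Trace rays iteratively, grouping them into per-object queues.

    intersect(ray) -> object id, or None for a miss (hypothetical callback)
    shade(obj_id, ray) -> (contribution, list of secondary rays) (hypothetical)
    """
    pending = deque((ray, 0) for ray in primary_rays)
    total = 0.0
    batches = []  # (object id, batch size), recorded only for inspection

    while pending:
        # Sort every pending ray into the queue of the object it hits.
        queues = defaultdict(list)
        while pending:
            ray, depth = pending.popleft()
            obj_id = intersect(ray)
            if obj_id is not None:
                queues[obj_id].append((ray, depth))

        # Each queue is one batch in which all rays take the same code path.
        for obj_id, batch in queues.items():
            batches.append((obj_id, len(batch)))
            for ray, depth in batch:
                contribution, secondary = shade(obj_id, ray)
                total += contribution
                # Secondary rays are queued for later, not traced recursively.
                if depth + 1 < max_depth:
                    pending.extend((r, depth + 1) for r in secondary)

    return total, batches

def toy_intersect(ray):
    # Toy scene (an assumption for this sketch): even rays hit "A",
    # odd multiples of 3 hit "B", all other rays miss.
    if ray % 2 == 0:
        return "A"
    if ray % 3 == 0:
        return "B"
    return None

def toy_shade(obj_id, ray):
    # "A" acts as a mirror and spawns one secondary ray; "B" is diffuse.
    if obj_id == "A":
        return 1.0, [ray + 1]
    return 0.5, []
```

Calling `trace_with_queues(range(6), toy_intersect, toy_shade)` on this toy scene gives batches of size 3, 1 and 1, which already shows how quickly the batches can shrink after the first bounce.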
The key problem is that if a queue ends up with very few rays in it,
you're going to have a hell of a lot of idle cores while you process
that queue. The GPU is usually clocked far slower than the CPU; it only
"appears" fast because it has hundreds of cores working in parallel. If
most of those cores are actually idling, you're going to have a problem.
It may turn out not to be any faster than the CPU under unfavourable
conditions.
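
A back-of-the-envelope version of that argument, as a sketch. The functions `utilisation` and `gpu_speedup` are hypothetical, and the model is deliberately crude: it assumes one ray costs the same number of cycles on a GPU core as on a CPU core, that the GPU processes a queue in lock-step passes of `cores` rays each, and that memory and transfer costs are zero. The figures of 256 cores and a clock ratio of 0.25 are invented for illustration.

```python
import math

def utilisation(rays_in_queue, cores):
    """Fraction of core slots doing useful work while one queue is processed."""
    if rays_in_queue == 0:
        return 0.0
    passes = math.ceil(rays_in_queue / cores)
    return rays_in_queue / (passes * cores)

def gpu_speedup(rays_in_queue, cores, clock_ratio):
    """Speed relative to a single CPU core under the crude model above.

    clock_ratio is GPU clock divided by CPU clock (e.g. 0.25 if the GPU
    runs at a quarter of the CPU's clock speed).
    """
    if rays_in_queue == 0:
        return 0.0
    passes = math.ceil(rays_in_queue / cores)
    cpu_time = rays_in_queue         # one time unit per ray on the CPU
    gpu_time = passes / clock_ratio  # each pass is slower by the clock ratio
    return cpu_time / gpu_time

if __name__ == "__main__":
    for n in (2, 4, 256):
        print(n, utilisation(n, 256), gpu_speedup(n, 256, 0.25))
```

With these made-up numbers, a full queue of 256 rays comes out 64 times faster than one CPU core, a queue of 4 rays only breaks even, and a queue of 2 rays is slower than the CPU.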
Another possibility is to run the main renderer on the CPU, adding rays
to queues, and sending any "sufficiently large" queues to the GPU for
processing. I don't know if bandwidth limitations between the two would
make this viable...
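
For what it's worth, the CPU-side bookkeeping for that hybrid scheme could look roughly like this. It is a sketch only: `QueueDispatcher` and the `send_to_gpu` callback are hypothetical names, the threshold is arbitrary, and nothing here actually talks to a GPU or says anything about the bandwidth question.

```python
from collections import defaultdict

class QueueDispatcher:
    """Collect rays per queue on the CPU and hand off full queues in batches."""

    def __init__(self, threshold, send_to_gpu):
        self.threshold = threshold      # what counts as "sufficiently large"
        self.send_to_gpu = send_to_gpu  # hypothetical callback: (key, batch)
        self.queues = defaultdict(list)

    def add(self, key, ray):
        queue = self.queues[key]
        queue.append(ray)
        if len(queue) >= self.threshold:
            # Hand the whole batch over and start a fresh queue for this key.
            self.send_to_gpu(key, queue)
            self.queues[key] = []

    def leftovers(self):
        """Return the queues that never filled up, e.g. to finish on the CPU."""
        rest = {key: queue for key, queue in self.queues.items() if queue}
        self.queues.clear()
        return rest
```

The small queues that `leftovers()` returns are exactly the ones that would leave most GPU cores idle, so finishing those on the CPU seems the natural choice.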