POV-Ray: Newsgroups: povray.off-topic: Suggestion: OpenCL: Re: Suggestion: OpenCL

From: Invisible
Date: 14 Aug 2009 06:36:34
Message: <4a853e32$1@news.povray.org>

Chambers wrote:

> Of course, modern GPUs now allow double precision, so we can get to the 
> other objections now.  Specifically:
> 
> 1) Recursion.  As clipka (Christian?) wrote, it is absolutely essential 
> for POV.
> 
> 2) Data parallelization versus code parallelization (this is related to 
> the first, but is not strictly the same).
> 
> The ray tracing algorithm follows drastically different code branches on 
> a single set of data, based on recursion (reflections & refractions), as 
> well as the other various computations needed (texture calculation, 
> light source occlusion, etc) which almost all need access to the entire 
> scene.

There are two problems: recursion and divergence.

When a ray hits something, zero or more secondary rays are spawned. On 
the CPU, this is usually just a recursive function call, but OpenCL 
kernels are not allowed to call themselves recursively.

Also, a GPU consists of *hundreds* of cores, but the cores in each group 
must all execute the same code path (but with different data). You can 
set the GPU up to process multiple rays, but as soon as some of the rays 
hit object A but others hit object B, the code paths that need to be 
taken diverge from each other. The GPU copes with this badly: it runs 
each path in turn while the cores that aren't on that path sit idle.

The solution in both cases is to put rays into "queues", such that all 
the rays in a given queue take the same code path [for a while]. When 
you need to spawn a secondary ray, you add it to a queue rather than 
recursively tracing it. When some rays hit an object and others don't, 
you add them to different queues. The rays in each queue can then be 
processed in batches later.
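
Here is a rough sketch of the queue idea, in Python rather than OpenCL 
just to keep it short. Everything in it is made up for illustration: the 
`Ray` record, the toy `intersect` rule, the object names "A" and "B" and 
the depth limit have nothing to do with POV-Ray's actual code. It only 
shows the shape of the loop: sort rays into queues by what they hit, 
process each queue as a batch, and queue secondary rays instead of 
recursing.

```python
from collections import defaultdict, namedtuple

# Hypothetical ray record: an id plus the number of bounces so far.
Ray = namedtuple("Ray", ["ray_id", "depth"])

MAX_DEPTH = 2  # assumed bounce limit, chosen only for this example


def intersect(ray):
    """Toy stand-in for a real intersection test.

    Returns the name of the object hit ("A" or "B"), or None for a miss.
    """
    if ray.depth >= MAX_DEPTH:
        return None
    return "A" if ray.ray_id % 2 == 0 else "B"


def trace_with_queues(primary_rays):
    """Trace rays without recursion.

    Returns the batches in processing order as (object_hit, batch_size).
    """
    pending = list(primary_rays)
    batches = []
    while pending:
        # Sort the pending rays into one queue per code path.
        queues = defaultdict(list)
        for ray in pending:
            queues[intersect(ray)].append(ray)

        pending = []
        for hit, batch in queues.items():
            batches.append((hit, len(batch)))
            if hit is None:
                continue  # misses spawn no secondary rays
            # Every ray in this batch would run the same shading code.
            # Each one spawns a secondary ray, which is queued for the
            # next round rather than traced recursively.
            pending.extend(Ray(r.ray_id, r.depth + 1) for r in batch)
    return batches
```

Calling `trace_with_queues([Ray(i, 0) for i in range(4)])` processes two 
rounds of "A" and "B" batches and then one final batch of misses. In a 
real renderer each batch would be one GPU kernel launch.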

The key problem is that if a queue ends up with very few rays in it, 
you're going to have a hell of a lot of idle cores while you process 
that queue. The GPU is usually clocked far slower than the CPU; it only 
"appears" fast because it has hundreds of cores working in parallel. If 
most of those cores are actually idling, you're going to have a problem. 
It may turn out not to be any faster than the CPU under unfavourable 
conditions.
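
To put rough numbers on that, here is a back-of-the-envelope model. It 
assumes (very crudely) that every core finishes one ray per clock cycle 
and that a queue is processed in lockstep passes of one ray per core. 
The function names and any core counts or clock speeds you feed in are 
my own assumptions, not measurements of real hardware.

```python
import math


def utilization(rays_in_queue, cores):
    """Fraction of core-slots doing useful work for one queue.

    The queue is processed in passes of `cores` rays each, so the last
    pass may be mostly empty.
    """
    if rays_in_queue == 0:
        return 0.0
    passes = math.ceil(rays_in_queue / cores)
    return rays_in_queue / (passes * cores)


def effective_rate(rays_in_queue, cores, clock_hz):
    """Rays completed per second under the one-ray-per-cycle assumption."""
    return utilization(rays_in_queue, cores) * cores * clock_hz
```

For a made-up GPU with 256 cores, a queue of 8 rays keeps only about 3% 
of the cores busy (`utilization(8, 256)` is 8/256). Under this model, a 
hypothetical 4-core CPU at 3 GHz gets through that queue faster than a 
256-core GPU at 600 MHz, and the GPU only pulls ahead once the queues 
are large enough to fill it.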

Another possibility is to run the main renderer on the CPU, adding rays 
to queues, and sending any "sufficiently large" queues to the GPU for 
processing. I don't know if bandwidth limitations between the two would 
make this viable...
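
A sketch of what the CPU side of that might look like. The 
`BatchingQueues` class, the `threshold` value and the `dispatch` 
callback are all hypothetical; `dispatch` just stands in for "copy this 
batch to the GPU and launch a kernel", and nothing here says anything 
about whether the transfer cost is acceptable.

```python
from collections import defaultdict


class BatchingQueues:
    """CPU-side ray queues that hand a queue off once it is big enough."""

    def __init__(self, threshold, dispatch):
        self.threshold = threshold  # assumed "sufficiently large" size
        self.dispatch = dispatch    # hypothetical stand-in for a GPU submit
        self.queues = defaultdict(list)

    def add(self, key, ray):
        """Queue a ray under the code path `key`; send the queue if full."""
        queue = self.queues[key]
        queue.append(ray)
        if len(queue) >= self.threshold:
            self.dispatch(key, queue)
            self.queues[key] = []  # start a fresh queue for this path

    def flush(self):
        """At the end of the render, send whatever is left, however small."""
        for key, queue in self.queues.items():
            if queue:
                self.dispatch(key, queue)
        self.queues.clear()
```

The leftover queues that `flush` sends are exactly the small ones that 
would leave most GPU cores idle, so a real implementation might prefer 
to finish those on the CPU instead.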
