POV-Ray: Newsgroups: povray.pov4.discussion.general: A serious question about massively parallel architectures (ie GPUs)

From: Chambers
Subject: A serious question about massively parallel architectures (ie GPUs)
Date: 20 Jan 2010 00:11:39
Message: <4b56908b$1@news.povray.org>

Currently I'm waiting for a scene to render.  And you know what's 
killing render time?

Spawning.

With focal blur creating dozens (I had to turn it down from hundreds) of 
rays per pixel, textures which are averages of 8 (I turned it down from 
16) glass textures which each have reflection *and* refraction, area 
lights, media, etc... almost every decent effect in POV is achieved via 
taking large numbers of samples or spawning large numbers of rays.

So, I started thinking about how to speed things up.  Obviously, the 
single easiest way to do so is to reduce the number of rays that need to 
be shot.  Well, I've already turned down the settings, and simplified my 
textures...  there's not much else I can do in that category.

The next step is to run things more in parallel.  My quad core is 
already running 32 threads (high thread count + low block size = fun 
picture to watch trace), so that's maxed out.

The next obvious place to move code to is the GPU.  However, we can't 
just throw blocks of rays at the GPU... the GPU handles things in blocks 
of processors, but in POV adjacent rays may take wildly divergent code 
paths.  In a worst case scenario, you would be able to trace only a 
single ray per execution block on the GPU at a time; at that rate, the 
cost of the data bus would outweigh any benefits of the GPU itself.
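
A rough way to put numbers on the divergence worry above is a toy model, not a description of any real GPU: assume a block of lanes runs in lock-step and has to execute each distinct code path one after another. The function name and the path labels are made up for illustration.

```python
def block_utilisation(paths):
    """Fraction of lane-steps doing useful work in one lock-step block.

    `paths` holds one label per lane naming the code path that lane's ray
    takes. Toy assumption: every path costs one step, and distinct paths
    are executed serially while the other lanes sit idle.
    """
    lanes = len(paths)
    steps = len(set(paths))      # one serial step per distinct code path
    useful = lanes               # each lane does exactly one step of real work
    return useful / (lanes * steps)


coherent = block_utilisation(["diffuse"] * 32)   # all rays agree
divergent = block_utilisation(list(range(32)))   # every ray differs
```

Under this model a fully coherent block is fully used, while a block in which every ray diverges does one ray's worth of work per step, which is the worst case described above.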

So, the question becomes this: what operations does POV-Ray perform 
which can reasonably be grouped together, following more or less an 
identical code path on a large set of data, such that it could be 
reasonably off-loaded to a highly parallel coprocessor such as current 
and upcoming GPUs? (assuming double precision support, which is making 
great headway in recent architectures)  The important part becomes the 
code path... two adjacent rays must take generally the same code path, 
meaning anything which spawns rays is automatically out.

Darren posited in P.O-T that light occlusion could be possible.  I also 
think that other light calculations, such as gathering samples from 
photons or radiosity, could help.

Perhaps if a ray is known to traverse media, all media calculations for 
that ray could be batched and performed together.

What other calculations does POV perform which would be a good fit for 
architectures like this?

Oh, and this /should/ go without saying, but... please keep this to a 
brainstorming, rather than a bashing, thread.  More than anything, it's 
a thought experiment for me as I watch my render proceed ;)

...Chambers

From: scott
Subject: Re: A serious question about massively parallel architectures (ie GPUs)
Date: 20 Jan 2010 03:48:30
Message: <4b56c35e@news.povray.org>

> So, the question becomes this: what operations does POV-Ray perform which 
> can reasonably be grouped together, following more or less an identical 
> code path on a large set of data, such that it could be reasonably 
> off-loaded to a highly parallel coprocessor such as current and upcoming 
> GPUs? (assuming double precision support, which is making great headway in 
> recent architectures)  The important part becomes the code path... two 
> adjacent rays must take generally the same code path, meaning anything 
> which spawns rays is automatically out.

I found quite a good paper a while ago about bunching rays together. This 
might be useful on hardware where there is a massive speedup for processing 
similar rays together rather than individually.

http://lukasz.dk/files/mlrta.pdf

Obviously at some point individual rays are going to go separate ways 
through separate objects, but I bet in a typical scene there are hundreds, 
if not thousands of rays that all take exactly the same path through the 
same objects.

Also, rays do not have to be banned from spawning; they just get added to 
a list of "rays to be spawned" rather than spawned right there and then. 
That way the code path can be exactly the same until the end of the "pass", 
then all the new rays that need to be spawned are generated and the next 
pass starts on those rays.
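
The pass-by-pass idea above could be sketched roughly like this. It is only an illustration: `Ray`, `shade` and `trace_in_passes` are hypothetical names, the shading numbers are invented, and nothing here reflects how POV-Ray is actually structured.

```python
from dataclasses import dataclass


@dataclass
class Ray:
    pixel: int     # index of the pixel this ray contributes to
    depth: int     # number of bounces so far
    weight: float  # how much this ray contributes to its pixel


def shade(ray, max_depth):
    """Hypothetical stand-in for intersection plus shading.

    Returns (colour contribution, secondary rays). Every hit contributes
    half its weight and, below max_depth, requests one reflected and one
    refracted ray instead of tracing them immediately.
    """
    contribution = ray.weight * 0.5
    if ray.depth >= max_depth:
        return contribution, []
    children = [Ray(ray.pixel, ray.depth + 1, ray.weight * 0.25)
                for _ in range(2)]
    return contribution, children


def trace_in_passes(primary_rays, num_pixels, max_depth=3):
    image = [0.0] * num_pixels
    current = list(primary_rays)
    passes = 0
    while current:
        pending = []         # the "rays to be spawned" list
        for ray in current:  # on a GPU this loop would be one batched launch
            colour, children = shade(ray, max_depth)
            image[ray.pixel] += colour
            pending.extend(children)
        current = pending    # the next pass starts on the deferred rays
        passes += 1
    return image, passes


image, passes = trace_in_passes([Ray(0, 0, 1.0), Ray(1, 0, 1.0)],
                                num_pixels=2, max_depth=2)
```

Within one pass every ray runs the same `shade` step, and the only thing that grows is the pending list; a real implementation would presumably also sort or group the pending rays so that similar ones end up together.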

From: Edouard
Subject: Re: A serious question about massively parallel architectures (ie GPUs)
Date: 20 Jan 2010 03:55:00
Message: <web.4b56c3ca7e508fbb4692fdd50@news.povray.org>

Chambers <Ben### [at] gmailcom> wrote:
> Currently I'm waiting for a scene to render.  And you know what's
> killing render time?
>
> Spawning.
>
> With focal blur creating dozens (I had to turn it down from hundreds) of
> rays per pixel, textures which are averages of 8 (I turned it down from
> 16) glass textures which each have reflection *and* refraction, area
> lights, media, etc... almost every decent effect in POV is achieved via
> taking large numbers of samples or spawning large numbers of rays.
>
> So, I started thinking about how to speed things up.  Obviously, the
> single easiest way to do so is to reduce the number of rays that need to
> be shot.  Well, I've already turned down the settings, and simplified my
> textures...  there's not much else I can do in that category.

I've taken quite a different approach for this problem - there's probably a
large number of names for it, but I'm calling it "stochastic rendering":

I render the scene multiple times (often 200, 400 or more times), but each pass
has a very simplified, but at the same time completely randomised, set of
parameters. When all those renders are summed together into one image, the random
elements all combine to give a final scene that looks like one with a much more
complex set of parameters.
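
As a minimal sketch of the combining step only: `render_pass` below is a hypothetical stand-in that fakes a noisy render with random numbers, where in practice each pass would be a separate quick POV-Ray run writing an image file. The sum is divided by the pass count here so the result stays in the normal brightness range.

```python
import random


def render_pass(seed, width=4):
    """Hypothetical stand-in for one fast, randomised render.

    Returns one row of pixel values: a "true" value of 0.5 plus
    uniform noise, seeded so that every pass is different.
    """
    rng = random.Random(seed)
    return [0.5 + rng.uniform(-0.5, 0.5) for _ in range(width)]


def average_passes(num_passes, width=4):
    total = [0.0] * width
    for seed in range(num_passes):
        for i, value in enumerate(render_pass(seed, width)):
            total[i] += value                   # sum the passes per pixel
    return [t / num_passes for t in total]      # normalise the sum


one_pass = average_passes(1)
many_passes = average_passes(400)
```

With independent noise the error of the average shrinks roughly with the square root of the number of passes, which is why a few hundred cheap passes can look like one expensive render.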

Critically, when there are multiple "features" that all usually interact to push
render times sky-high, I try to construct a scene that has the interaction as
low as possible, and can render a few hundred images very quickly (each pass
taking anything from 10 seconds to maybe a minute or two).

On multi-core machines I simply fire off as many POV processes as I have CPUs
(or is that ALUs?).

I really must post a set of examples of each kind of effect that can be created
with the technique, and then some examples of combining them all together.

A really really old example of multiple area lights, blurred floor reflection,
anti-aliasing, focal blur and blurred refraction all done with stochastic
rendering:

One pass: http://www.flickr.com/photos/26722540@N05/3094525065/
400 passes: http://www.flickr.com/photos/26722540@N05/3094524731/

Cheers,
Edouard.

From: Warp
Subject: Re: A serious question about massively parallel architectures (ie GPUs)
Date: 20 Jan 2010 08:58:57
Message: <4b570c21@news.povray.org>

One thing which could help is if the user could control how many rays
are spawned from a point, depending on the situation.

  For example, if you are emulating blurred reflections using averaged
textures, it would suffice to use much simpler scene settings for rays
sent from this surface (because most of the detail would be blurred out
anyways). You might not need, for example, to calculate area lighting
for things hit by these rays. Also if such a ray hits another object
with blurred reflection, it might be enough for it to have just regular
reflection instead.
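
  To get a feel for how much this could save, here is a toy ray count.
It is not an existing POV-Ray feature; the `Quality` settings, the sample
counts (16 and 8) and the function names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quality:
    area_light_samples: int          # shadow rays per hit
    blurred_reflection_samples: int  # reflection rays per hit


FULL = Quality(area_light_samples=16, blurred_reflection_samples=8)
REDUCED = Quality(area_light_samples=1, blurred_reflection_samples=1)


def quality_for(from_blurred_surface):
    # Rays that left a blurred surface get the cheap settings.
    return REDUCED if from_blurred_surface else FULL


def count_rays(depth, max_depth, adaptive, from_blurred=False):
    """Count rays below one hit, assuming every surface has blurred reflection."""
    q = quality_for(from_blurred) if adaptive else FULL
    shadow = q.area_light_samples
    if depth >= max_depth:
        return shadow
    reflected = q.blurred_reflection_samples
    # Each reflection ray counts once, plus whatever its own hit spawns.
    below = count_rays(depth + 1, max_depth, adaptive, from_blurred=True)
    return shadow + reflected * (1 + below)


always_full = count_rays(0, 2, adaptive=False)
situation_aware = count_rays(0, 2, adaptive=True)
```

  With these made-up numbers and two levels of reflection, one primary hit
costs 1240 rays at full quality everywhere and 48 when the secondary rays
use the reduced settings.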

-- 
                                                          - Warp

From: Chambers
Subject: Re: A serious question about massively parallel architectures (ie GPUs)
Date: 20 Jan 2010 14:05:00
Message: <web.4b5753387e508fbb532258260@news.povray.org>

Chambers <Ben### [at] gmailcom> wrote:
> Oh, and this /should/ go without saying, but... please keep this to a
> brainstorming, rather than a bashing, thread.  More than anything, it's
> a thought experiment for me as I watch my render proceed ;)

And when I left for work this morning, the render was *still* less than half
finished, even with my castrated quality settings.

:(

Ah, well, if I didn't have patience, I never would've learned POV to begin with
:)

...Chambers
