POV-Ray: Newsgroups: povray.pov4.discussion.general: A serious question about massively parallel architectures (ie GPUs)


From: Chambers
Date: 20 Jan 2010 00:11:39
Message: <4b56908b$1@news.povray.org>

Currently I'm waiting for a scene to render.  And you know what's 
killing render time?

Spawning.

With focal blur creating dozens (I had to turn it down from hundreds) of 
rays per pixel, textures which are averages of 8 (I turned it down from 
16) glass textures which each have reflection *and* refraction, area 
lights, media, etc... almost every decent effect in POV is achieved via 
taking large numbers of samples or spawning large numbers of rays.
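
To put rough numbers on that, here is a back-of-the-envelope sketch. The
figures are assumptions for illustration, not measurements from the scene:
24 focal blur samples, 16 spawned rays per hit (8 averaged textures, each
with one reflected and one refracted ray), and 2 bounces. It is a worst
case that ignores early cut-offs such as max_trace_level and adc_bailout.

```python
def rays_per_pixel(samples, branches_per_hit, max_depth):
    """Worst-case ray count for one pixel.

    samples: primary rays per pixel (e.g. focal blur samples).
    branches_per_hit: rays spawned at every hit (assumed constant).
    max_depth: number of bounces followed after the primary ray.
    """
    total = 0
    rays_at_level = samples  # level 0: the primary rays
    for _ in range(max_depth + 1):
        total += rays_at_level
        rays_at_level *= branches_per_hit  # every ray spawns a full set
    return total


# Illustrative numbers only: 24 samples, 8 textures * 2 rays each, 2 bounces.
print(rays_per_pixel(24, 8 * 2, 2))  # 24 + 384 + 6144 = 6552
```

Even with the settings turned down, the count grows geometrically with
depth, which is why the spawning dominates.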

So, I started thinking about how to speed things up.  Obviously, the 
single easiest way to do so is to reduce the number of rays that need to 
be shot.  Well, I've already turned down the settings, and simplified my 
textures...  there's not much else I can do in that category.

The next step is to run things more in parallel.  My quad core is 
already running 32 threads (high thread count + low block size = fun 
picture to watch trace), so that's maxed out.

The next obvious place to move code to is the GPU.  However, we can't 
just throw blocks of rays at the GPU... a GPU runs its threads in 
lockstep groups, with every processor in a group executing the same 
instruction, but in POV adjacent rays may take wildly divergent code 
paths.  In a worst case scenario, you would be able to trace only a 
single ray per execution block on the GPU at a time; at that rate, the 
cost of moving data over the bus would outweigh any benefits of the GPU 
itself.
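
A toy model of that worst case, as a sketch only: it assumes a lockstep
group of 32 lanes (the width varies by hardware) and that a group
containing several code paths runs them one after another. The function
name and the path labels are made up for illustration.

```python
def group_utilization(paths, path_cost):
    """Fraction of lane-steps doing useful work in a simple lockstep model.

    paths: one code-path label per ray (lane) in the group.
    path_cost: dict mapping each label to its cost in steps.
    """
    if not paths:
        return 0.0
    # Divergent paths are serialized, so the whole group pays for every
    # distinct path it contains.
    serialized_steps = sum(path_cost[p] for p in set(paths))
    # A lane only does useful work while its own path is executing.
    useful_steps = sum(path_cost[p] for p in paths)
    return useful_steps / (len(paths) * serialized_steps)


# Assumed group width of 32 and a flat cost of 10 steps per path.
costs = {"shadow": 10, **{f"path{i}": 10 for i in range(32)}}
coherent = ["shadow"] * 32                   # every ray on the same path
divergent = [f"path{i}" for i in range(32)]  # every ray on its own path

print(group_utilization(coherent, costs))   # 1.0
print(group_utilization(divergent, costs))  # 0.03125, i.e. 1/32
```

With fully divergent rays the group does the work of a single lane, which
is the "one ray per execution block" case.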

So, the question becomes this: what operations does POV-Ray perform 
which can reasonably be grouped together, following more or less an 
identical code path on a large set of data, such that it could be 
reasonably off-loaded to a highly parallel coprocessor such as current 
and upcoming GPUs? (assuming double precision support, which is making 
great headway in recent architectures)  The important part becomes the 
code path... two adjacent rays must take generally the same code path, 
meaning anything which spawns rays is automatically out.
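
One way to picture the grouping, as a sketch: collect pending work on the
CPU and bin it by the kind of operation, so that each bin could be handed
off as one uniform batch. The work-item layout and the labels here are
hypothetical, not anything POV-Ray does today.

```python
from collections import defaultdict


def bin_by_path(work_items, key):
    """Group work items so every bin follows a single code path.

    key: function returning the code-path label of a work item.
    """
    bins = defaultdict(list)
    for item in work_items:
        bins[key(item)].append(item)
    return dict(bins)


# Hypothetical work items: (operation kind, ray id).
pending = [("shadow", 1), ("media", 2), ("shadow", 3)]
batches = bin_by_path(pending, key=lambda item: item[0])
print(batches)
# {'shadow': [('shadow', 1), ('shadow', 3)], 'media': [('media', 2)]}
```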

Darren posited in P.O-T that light occlusion could be possible.  I also 
think that other light calculations, such as gathering samples from 
photons or radiosity, could help.
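
As a sketch of why light occlusion looks like a good fit: every shadow ray
runs the same loop over the same objects and only needs a yes/no answer,
with nothing spawned. The example below assumes a scene made only of
spheres and a point light. It is plain Python to show the shape of the
work, not POV-Ray's actual shadow code.

```python
import math


def occluded(point, light, spheres):
    """True if any sphere blocks the segment from point to light.

    spheres: list of (center, radius) pairs; an assumed, simplified scene.
    """
    d = [l - p for l, p in zip(light, point)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist == 0.0:
        return False
    d = [c / dist for c in d]  # unit direction towards the light
    for center, radius in spheres:
        oc = [p - c for p, c in zip(point, center)]
        b = sum(x * y for x, y in zip(oc, d))
        c = sum(x * x for x in oc) - radius * radius
        disc = b * b - c
        if disc < 0.0:
            continue  # the ray's line misses this sphere
        root = math.sqrt(disc)
        # A hit only counts if it lies between the point and the light.
        if any(1e-9 < t < dist for t in (-b - root, -b + root)):
            return True
    return False


def shadow_mask(points, light, spheres):
    """Same test for every point: one uniform batch of shadow rays."""
    return [occluded(p, light, spheres) for p in points]


light = (0.0, 10.0, 0.0)
spheres = [((0.0, 5.0, 0.0), 1.0)]
print(shadow_mask([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], light, spheres))
# [True, False]
```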

Perhaps if a ray is known to traverse media, all media calculations for 
a single ray could be performed.
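
A minimal sketch of what such a per-ray media calculation could look like,
assuming absorbing-only media and a fixed number of samples between the
entry and exit distances. The density callback and the sigma parameter are
hypothetical, and POV-Ray's real media sampling is more involved.

```python
import math


def march_media(density, t_near, t_far, steps=64, sigma=1.0):
    """Transmittance along one ray through an absorbing medium.

    density: callable giving the medium density at distance t along the ray.
    t_near, t_far: where the ray enters and leaves the medium.
    steps: fixed sample count, so every ray runs the same loop.
    sigma: absorption coefficient (assumed constant).
    """
    dt = (t_far - t_near) / steps
    optical_depth = 0.0
    for i in range(steps):
        t = t_near + (i + 0.5) * dt  # sample at the middle of each step
        optical_depth += sigma * density(t) * dt
    return math.exp(-optical_depth)  # Beer-Lambert attenuation


# Uniform density 1.0 over a path of length 2 with sigma 0.5.
print(march_media(lambda t: 1.0, 0.0, 2.0, steps=64, sigma=0.5))
# approximately exp(-1) = 0.3679
```

Because the step count is fixed, every ray in a batch takes the same
number of iterations regardless of what the density function returns.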

What other calculations does POV perform which would be a good fit for 
architectures like this?

Oh, and this /should/ go without saying, but... please keep this to a 
brainstorming, rather than a bashing, thread.  More than anything, it's 
a thought experiment for me as I watch my render proceed ;)

...Chambers
