POV-Ray: Newsgroups: povray.programming: Small update on the state of GPGPU

From: Benjamin Chambers
Subject: Small update on the state of GPGPU
Date: 23 Jan 2017 11:24:24
Message: <58862e38$1@news.povray.org>

Yes, I know, POV-Ray wouldn't work well on the GPU. But every now and 
then, someone asks about it and, since I've been doing some side 
projects with CUDA, I thought I'd put down here what would actually be 
necessary for it to be useful.

First off, there is a huge latency hit when moving data between the CPU 
and the GPU. This right here is the largest problem; it's simply faster 
to use the CPU for many applications because your data size isn't large 
enough.

In my own experiments, for a simple multiply-add, you need roughly 200 
data items for it to be worthwhile.

Going into more detail, I have a laptop with a dual-core i5 running at 
3GHz, and a mobile GTX 860 video card (640 CUDA cores). With these two 
processors, I found speed parity at 187 data items. Again, this was a 
simple multiply-add using single-precision floating point values. If 
you're doing anything more complex, it will lower the number necessary; 
if you're using double precision, it will increase the number of items 
needed.

The take-away is that, unless you are performing the same operation on 
about 200 items, it will be faster to run it on the CPU than to use the GPU.
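
A rough sketch of that break-even reasoning, in Python. The linear cost 
model and every number in it are illustrative assumptions, not 
measurements from the experiment above; the placeholder values were 
simply picked to land near 200 items.

```python
def break_even_items(transfer_overhead_ns, cpu_ns_per_item, gpu_ns_per_item):
    """Smallest item count at which the GPU path is no slower than the CPU.

    Assumed model (hypothetical, deliberately simplistic):
        cpu_time(n) = n * cpu_ns_per_item
        gpu_time(n) = transfer_overhead_ns + n * gpu_ns_per_item
    All arguments are integers, in nanoseconds.
    """
    saving_per_item = cpu_ns_per_item - gpu_ns_per_item
    if saving_per_item <= 0:
        return None  # under this model the GPU never catches up
    # Integer ceiling division: the first n with n * saving >= overhead.
    return -(-transfer_overhead_ns // saving_per_item)


# Placeholder costs: 2000 ns fixed overhead, 11 ns/item on the CPU,
# 1 ns/item on the GPU.
print(break_even_items(2000, 11, 1))  # prints 200
```

More work per item widens the per-item saving and lowers the threshold, 
which matches the observation above about more complex operations.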

The upside, though, is that the other previous objections (general 
purpose programmability, support for double precision, etc.) have 
basically all been overcome. With CUDA, OpenCL, and Vulkan available, 
GPUs are now easy enough to program that POV's functions could actually 
be put on consumer-level GPUs.

What would be necessary, then, for this to be useful for POV-Ray is for 
POV to cache function calls in sufficient numbers. The caching mechanism 
itself would probably add some overhead, as would switching shader 
cores, but with large image sizes it is conceivable that enough calls 
could be cached to make it worthwhile. It would be a massive amount of 
work in re-writing POV, though.
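
To make the caching idea concrete, here is a minimal Python sketch of a 
call queue that only dispatches once enough calls have piled up. The 
class name `CallBatcher`, its methods, and the callback style are all 
hypothetical; nothing like this exists in POV-Ray, and the plain Python 
function used as `batch_fn` merely stands in for a GPU kernel launch.

```python
class CallBatcher:
    """Collects arguments for one function and evaluates them in batches."""

    def __init__(self, batch_fn, min_batch=200):
        self.batch_fn = batch_fn    # takes a list of args, returns a list of results
        self.min_batch = min_batch  # dispatch threshold (about 200 per the post)
        self.pending = []           # queued (argument, callback) pairs

    def submit(self, arg, callback):
        """Queue one call; dispatch automatically once the batch is full."""
        self.pending.append((arg, callback))
        if len(self.pending) >= self.min_batch:
            self.flush()

    def flush(self):
        """Evaluate everything queued so far and deliver the results."""
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        results = self.batch_fn([arg for arg, _ in batch])
        for (_, callback), result in zip(batch, results):
            callback(result)


def multiply_add_batch(xs):
    # Stand-in for the GPU: one multiply-add per queued item.
    return [2.0 * x + 1.0 for x in xs]


results = []
batcher = CallBatcher(multiply_add_batch, min_batch=200)
for i in range(450):
    batcher.submit(float(i), results.append)
batcher.flush()  # drain the final, partially filled batch
```

The callbacks are where the overhead mentioned above shows up: the 
renderer has to be able to suspend whatever needed the result until the 
batch comes back.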

If POV-Ray ever gets a re-write from scratch, it might happen. 
Otherwise, the current method seems fine.

From: Benjamin Chambers
Subject: Re: Small update on the state of GPGPU
Date: 23 Jan 2017 12:06:27
Message: <58863813$1@news.povray.org>

On 1/23/2017 9:24 AM, Benjamin Chambers wrote:
> If POV-Ray ever gets a re-write from scratch, it might happen.
> Otherwise, the current method seems fine.

Alternatively, we could try implementing it for very specific cases.

For instance, running hit tests against objects... If you have an area 
light, you could run large arrays of hit tests (16x16 or larger would 
probably benefit from this) against individual objects to see if they 
obscure the light source.

Also, as a first pass on the scene, you could run hit tests against 
every single object, and generate a map of the intersections.

Both of those would, of course, be run AFTER bounding-box tests, to skip 
objects entirely.
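
A minimal Python sketch of the area-light case, assuming a rectangular 
light sampled on a 16x16 grid and a single sphere as the candidate 
occluder. The function names and the scene values are made up for 
illustration, the bounding-box rejection step is left out, and the 
final list comprehension stands in for what would be one GPU launch 
over all 256 shadow segments.

```python
def segment_hits_sphere(p, s, center, radius):
    """True if the straight segment from p to s passes through the sphere."""
    d = [s[k] - p[k] for k in range(3)]
    m = [center[k] - p[k] for k in range(3)]
    dd = sum(x * x for x in d)
    # Parameter of the point on the segment closest to the sphere center.
    t = 0.0 if dd == 0.0 else sum(m[k] * d[k] for k in range(3)) / dd
    t = max(0.0, min(1.0, t))
    dist2 = sum((p[k] + t * d[k] - center[k]) ** 2 for k in range(3))
    return dist2 < radius * radius


def shadow_batch(point, corner, edge_u, edge_v, center, radius, n=16):
    """Test n*n shadow segments from point to a rectangular area light."""
    samples = []
    for j in range(n):
        for i in range(n):
            u, v = (i + 0.5) / n, (j + 0.5) / n
            samples.append(tuple(corner[k] + u * edge_u[k] + v * edge_v[k]
                                 for k in range(3)))
    # Every test is independent and runs the same code: one batch of n*n items.
    return [segment_hits_sphere(point, s, center, radius) for s in samples]


# Hypothetical scene: a 2x2 light at z = 10 and a sphere between it and the origin.
hits = shadow_batch((0, 0, 0), (-1, -1, 10), (2, 0, 0), (0, 2, 0),
                    center=(0, 0, 5), radius=2.0)
shadowed_fraction = sum(hits) / len(hits)
```

A 16x16 grid gives 256 tests per object, which is just above the 
break-even size from the original post.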

From: William F Pokorny
Subject: Re: Small update on the state of GPGPU
Date: 26 Jan 2017 13:51:40
Message: <588a453c$1@news.povray.org>

On 01/23/2017 12:06 PM, Benjamin Chambers wrote:
> On 1/23/2017 9:24 AM, Benjamin Chambers wrote:
>> If POV-Ray ever gets a re-write from scratch, it might happen.
>> Otherwise, the current method seems fine.
>
> Alternatively, we could try implementing it for very specific cases.
>
> For instance, running hit tests against objects... If you have an area
> light, you could run large arrays of hit tests (16x16 or larger would
> probably benefit from this) against individual objects to see if they
> obscure the light source.
>
> Also, as a first pass on the scene, you could run hit tests against
> every single object, and generate a map of the intersections.
>
> Both of those would, of course, be run AFTER bounding-box tests, to skip
> objects entirely.
>
Thanks for posting your experience.

One of the places I've wondered if we might sneak a look at GPUs near 
term would be as one or more new internal functions for use with 
isosurfaces. Specifically, isosurfaces needing to evaluate a great many 
functions for each evaluation, something not practical today.

Last I looked for my Ubuntu platform (14.04), OpenCL looked messy to 
install and run. I bailed before attempting anything!

Bill P.

From: clipka
Subject: Re: Small update on the state of GPGPU
Date: 26 Jan 2017 15:08:24
Message: <588a5738$1@news.povray.org>

On 26.01.2017 at 19:51, William F Pokorny wrote:

> One of the places I've wondered if we might sneak a look at GPUs near
> term would be as one or more, new internal functions for use with
> isosurfaces. Specifically isosurfaces needing to evaluate a great many
> functions for each evaluation - something not practical today.

If the "sub-functions", as I would like to call them, are different in
structure (or even if their common structure can't be easily identified
as such), I suspect that's not a scenario where a GPGPU can help much.

From: William F Pokorny
Subject: Re: Small update on the state of GPGPU
Date: 26 Jan 2017 18:48:13
Message: <588a8abd$1@news.povray.org>

On 01/26/2017 03:08 PM, clipka wrote:
> On 26.01.2017 at 19:51, William F Pokorny wrote:
>
> If the "sub-functions", as I would like to call them, are different in
> structure (or even if their common structure can't be easily identified
> as such), I suspect that's not a scenario where a GPGPU can help much.
>
I agree.

It was the point-list-as-function-origins idea (the voronoi-ish results 
I posted a year or so ago) I had foremost in mind. In other words, there 
would be many copies of the same function, each with a unique origin in 
the simplest case. Perhaps too narrow an application practically, but as 
something with which to experiment with GPUs, maybe OK...
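
A tiny Python sketch of that "same function, many origins" shape. The 
choice of Euclidean distance as the per-origin function and `min` as 
the way of combining the results is purely an assumption for 
illustration; the actual functions behind the earlier voronoi-ish 
images may well differ.

```python
import math


def nearest_origin_distance(p, origins):
    """Evaluate one function per origin, then reduce to a single value."""
    # Each entry is the same function with a different origin. These
    # evaluations are independent, which is the part a GPU could batch.
    per_origin = [math.dist(p, o) for o in origins]
    # The reduction step; on a GPU this would be a parallel min.
    return min(per_origin)


origins = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
value = nearest_origin_distance((1.0, 0.0, 0.0), origins)
```

With a few hundred origins, a single isosurface evaluation would 
already be a batch of the size discussed earlier in the thread.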

Bill P.

From: Benjamin Chambers
Subject: Re: Small update on the state of GPGPU
Date: 28 Jan 2017 13:30:19
Message: <182f70b2-c67b-7fba-5d85-33378d5fc266@outlook.com>

On 1/26/2017 1:08 PM, clipka wrote:
> On 26.01.2017 at 19:51, William F Pokorny wrote:
>
>> One of the places I've wondered if we might sneak a look at GPUs near
>> term would be as one or more, new internal functions for use with
>> isosurfaces. Specifically isosurfaces needing to evaluate a great many
>> functions for each evaluation - something not practical today.
>
> If the "sub-functions", as I would like to call them, are different in
> structure (or even if their common structure can't be easily identified
> as such), I suspect that's not a scenario where a GPGPU can help much.
>

It's certainly possible; there are frameworks out there (cuDNN, CNTK, 
and TensorFlow all come to mind, because of the projects I've been 
working on) that all take scripted input and run the final function on 
the GPU.

I suspect the above frameworks have pre-defined shaders for various 
common functions, and merely pass the data between them based on your 
script. However, it would be entirely possible to compile iso functions 
into shader code (CUDA C or the equivalent in OpenCL) and let the driver 
load it onto the GPU for you.

But it wouldn't be worth it unless you can solve the problem I raised 
in my original post: arranging things so that you have a cache of at 
least 100 (preferably 200 or more) function calls to run at once.
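
As a sketch of the "compile iso functions into shader code" step, the 
Python below only generates CUDA C source text for a kernel that 
evaluates one expression per point. The template, the kernel signature, 
and the name `make_kernel_source` are assumptions for illustration; the 
expression is pasted in unchecked, and actually compiling and loading 
the result (for example through a runtime compiler such as NVRTC) is 
not shown.

```python
KERNEL_TEMPLATE = """\
extern "C" __global__
void {name}(const float *x, const float *y, const float *z,
            float *out, int n)
{{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {{
        out[i] = {expr};
    }}
}}
"""


def make_kernel_source(name, expr):
    """Wrap a per-point expression in CUDA C kernel boilerplate.

    expr is assumed to be valid CUDA C already, written in terms of
    x[i], y[i] and z[i]; no parsing or validation happens here.
    """
    return KERNEL_TEMPLATE.format(name=name, expr=expr)


# Hypothetical iso function: a unit sphere, x^2 + y^2 + z^2 - 1.
source = make_kernel_source(
    "iso_sphere", "x[i]*x[i] + y[i]*y[i] + z[i]*z[i] - 1.0f")
print(source)
```

Each thread handles one point, so the kernel only pays off when `n` is 
at least in the low hundreds, which is the batching problem again.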
