POV-Ray: Newsgroups: povray.general: Raytracing on GPU

From: jhu
Subject: Re: Raytracing on GPU
Date: 25 Jul 2011 13:05:00
Message: <web.4e2da19e266590a260d9c55b0@news.povray.org>

ATI and NVidia GPUs have had 64-bit float since 2008. I would imagine parts of
Povray could be rewritten with OpenCL to take advantage of these things. GPUs
are fairly general purpose nowadays.

From: nemesis
Subject: Re: Raytracing on GPU
Date: 25 Jul 2011 13:30:01
Message: <web.4e2da706266590a2273b877e0@news.povray.org>

http://raytracey.blogspot.com/2011/04/kajiyas-scene-from-rendering-equation.html

No polygons there, just spheres.  They recreated in real-time on a GPU the
original scene from Kajiya's paper on the rendering equation and a path tracer.

Even blender got its own fast GPU path tracer these days, though it certainly
uses polygons:

http://blenderartists.org/forum/showthread.php?216113-Brecht-s-easter-egg-surprise-Modernizing-shading-and-rendering

last pages show incredible stuff...

From: Alain
Subject: Re: Raytracing on GPU
Date: 25 Jul 2011 17:49:46
Message: <4e2de4fa@news.povray.org>


> http://raytracey.blogspot.com/2011/04/kajiyas-scene-from-rendering-equation.html
>
> No polygons there, just spheres.  They recreated in real-time on a GPU the
> original scene from Kajiya's paper on the rendering equation and a path tracer.
>
> Even blender got its own fast GPU path tracer these days, though it certainly
> uses polygons:
>
> http://blenderartists.org/forum/showthread.php?216113-Brecht-s-easter-egg-surprise-Modernizing-shading-and-rendering
>
> last pages show incredible stuff...
>
>

There is still the issue that, if you don't have an nVidia chip with 
CUDA, it won't work at all...

From: jhu
Subject: Re: Raytracing on GPU
Date: 26 Jul 2011 01:55:00
Message: <web.4e2e55d6266590a253ab8e5e0@news.povray.org>

Alain <aze### [at] qwertyorg> wrote:

> > http://raytracey.blogspot.com/2011/04/kajiyas-scene-from-rendering-equation.html
> >
> > No polygons there, just spheres.  They recreated in real-time on a GPU the
> > original scene from Kajiya's paper on the rendering equation and a path tracer.
> >
> > Even blender got its own fast GPU path tracer these days, though it certainly
> > uses polygons:
> >
> > http://blenderartists.org/forum/showthread.php?216113-Brecht-s-easter-egg-surprise-Modernizing-shading-and-rendering
> >
> > last pages show incredible stuff...
> >
> >
>
> There is still the issue that, if you don't have an nVidia chip with
> CUDA, it won't work at all...

Incorrect. ATI/AMD users can install the Stream SDK to get OpenCL working.

From: Ive
Subject: Re: Raytracing on GPU
Date: 27 Jul 2011 00:53:56
Message: <4e2f99e4$1@news.povray.org>

On 25.07.2011 19:02, jhu wrote:
> ATI and NVidia GPUs have had 64-bit float since 2008. I would imagine parts of
> Povray could be rewritten with OpenCL to take advantage of these things. GPUs
> are fairly general purpose nowadays.
>
>
Note that support for double-precision floating-point types is *not* 
part of the OpenCL 1.1 specification but just an optional, 
implementor-specific feature.
As a matter of fact, AMD's OpenCL implementation (BTW, now called 
"APP" and no longer "Stream") does not support doubles, and neither 
does NVidia's.
The only platform that actually supports 64-bit floats is Intel's 
OpenCL SDK. Its JIT compiler does an amazing job of automatically 
vectorizing and optimizing for SSE2/3/4 registers, depending on the 
platform in use, but obviously it does not support any GPU.
So the funny situation ATM: if you have an AMD processor, you'll need the 
Intel OpenCL SDK installed to get support for doubles within OpenCL on 
your AMD CPU, and exactly zero OpenCL platforms support 64-bit floats on 
GPUs.

-Ive
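
Since double support is optional, a host program has to ask each device
whether it advertises it. A minimal sketch in Python, assuming the
extension string has already been fetched (in the C API that is
clGetDeviceInfo with CL_DEVICE_EXTENSIONS). The helper names and the
sample strings below are made up for illustration; cl_khr_fp64 is the
Khronos extension name and cl_amd_fp64 is AMD's vendor variant.

```python
# Hypothetical helpers: they only inspect the space-separated extension
# string that an OpenCL device reports; they do not talk to a driver.
FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


def supports_fp64(extensions: str) -> bool:
    """True if the device advertises a double-precision extension."""
    advertised = set(extensions.split())
    return any(name in advertised for name in FP64_EXTENSIONS)


def kernel_preamble(extensions: str) -> str:
    """Pragma line to put at the top of a kernel that uses 'double'.

    Returns an empty string when no fp64 extension is advertised.
    """
    advertised = set(extensions.split())
    for name in FP64_EXTENSIONS:
        if name in advertised:
            return f"#pragma OPENCL EXTENSION {name} : enable\n"
    return ""


# Sample strings are invented, not copied from a real driver.
print(supports_fp64("cl_khr_gl_sharing cl_khr_fp64"))
print(supports_fp64("cl_khr_gl_sharing"))
```

A renderer could use the result to fall back to a float-only kernel, or
to skip the device, when no such extension is reported.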

From: Ive
Subject: Re: Raytracing on GPU
Date: 28 Jul 2011 11:11:08
Message: <4e317c0c$1@news.povray.org>

On 27.07.2011 06:53, Ive wrote:
> Note that support for double-precision floating-point types is *not*
> part of the OpenCL 1.1 specification but just an optional,
> implementor-specific feature.
> As a matter of fact, AMD's OpenCL implementation (BTW, now called
> "APP" and no longer "Stream") does not support doubles, and neither
> does NVidia's.
> The only platform that actually supports 64-bit floats is Intel's
> OpenCL SDK. Its JIT compiler does an amazing job of automatically
> vectorizing and optimizing for SSE2/3/4 registers, depending on the
> platform in use, but obviously it does not support any GPU.
> So the funny situation ATM: if you have an AMD processor, you'll need the
> Intel OpenCL SDK installed to get support for doubles within OpenCL on
> your AMD CPU, and exactly zero OpenCL platforms support 64-bit floats on
> GPUs.
>

Just a quick update:
The brand-new (as of today) AMD OpenCL driver does support 64-bit floats 
on GPUs. And it even works: I just wrote a simple program that makes use 
of it.
But there is still no support for doubles from AMD on its own CPUs.

-Ive

From: clipka
Subject: Re: Raytracing on GPU
Date: 4 Aug 2011 17:13:03
Message: <4e3b0b5f$1@news.povray.org>

On 25.07.2011 09:49, jhu wrote:
> http://hardware.slashdot.org/comments.pl?sid=2346334&cid=36867000
>
> This particular post on slashdot was interesting. Mental Ray can use the GPU
> and throws thousands of threads at it mostly due to waiting for elements in main
> memory. How feasible is this for Povray?

With the (very recent) release of the first (GP)GPU supporting both 
64-bit floats and recursion, adapting POV-Ray for (those) GPUs might 
actually become technically feasible soon (though it still may take 
quite some time before it hits the dev team's top priorities list).

An open question would still be that of performance, which will mainly 
depend on how well the software architecture fits with the "Extreme 
SIMD" ("Single Instruction Multiple Data") hardware architecture. We 
might see a positive surprise there, though it could just as well turn 
out a big disappointment.

In any case I think the easiest-to-implement approach (and hence the 
best approach for official POV-Ray) at GPU support would be to implement 
network rendering first (it's high up on the ToDo list anyway), and then 
treat the GPU as a separate rendering node.

From: Ive
Subject: Re: Raytracing on GPU
Date: 4 Aug 2011 17:51:18
Message: <4e3b1456$1@news.povray.org>

On 04.08.2011 23:12, clipka wrote:
> An open question would still be that of performance, which will mainly
> depend on how well the software architecture fits with the "Extreme
> SIMD" ("Single Instruction Multiple Data") hardware architecture. We
> might see a positive surprise there, though it could just as well turn
> out a big disappointment.
>

I guess I can answer this one. I was curious (people from the 
luxrender project claimed a 60x speedup), so I implemented a simple 
raytracer with a static scene and followed Intel's recommendation of 
doing warmup runs before actually measuring the runtime. Here are the 
results for the raw calculation time without scene initialization:

CPU using 8 threads
fps 33.39695
2.9942 seconds

OpenCL CPU using 8 worker units
Intel(R) Core(TM) i7 CPU  950  @ 3.07GHz
2.8005 seconds

OpenCL GPU using 18 worker units
Cypress
1.7871 seconds

-Ive
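
For anyone who wants to repeat this kind of measurement, here is a
minimal sketch of the warmup-then-measure pattern in Python. It is not
the program used for the numbers above; `benchmark`, `render`, and the
run counts are assumptions chosen for illustration.

```python
import time


def benchmark(render, warmup_runs=2, timed_runs=5):
    """Time a callable after a few untimed warmup runs.

    The warmup runs let one-off costs (JIT compilation, cache fills,
    buffer allocation) happen before the clock starts.
    Returns (best, mean) wall-clock seconds over the timed runs.
    """
    for _ in range(warmup_runs):
        render()

    timings = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        render()
        timings.append(time.perf_counter() - start)

    return min(timings), sum(timings) / len(timings)


def fake_render():
    # Stand-in workload; a real test would trace the static scene here.
    return sum(i * i for i in range(100_000))


best, mean = benchmark(fake_render)
print(f"best {best:.4f} s, mean {mean:.4f} s")
```

Reporting the best run alongside the mean makes it easier to spot runs
that were disturbed by other activity on the machine.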

From: clipka
Subject: Re: Raytracing on GPU
Date: 4 Aug 2011 19:04:34
Message: <4e3b2582$1@news.povray.org>

On 04.08.2011 23:50, Ive wrote:

>
> CPU using 8 threads
> fps 33.39695
> 2.9942 seconds
>
> OpenCL CPU using 8 worker units
> Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
> 2.8005 seconds
>
> OpenCL GPU using 18 worker units
> Cypress
> 1.7871 seconds

Whatever a "worker unit" may be in this context; I presume that 18 
worker units max out the Cypress GPU? So apparently we're talking same 
orders of magnitude for CPU and GPU (with even better score for the 
GPU), which sounds quite promising.

Not the 60x Luxrender speedup though; I guess they must be optimizing 
their architecture for ray coherence or some such to really utilize the 
"Extreme SIMD" strength of the GPU, which is probably way beyond the 
development man-hours currently available to POV-Ray.

From: Ive
Subject: Re: Raytracing on GPU
Date: 5 Aug 2011 01:27:08
Message: <4e3b7f2c$1@news.povray.org>

On 05.08.2011 01:04, clipka wrote:
> On 04.08.2011 23:50, Ive wrote:
>
>>
>> CPU using 8 threads
>> fps 33.39695
>> 2.9942 seconds
>>
>> OpenCL CPU using 8 worker units
>> Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>> 2.8005 seconds
>>
>> OpenCL GPU using 18 worker units
>> Cypress
>> 1.7871 seconds
>
> Whatever a "worker unit" may be in this context; I presume that 18
> worker units max out the Cypress GPU?

I've instructed OpenCL to use all available computing units and the 
count of "worker units" is what it reports back, 8 for the hyperthreaded 
quad-core CPU and 18 for the HD 5870 (Cypress) graphics card.


> So apparently we're talking same
> orders of magnitude for CPU and GPU (with even better score for the
> GPU), which sounds quite promising.
>
Err, I do not see orders of magnitude. That 33.something is just one of 
my usual copy-and-paste errors (sorry), so it is just about 3 seconds 
for the "reference" CPU implementation compared to about 1.8 seconds for 
the OpenCL GPU kernel.
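
To put numbers on that, the speedups implied by the three timings
posted earlier can be worked out in a few lines of Python (the
dictionary keys are just labels chosen here):

```python
# Raw calculation times in seconds, as posted earlier in the thread.
timings = {
    "CPU, 8 threads": 2.9942,
    "OpenCL CPU": 2.8005,
    "OpenCL GPU (Cypress)": 1.7871,
}

baseline = timings["CPU, 8 threads"]

# Speedup relative to the plain multithreaded CPU implementation.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

That comes to roughly 1.07x for the OpenCL CPU kernel and roughly 1.7x
for the GPU kernel, so well short of an order of magnitude.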


> Not the 60x Luxrender speedup though; I guess they must be optimizing
> their architecture for ray coherence or some such to really utilize the
> "Extreme SIMD" strength of the GPU, which is probably way beyond the
> development man-hours currently available to POV-Ray.

Well, actually this 60x speedup is the result of an experiment using 
only 32-bit floats and calculating a Julia fractal. A fractal 
calculation with pure brute force and without branching is much more to 
the liking of the GPU (my first OpenCL experiment was also a zoom 
into the good old Mandelbrot set, but with double-precision floats, and 
it showed a speedup of up to 25x).

The GPU bottleneck is its ALU (branching there is much worse than on 
the CPU), along with any access to memory that is not allocated on 
the graphics card. For a more sophisticated raytracer I'd expect no 
speedup at all compared to a quad-core CPU (and the Cypress-class GPU is 
currently one of the fastest available).
On the other hand, there might be a lot of room for NVidia's and AMD's 
JIT compilers to produce better-optimized code in the future (as the 
Intel compiler obviously already does for the OpenCL CPU kernel), and AMD 
has already announced some major improvement (whatever this means), 
especially related to OpenCL support, for the next driver release.
And of course, as I am also quite new to OpenCL, my kernel code might not 
be the best, but I'm working on it ;)

-Ive
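
The branching and ray-coherence points can be illustrated with a toy
cost model in Python. This is a deliberate simplification, not a
description of any real GPU: it assumes a group of lanes runs in lock
step and has to execute every branch that at least one of its lanes
takes. The function names, the group width of 4, and the cycle costs
are all made up.

```python
def group_cost(lane_branches, branch_cost):
    """Cost of one lock-step group of lanes.

    Every distinct branch taken by any lane is executed once by the
    whole group, so divergent lanes pay for all branches involved.
    """
    return sum(branch_cost[branch] for branch in set(lane_branches))


def total_cost(branches, width, branch_cost):
    """Sum the cost over consecutive groups of `width` lanes."""
    return sum(
        group_cost(branches[i:i + width], branch_cost)
        for i in range(0, len(branches), width)
    )


# Invented costs: shading a hit is expensive, a miss is cheap.
cost = {"hit": 10, "miss": 2}

coherent = ["hit"] * 4 + ["miss"] * 4   # neighbouring rays agree
incoherent = ["hit", "miss"] * 4        # neighbouring rays disagree

print(total_cost(coherent, 4, cost))    # 10 + 2 = 12
print(total_cost(incoherent, 4, cost))  # (10 + 2) * 2 = 24
```

Under this model the same eight rays cost twice as much when hits and
misses are interleaved, which is the sort of effect that sorting or
grouping rays for coherence tries to avoid.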
