POV-Ray: Newsgroups: povray.general: Raytracing on GPU

From: clipka
Subject: Re: Raytracing on GPU
Date: 4 Aug 2011 19:04:34
Message: <4e3b2582$1@news.povray.org>

Am 04.08.2011 23:50, schrieb Ive:

>
> CPU using 8 threads
> fps 33.39695
> 2.9942 seconds
>
> OpenCL CPU using 8 worker units
> Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
> 2.8005 seconds
>
> OpenCL GPU using 18 worker units
> Cypress
> 1.7871 seconds

Whatever a "worker unit" may be in this context; I presume that 18 
worker units max out the Cypress GPU? So apparently we're talking same 
orders of magnitude for CPU and GPU (with even better score for the 
GPU), which sounds quite promising.

Not the 60x Luxrender speedup though; I guess they must be optimizing 
their architecture for ray coherence or some such to really utilize the 
"Extreme SIMD" strength of the GPU, which is probably way beyond the 
development man-hours currently available to POV-Ray.

From: Ive
Subject: Re: Raytracing on GPU
Date: 5 Aug 2011 01:27:08
Message: <4e3b7f2c$1@news.povray.org>

Am 05.08.2011 01:04, schrieb clipka:
> Am 04.08.2011 23:50, schrieb Ive:
>
>>
>> CPU using 8 threads
>> fps 33.39695
>> 2.9942 seconds
>>
>> OpenCL CPU using 8 worker units
>> Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>> 2.8005 seconds
>>
>> OpenCL GPU using 18 worker units
>> Cypress
>> 1.7871 seconds
>
> Whatever a "worker unit" may be in this context; I presume that 18
> worker units max out the Cypress GPU?

I've instructed OpenCL to use all available computing units and the 
count of "worker units" is what it reports back, 8 for the hyperthreaded 
quad-core CPU and 18 for the HD 5870 (Cypress) graphics card.


> So apparently we're talking same
> orders of magnitude for CPU and GPU (with even better score for the
> GPU), which sounds quite promising.
>

Err, I do not see orders of magnitude. The 33.something is just one of
my usual copy-and-paste errors (sorry), so it is really about 3 seconds
for the "reference" CPU implementation compared to about 1.8 seconds for
the OpenCL GPU kernel.
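
For the record, here is a quick C sketch of the arithmetic behind those corrected figures. It divides the reference time by each run's time. The labels and the helper names `speedup` and `print_speedups` are made up for this illustration. With the timings quoted above, it comes out at roughly 1.07x for the OpenCL CPU kernel and 1.68x for the Cypress GPU. That puts them in the same ballpark rather than orders of magnitude apart.

```c
#include <stdio.h>

/* Wall-clock timings copied from the benchmark quoted above. */
struct run { const char *label; double seconds; };

static const struct run runs[] = {
    { "CPU, 8 threads (reference)",     2.9942 },
    { "OpenCL CPU, 8 worker units",     2.8005 },
    { "OpenCL GPU (Cypress), 18 units", 1.7871 },
};

/* Speedup relative to a baseline: baseline time divided by run time. */
double speedup(double baseline_seconds, double run_seconds)
{
    return baseline_seconds / run_seconds;
}

/* Print each run's time and its speedup over the first (reference) entry. */
void print_speedups(void)
{
    const size_t n = sizeof runs / sizeof runs[0];
    for (size_t i = 0; i < n; ++i)
        printf("%-32s %7.4f s  %5.2fx\n", runs[i].label, runs[i].seconds,
               speedup(runs[0].seconds, runs[i].seconds));
}
```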


> Not the 60x Luxrender speedup though; I guess they must be optimizing
> their architecture for ray coherence or some such to really utilize the
> "Extreme SIMD" strength of the GPU, which is probably way beyond the
> development man-hours currently available to POV-Ray.

Well, actually this 60x speedup is the result of an experiment using
only 32-bit floats and calculating a Julia fractal. The fractal
calculation, pure brute force without branching, is much more to the
liking of the GPU. My first OpenCL experiment was also a zoom into the
good old Mandelbrot set, but with double-precision floats, and it
showed a speedup of up to 25x.
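
To illustrate what "pure brute force without branching" can look like, here is a minimal C sketch of an escape-time loop with a fixed trip count. It is not the actual kernel used in these experiments. The function name `escape_count` and the freeze-instead-of-break trick are assumptions made for this illustration. The same loop serves both fractals. For the Mandelbrot set, z starts at 0 and c is the pixel. For a Julia set, z starts at the pixel and c is a fixed constant.

```c
/* Escape-time count for the iteration z -> z*z + c, written as a fixed
 * number of passes with no early exit (hypothetical helper, not the
 * kernel from the benchmark).
 * Mandelbrot: z0 = 0, c = pixel.  Julia: z0 = pixel, c = constant.
 * Returns how many iterates z1, z2, ... stayed within |z| <= 2 before
 * the orbit first left that disc (max_iter if it never did). */
int escape_count(double zr, double zi, double cr, double ci, int max_iter)
{
    double x = zr, y = zi;
    int count = 0;
    for (int i = 0; i < max_iter; ++i) {
        /* candidate next iterate z' = z*z + c */
        double nx = x * x - y * y + cr;
        double ny = 2.0 * x * y + ci;
        /* 1 while |z'|^2 <= 4, 0 once the orbit has escaped */
        int inside = (nx * nx + ny * ny <= 4.0);
        count += inside;
        /* Instead of breaking out, freeze z at its last in-bounds value:
         * the next pass recomputes the same escaped candidate, so count
         * stops growing. Ternaries like these can often be compiled to
         * select instructions rather than jumps. */
        x = inside ? nx : x;
        y = inside ? ny : y;
    }
    return count;
}
```

For example, `escape_count(0.0, 0.0, 1.0, 0.0, 100)` returns 2. The orbit goes 0, 1, 2, 5 and escapes on the third step.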

The GPU bottleneck is its ALU (branching there is even worse than on
the CPU), and also any access to memory that is not allocated on the
graphics card. For a more sophisticated raytracer I'd expect no speedup
at all compared to a quad-core CPU (and the Cypress-class GPU is
currently one of the fastest available).
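
Here is one common way to picture why branching hurts more on the GPU. The work-items of one SIMD group (a wavefront on AMD hardware, a warp on NVIDIA) run in lockstep. When they disagree on a condition, the group executes both sides with the inactive lanes masked off. The C toy model below only counts that effect. The function `simd_branch_cost` is hypothetical and the cost units are arbitrary. It ignores everything else real hardware does.

```c
#include <stddef.h>

/* Toy cost model of an if/else inside a GPU kernel (simplified
 * illustration, not a model of any specific GPU). cond[i] is the branch
 * outcome of lane i in one SIMD group. A CPU core pays only for the side
 * it takes; a lockstep SIMD group pays for every side that at least one
 * of its lanes takes. */
int simd_branch_cost(const int *cond, size_t lanes, int cost_then, int cost_else)
{
    int any_then = 0, any_else = 0;
    for (size_t i = 0; i < lanes; ++i) {
        if (cond[i])
            any_then = 1;
        else
            any_else = 1;
    }
    /* each side is issued once if at least one lane needs it */
    return any_then * cost_then + any_else * cost_else;
}
```

Take a 4-lane group with then/else costs of 10 and 30. A uniform condition costs 10 (or 30), while a single disagreeing lane pushes the cost to 40.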
On the other hand, there might be a lot of room for NVIDIA's and AMD's
JIT compilers to produce better-optimized code in the future (as the
Intel compiler obviously already does for the OpenCL CPU kernel), and
AMD has already announced some major improvements (whatever that means),
especially related to OpenCL support, for the next driver release.
And of course, as I am also quite new to OpenCL, my kernel code might
not be the best, but I'm working on it ;)

-Ive
