On 04.08.2011 23:50, Ive wrote:
>
> CPU using 8 threads
> fps 33.39695
> 2.9942 seconds
>
> OpenCL CPU using 8 worker units
> Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
> 2.8005 seconds
>
> OpenCL GPU using 18 worker units
> Cypress
> 1.7871 seconds
Whatever a "worker unit" may be in this context; I presume that 18
worker units max out the Cypress GPU? So apparently we're talking same
orders of magnitude for CPU and GPU (with even better score for the
GPU), which sounds quite promising.
Not the 60x Luxrender speedup though; I guess they must be optimizing
their architecture for ray coherence or some such to really utilize the
"Extreme SIMD" strength of the GPU, which is probably way beyond the
development man-hours currently available to POV-Ray.
On 05.08.2011 01:04, clipka wrote:
> On 04.08.2011 23:50, Ive wrote:
>
>>
>> CPU using 8 threads
>> fps 33.39695
>> 2.9942 seconds
>>
>> OpenCL CPU using 8 worker units
>> Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>> 2.8005 seconds
>>
>> OpenCL GPU using 18 worker units
>> Cypress
>> 1.7871 seconds
>
> Whatever a "worker unit" may be in this context; I presume that 18
> worker units max out the Cypress GPU?
I've instructed OpenCL to use all available compute units, and the
count of "worker units" is what it reports back: 8 for the hyperthreaded
quad-core CPU and 18 for the HD 5870 (Cypress) graphics card.
> So apparently we're talking same
> orders of magnitude for CPU and GPU (with even better score for the
> GPU), which sounds quite promising.
>
Err, I do not see orders of magnitude. The 33.something fps figure is
just one of my usual copy-and-paste errors (sorry), so it is about
3 seconds for the "reference" CPU implementation compared to about
1.8 seconds for the OpenCL GPU kernel.
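
For reference, the ratios implied by those timings can be worked out in a few lines. This is only a minimal Python sketch: the three timings are the ones posted above, while the dictionary labels and variable names are just chosen here for illustration.

```python
# Timings in seconds, copied from the benchmark output quoted in this thread.
timings = {
    "CPU (8 threads)": 2.9942,
    "OpenCL CPU": 2.8005,
    "OpenCL GPU": 1.7871,
}

# The plain multithreaded CPU run serves as the reference.
baseline = timings["CPU (8 threads)"]

# Speedup = baseline time / measured time; values above 1.0 mean faster.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

The GPU ratio comes out at roughly 1.7x, which is well short of a single order of magnitude.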
> Not the 60x Luxrender speedup though; I guess they must be optimizing
> their architecture for ray coherence or some such to really utilize the
> "Extreme SIMD" strength of the GPU, which is probably way beyond the
> development man-hours currently available to POV-Ray.
Well, actually this 60x speedup is the result of an experiment using
only 32-bit floats and calculating a Julia fractal. Fractal calculation
with pure brute force and without branching is much more to the liking
of the GPU. My first OpenCL experiment was also a zoom into the good old
Mandelbrot set, but with double-precision floats, and it showed a
speedup of up to 25x.
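
To illustrate what "brute force without branching" can look like, here is a rough sketch of a Mandelbrot escape-time count written two ways: with an early exit, and with a fixed-length loop that masks the update instead of branching. It is plain Python standing in for kernel code, not the kernel discussed here; the function names and the default of 64 iterations are assumptions made for the example.

```python
def escape_count_branchy(cr, ci, max_iter=64):
    """Escape-time count with an early exit (data-dependent branch)."""
    zr = zi = 0.0
    for n in range(max_iter):
        if zr * zr + zi * zi > 4.0:
            return n
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
    return max_iter


def escape_count_uniform(cr, ci, max_iter=64):
    """Same count, but every point runs exactly max_iter iterations."""
    zr = zi = 0.0
    count = 0
    for _ in range(max_iter):
        # 1.0 while the point is still bounded, 0.0 once it has escaped.
        inside = float(zr * zr + zi * zi <= 4.0)
        count += int(inside)
        new_zr = zr * zr - zi * zi + cr
        new_zi = 2.0 * zr * zi + ci
        # Arithmetic blend: apply the update while inside, freeze afterwards.
        zr = inside * new_zr + (1.0 - inside) * zr
        zi = inside * new_zi + (1.0 - inside) * zi
    return count
```

Both functions return the same count for a given point. The second does more arithmetic per point, but every point follows the same instruction stream, which is the property that suits SIMD-style hardware.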
The GPU bottleneck is its ALU (branching there is much worse than on
the CPU) and also any access to memory that is not allocated on the
graphics card. For a more sophisticated raytracer I'd expect no
speedup at all compared to a quad-core CPU (and the Cypress-class GPU is
currently one of the fastest available).
On the other hand, there might be a lot of room for NVidia's and AMD's
JIT compilers to produce better optimized code in the future (as the
Intel compiler obviously already does for the OpenCL CPU kernel), and AMD
has already announced some major improvements (whatever this means),
especially related to OpenCL support, for the next driver release.
And of course, as I am also quite new to OpenCL, my kernel code might not
be the best, but I'm working on it ;)
-Ive