POV-Ray: Newsgroups: povray.off-topic: Suggestion: OpenCL



From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 05:37:11
Message: <4a853047@news.povray.org>

Stephen wrote:
> On Fri, 14 Aug 2009 03:32:15 -0400, Saul Luizaga <sau### [at] netscapenet> wrote:
> 
>> And you know what I expect the POV/TAG-Team to know... 
> 
> Is there still a TAG team? I know Warp is still around and Gilles and Ken pop in
> occasionally but when was the last time any of the others replied to a post?

No idea, man. I assume there is a TAG-Team, since I don't think highly 
technical freeware like this could exist without one.


From: Warp
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 05:50:18
Message: <4a853359@news.povray.org>

Saul Luizaga <sau### [at] netscapenet> wrote:
> groaning is an emotional response and, as such, irrational: I haven't 
> been reading this NG for a long time, and the first line of text I read 
> was your groaning; besides the rudeness, it explains nothing and 
> clarifies just as little.

  There are certain subjects which repeat themselves time after time when
some people think that they have an ingenious new idea which surely nobody
has even thought of before and thus they come here and write about it. For
the umpteenth time for regulars. Something which has already been discussed
like a million times to death. No wonder regulars are tired of explaining
the same thing over and over.

  At some point in the past, when XML was all the hype, it was a rather
regular occurrence for someone to come here and suggest that povray's
scene description language should be changed to be XML-conforming. No wonder
that after some time people just started responding to it with "no, that's
just a braindead idea" rather than going once again into minute detail about
why the idea doesn't work.

  To be fair, though, not all such ideas are unimplementable. The most
prominent example is multithreading: In the past it was again and again
suggested, and again and again shot down as unfeasible due to all the
povray features which are not thread-friendly. In a way, both views were
right: Yes, multithreading *is* implementable in povray (as demonstrated),
and yes, it *is* a huge, huge task (as also demonstrated). Not only did it
require an almost complete refactoring of the source code, but even after
all these years there are still minor problems, introduced by
multithreading, left to be solved.

  Using the GPU for rendering in povray is equally unfeasible, even though
for slightly different reasons. Mostly it has to do with GPU features (or
lack thereof) and the amount of data which would have to be constantly
transferred between the graphics card and the system RAM, which would most
probably nullify any theoretical speed advantage.

-- 
                                                          - Warp


From: Invisible
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 06:36:34
Message: <4a853e32$1@news.povray.org>

Chambers wrote:

> Of course, modern GPUs now allow double precision, so we can get to the 
> other objections now.  Specifically:
> 
> 1) Recursion.  As clipka (Christian?) wrote, it is absolutely essential 
> for POV.
> 
> 2) Data parallelization versus code parallelization (this is related to 
> the first, but is not strictly the same).
> 
> The ray tracing algorithm follows drastically different code branches on 
> a single set of data, based on recursion (reflections & refractions), as 
> well as the other various computations needed (texture calculation, 
> light source occlusion, etc) which almost all need access to the entire 
> scene.

There are two problems: recursion and divergence.

When a ray hits something, zero or more secondary rays are spawned. On 
the CPU, this is usually just a recursive function call, but the GPU 
does not permit such a thing.
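A minimal sketch of that difference, in Python for brevity. It is a toy model, not POV-Ray code: the `Ray` class, the `spawn_secondaries` rule (every hit spawns two child rays at half weight) and the `max_depth` cutoff are all hypothetical. It only shows that the same tree of rays can be walked with an explicit stack instead of recursive calls, which is the form a recursion-free GPU kernel would need.

```python
from dataclasses import dataclass


@dataclass
class Ray:
    depth: int      # 0 for a primary ray, +1 for each bounce
    weight: float   # share of the pixel colour this ray carries


def spawn_secondaries(ray):
    # Hypothetical hit rule: one reflected and one refracted child,
    # each carrying half of the parent's weight.
    half = ray.weight * 0.5
    return [Ray(ray.depth + 1, half), Ray(ray.depth + 1, half)]


def trace_recursive(ray, max_depth):
    # CPU style: every secondary ray is a nested function call.
    # In this toy model only rays that reach max_depth contribute.
    if ray.depth >= max_depth:
        return ray.weight
    return sum(trace_recursive(child, max_depth) for child in spawn_secondaries(ray))


def trace_iterative(primary, max_depth):
    # Recursion-free style: pending rays live in an explicit stack (a list).
    total = 0.0
    pending = [primary]
    while pending:
        ray = pending.pop()
        if ray.depth >= max_depth:
            total += ray.weight
        else:
            pending.extend(spawn_secondaries(ray))
    return total


print(trace_recursive(Ray(0, 1.0), 3), trace_iterative(Ray(0, 1.0), 3))  # 1.0 1.0
```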

Also, a GPU consists of *hundreds* of cores, but they work in groups that 
must all execute the same code path (with different data). You can set the 
GPU up to process multiple rays, but as soon as some of the rays hit object A 
but others hit object B, the code paths that need to be taken diverge from 
each other, which the GPU handles very poorly: it has to run each path in 
turn, leaving the cores on the other path idle.
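To make the cost of divergence concrete, here is a toy model of one lockstep group of cores. The 8-lane group, the branch names and the cycle counts are made-up assumptions; real GPUs differ in the details. The model assumes that when lanes disagree, the group runs every branch that any lane needs, one after another, with the other lanes masked off.

```python
def group_cycles(branches, branch_cycles):
    """Cycles one lockstep group needs: each distinct branch taken by any
    lane is executed once, while lanes on other branches sit idle."""
    return sum(branch_cycles[b] for b in set(branches))


def lane_efficiency(branches, branch_cycles):
    """Fraction of lane-cycles spent on useful work."""
    useful = sum(branch_cycles[b] for b in branches)
    return useful / (len(branches) * group_cycles(branches, branch_cycles))


cycles = {"hit_A": 100, "hit_B": 100}   # hypothetical shading costs
coherent = ["hit_A"] * 8                 # all 8 lanes hit object A
divergent = ["hit_A"] * 4 + ["hit_B"] * 4

print(group_cycles(coherent, cycles), lane_efficiency(coherent, cycles))    # 100 1.0
print(group_cycles(divergent, cycles), lane_efficiency(divergent, cycles))  # 200 0.5
```

In this model, a group split evenly between two branches already wastes half of its lane-cycles.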

The solution in both cases is to put rays into "queues", such that all 
the rays in a given queue take the same code path [for a while]. When 
you need to spawn a secondary ray, you add it to a queue rather than 
recursively tracing it. When some rays hit an object and others don't, 
you add them to different queues. The rays in each queue can then be 
processed in batches later.
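The queue idea can be sketched as follows. The hit records, object names and per-object `shaders` are hypothetical stand-ins; the point is only that rays are first sorted into one queue per code path, and each queue is then run as a single batch through one piece of code.

```python
from collections import defaultdict


def bin_rays(hits):
    """Sort (ray_id, object_name) hit records into one queue per object,
    so that every ray in a queue takes the same code path."""
    queues = defaultdict(list)
    for ray_id, obj in hits:
        queues[obj].append(ray_id)
    return dict(queues)


def run_batches(queues, shaders):
    """Process each queue as one batch with the shader for its object."""
    colours = {}
    for obj, ray_ids in queues.items():
        batch_result = shaders[obj](ray_ids)   # one call per queue, not per ray
        colours.update(zip(ray_ids, batch_result))
    return colours


hits = [(0, "sphere"), (1, "plane"), (2, "sphere"), (3, "sky")]
shaders = {
    "sphere": lambda ids: [0.8] * len(ids),
    "plane": lambda ids: [0.3] * len(ids),
    "sky": lambda ids: [1.0] * len(ids),
}
queues = bin_rays(hits)
print(queues)                        # {'sphere': [0, 2], 'plane': [1], 'sky': [3]}
print(run_batches(queues, shaders))  # {0: 0.8, 2: 0.8, 1: 0.3, 3: 1.0}
```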

The key problem is that if a queue ends up with very few rays in it, 
you're going to have a hell of a lot of idle cores while you process 
that queue. The GPU is usually clocked far slower than the CPU; it only 
"appears" fast because it has hundreds of cores working in parallel. If 
most of those cores are actually idling, you're going to have a problem. 
It may turn out not to be any faster than the CPU under unfavourable 
conditions.
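A rough way to see the idle-core problem: if a queue of n rays is spread over a GPU with C cores, one ray per core per pass, the fraction of cores doing useful work is n / (ceil(n / C) * C). The one-ray-per-core mapping is a simplifying assumption, and C = 320 below is simply the stream-core count quoted for the FireStream 9170 elsewhere in this thread.

```python
import math


def core_utilization(queue_len, cores):
    """Fraction of cores busy, averaged over all passes, when queue_len rays
    are processed one ray per core per pass (a simplifying assumption)."""
    if queue_len == 0:
        return 0.0
    passes = math.ceil(queue_len / cores)
    return queue_len / (passes * cores)


for n in (10, 320, 321, 3200):
    print(n, core_utilization(n, 320))
```

In this model a queue of 10 rays leaves almost 97% of the cores idle, and 321 rays need two passes, so utilization drops to about 50%.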

Another possibility is to run the main renderer on the CPU, adding rays 
to queues, and sending any "sufficiently large" queues to the GPU for 
processing. I don't know if bandwidth limitations between the two would 
make this viable...
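The CPU/GPU split described here could look roughly like this. `min_gpu_batch` is a hypothetical tuning knob; choosing it well is exactly the bandwidth question raised in the paragraph above, and nothing here is an actual OpenCL or POV-Ray call.

```python
def dispatch(queues, min_gpu_batch):
    """Split ray queues into those worth sending to the GPU (at least
    min_gpu_batch rays) and those cheaper to finish on the CPU."""
    to_gpu, to_cpu = {}, {}
    for name, rays in queues.items():
        target = to_gpu if len(rays) >= min_gpu_batch else to_cpu
        target[name] = rays
    return to_gpu, to_cpu


queues = {"glass": list(range(5000)), "mirror": list(range(40)), "diffuse": list(range(12000))}
gpu, cpu = dispatch(queues, min_gpu_batch=1024)
print(sorted(gpu), sorted(cpu))   # ['diffuse', 'glass'] ['mirror']
```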


From: Daniel Bastos
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 12:52:25
Message: <4a859649$1@news.povray.org>

In article <4a84d512@news.povray.org>,
Saul Luizaga wrote:

> clipka wrote:
>> Saul Luizaga wrote:
>> (*groans*)
>
> Way to go to start a discussion...

LOL!


From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 16:00:09
Message: <4a85c249@news.povray.org>

Invisible wrote:

I understand these problems perfectly.

> Another possibility is to run the main renderer on the CPU, adding rays 
> to queues, and sending any "sufficiently large" queues to the GPU for 
> processing. I don't know if bandwidth limitations between the two would 
> make this viable...

Exactly, that is why I asked: "Are you absolutely sure there isn't a case 
where a GPU can help? Maybe in the middle of a rendering/parsing?"

As for the bandwidth and memory concerns, from here: 
http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=95060&enterthread=y

Q: What are AMD's stream computing product features?

A: AMD's FireStream™ 9170, our latest generation stream computing GPU, 
features:
* 320 stream cores (compute units or ALUs)
* 2GB on-board GDDR3 memory
* Double precision floating point support
* PCIe 2.0 x16 interface
View AMD FireStream 9170 specifications here: 
http://ati.amd.com/products/streamprocessor/specs.html

Memory Concern:
--------------
  Maybe it would be a good idea to leave the processed data in the video 
card's local memory until it is needed in main memory.

Bandwidth Concern:
-----------------
- M4A78 PLUS MoBo 
(http://usa.asus.com/products.aspx?l1=3&l2=149&l3=758&l4=0&model=2889&modelmenu=1):
# It features dual-channel DDR2 1066 memory support and accelerates data 

(http://www.amd.com/us/products/desktop/processors/phenom-ii/Pages/phenom-ii-key-architectural-features.aspx)
# One 16-bit link at up to 4000MT/s

HyperTransport Generation 3.0 mode
# Up to 37GB/s total delivered processor-to-system bandwidth 
(HyperTransport bus + memory bus)

PCIe Card Electromechanical 2.0 Specification 
(http://www.pcisig.com/specifications/pciexpress/base2)
# Signaling

PCI Express Base 2.0 specification doubles the interconnect bit rate 
from 2.5 GT/s to 5 GT/s in a seamless and compatible manner. The 
performance boost to 5 GT/s is by far the most important feature of the 
PCI Express 2.0 specifications. It effectively increases the aggregate 
bandwidth of a 16-lane link to approximately 16 GB/s.

- Video Card: AMD FireStream 9170 (specs above)

----------------- ************** ----------------

As you can see, maybe bandwidth isn't much of an issue, since transfers 
between the PCIe video card and main memory can be made at 
5 GT/s. Is this still insufficient for POV-Ray's peak performance?
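For reference, the 5 GT/s figure is a per-lane signalling rate, not a byte rate. A quick calculation, using the 8b/10b line encoding of PCIe 2.0 (8 payload bits for every 10 bits on the wire), shows how it turns into the roughly 16 GB/s quoted above: about 8 GB/s in each direction for an x16 link, before any packet or protocol overhead.

```python
def pcie2_gb_per_s(lanes, gt_per_s=5.0):
    """Usable PCIe 2.0 bandwidth in GB/s, per direction, ignoring packet and
    protocol overhead. Each transfer moves one bit per lane, and 8b/10b
    encoding leaves 8 payload bits out of every 10."""
    payload_bits_per_s = gt_per_s * 1e9 * lanes * 8 / 10
    return payload_bits_per_s / 8 / 1e9


per_direction = pcie2_gb_per_s(16)
print(per_direction, 2 * per_direction)   # 8.0 16.0
```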


From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 16:16:27
Message: <4a85c61b@news.povray.org>

Warp wrote:
> Saul Luizaga <sau### [at] netscapenet> wrote:
>> groaning is an emotional response and, as such, irrational: I haven't 
>> been reading this NG for a long time, and the first line of text I read 
>> was your groaning; besides the rudeness, it explains nothing and 
>> clarifies just as little.
> 
>   There are certain subjects which repeat themselves time after time when
> some people think that they have an ingenious new idea which surely nobody
> has even thought of before and thus they come here and write about it. For
> the umpteenth time for regulars. Something which has already been discussed
> like a million times to death. No wonder regulars are tired of explaining
> the same thing over and over.

Well, instead of groaning, you could keep a small .txt file on your PC: 
"Already discussed, conclusions were:
1)....
2)...
3)..."
or something like that, to avoid frustration and redundancy.

I don't think my ideas are revolutionary, new, or ingenious; I'm 
just suggesting something that MAY or MAY NOT have been discussed 
before.

Also I assume everyone here knows more than me, including the 
POV/TAG-Team, so this is more of a hint than a suggestion. Sometimes 
smart people forget about simple things.

>   At some point in the past, when XML was all the hype, it was a rather
> regular occurrence for someone to come here and suggest that povray's
> scene description language should be changed to be XML-conforming. No wonder
> that after some time people just started responding to it with "no, that's
> just a braindead idea" rather than going once again into minute detail about
> why the idea doesn't work.
> 
>   To be fair, though, not all such ideas are unimplementable. The most
> prominent example is multithreading: In the past it was again and again
> suggested, and again and again shot down as unfeasible due to all the
> povray features which are not thread-friendly. In a way, both views were
> right: Yes, multithreading *is* implementable in povray (as demonstrated),
> and yes, it *is* a huge, huge task (as also demonstrated). Not only did it
> require an almost complete refactoring of the source code, but even after
> all these years there are still minor problems, introduced by
> multithreading, left to be solved.

I see. I know the POV-Ray source code is HUGE and even minor changes 
represent big efforts. But, as it seems, in this case it was a necessary one.

>   Using the GPU for rendering in povray is equally unfeasible, even though
> for slightly different reasons. Mostly it has to do with GPU features (or
> lack thereof) and the amount of data which would have to be constantly
> transferred between the graphics card and the system RAM, which would most
> probably nullify any theoretical speed advantage.
> 

Maybe there is a use for it, not as another main processor but as a 
secondary one. I posted about it in another post.


From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 16:17:36
Message: <4a85c660@news.povray.org>

:-D


From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 19:44:28
Message: <4a85f6dc$1@news.povray.org>

I mean, the video card has 2GB of GDDR3 RAM, so if suitable threads 
are found for the GPU, all that work could be left in the video card's 
memory until it is needed.

Maybe the data in the video card's memory could even be used to give a 
partial rough preview as the scene is rendered. It wouldn't always be 
clear, but it could give you a hint of what the GPU is doing on the fly; 
at least I think it would be cool to see it. There probably wouldn't even 
be much of a delay to display it, since the data is already in the video 
card. Of course, this is very optional.

Also, I was wondering: what is the bandwidth between the CPU and main 
memory at render time? Maybe this can help us calculate a rough estimate 
of the bandwidth needed for the video card.

Cheers.


From: clipka
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 20:16:59
Message: <4a85fe7b$1@news.povray.org>

Saul Luizaga wrote:
>> 1) Recursion.  As clipka (Christian?) wrote, it is absolutely 
>> essential for POV.
> 
> I suppose this is unsolvable without a C++ port of OpenCL.

???
Recursion is not specific to C++; it is also part of standard C99, on which 
OpenCL C is based.

>> 2) Data parallelization versus code parallelization (this is related 
>> to the first, but is not strictly the same).
> 
> they say "an API for coordinating data and task-based parallel 
> computation...". Doesn't this help? If it can do both, maybe it would 
> be of use for POV-Ray.

Did you actually /read/ the spec - or just the enthusiastic introduction?

Sure, it does support task-based parallel computation - why? Probably 
because it also targets classic multi-core CPUs, which are ideally 
suited to task-based parallel computing.

GPUs perform very poorly with task-based parallel computations, due to 
their hardware architecture. A software abstraction layer won't change 
that fundamental limitation.

> I see... maybe GPGPUs could be used not as co-processors but as 
> auxiliary co-processors called on demand, when a GPU-compliant 
> procedure needs to be processed. Are you absolutely sure there isn't a case 
> where a GPU can help? Maybe in the middle of a rendering/parsing?

No.

POV-Ray's internal workflow does not support asynchronous computations 
(other than having multiple threads independently render parts of the 
image), so only blocking "calls" to the GPU would be of any use, putting 
the CPU task in waiting state in the meantime. Therefore, only portions 
of the code that can be computed /significantly/ faster by the GPU, or 
have any /significant/ size, would warrant "outsourcing" of 
computations, otherwise parameter passing and task switching overhead 
would bog down performance instead of increasing it.

But the only sections of POV-Ray code that do ask for parallelization 
are RGB color and 3D vector computations, or similarly-sized problems; 
these can be parallelized quite well on modern CPUs as well using SSE2 
(i.e. the GPU will not be much faster), and are heavily intermixed with 
conditional branching (i.e. the size of outsourceable work packages is 
very small).
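The trade-off described above can be put into a simple cost model. Suppose each blocking GPU call costs a fixed overhead, and each work item takes t on the CPU and t / s on the GPU; offloading n items only pays off when n*t > overhead + n*t/s. The sketch below solves for the smallest such n. The linear model and the numbers in the example are illustrative assumptions, not measurements of POV-Ray or any real GPU.

```python
import math


def break_even_items(overhead, cpu_time_per_item, speedup):
    """Smallest work-package size n for which a blocking GPU call is faster:
    n * cpu_time_per_item > overhead + n * cpu_time_per_item / speedup.
    Returns math.inf if the GPU is not faster per item at all."""
    if speedup <= 1.0:
        return math.inf
    saved_per_item = cpu_time_per_item * (1.0 - 1.0 / speedup)
    return math.floor(overhead / saved_per_item) + 1


# Hypothetical numbers, in microseconds: 20 us per GPU call,
# 0.5 us of CPU work per item, GPU twice as fast per item.
print(break_even_items(20.0, 0.5, 2.0))   # 81
```

Under these assumed figures, a package needs more than 80 items just to break even, whereas the colour and vector computations mentioned above come in very small packages.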


From: clipka
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 20:31:37
Message: <4a8601e9$1@news.povray.org>

Saul Luizaga wrote:
>> Another possibility is to run the main renderer on the CPU, adding 
>> rays to queues, and sending any "sufficiently large" queues to the GPU 
>> for processing. I don't know if bandwidth limitations between the two 
>> would make this viable...
> 
> Exactly, that is why I asked: "Are you absolutely sure there isn't a case 
> where a GPU can help? Maybe in the middle of a rendering/parsing?"

Note that although the approach /may/ (!) work, it is a /fundamentally/ 
different approach from what POV-Ray is doing.

Changing POV-Ray to use that approach would imply virtually a complete 
rewrite of the render engine.

> As you can see, maybe bandwidth isn't much of an issue, since transfers 
> between the PCIe video card and main memory can be made at 
> 5 GT/s. Is this still insufficient for POV-Ray's peak performance?

So you're looking at peak data transfer rate limits and from them can 
infer that transfer between CPU and GPU memory space is not an issue?

Did you consider latency issues, or the overhead created by the OpenCL 
framework itself? How about the latency for a "function call"?

If your work packages are large enough, then these are no issues. But in 
a raytracer, be prepared for rather small work packages.
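The gap between peak bandwidth and what small transfers actually achieve can be shown with a one-line model: each call pays a fixed latency, then streams its payload at the peak rate. The 10 microsecond latency and 8 GB/s peak used below are assumptions for illustration (8 GB/s being the per-direction figure for a PCIe 2.0 x16 link), not measurements of any OpenCL implementation.

```python
def effective_rate(bytes_per_call, latency_s, peak_bytes_per_s):
    """Achieved bytes/s when every call pays a fixed latency before its
    payload streams at the peak rate."""
    return bytes_per_call / (latency_s + bytes_per_call / peak_bytes_per_s)


PEAK = 8e9        # bytes/s, assumed
LATENCY = 10e-6   # seconds per call, assumed

for size in (64, 64 * 1024, 64 * 1024 * 1024):
    rate = effective_rate(size, LATENCY, PEAK)
    print(f"{size:>10} bytes per call: {rate / 1e9:.3f} GB/s")
```

With these assumed figures, a 64-byte package achieves well under 0.1% of the peak rate, and only multi-megabyte transfers come close to it.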

