POV-Ray: Newsgroups: povray.off-topic: Suggestion: OpenCL

From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 03:32:58
Message: <4a85132a@news.povray.org>

clipka wrote:
> Saul Luizaga schrieb:
>> clipka wrote:
>>> (*groans*)
>>
>> Way to go to start a discussion...
> 
> Sure, but you're really not the first one, and haven't been so recently.
> 
>> I think you are wrong: "OpenCL (Open Computing Language) greatly 
>> improves speed and responsiveness for a wide spectrum of applications 
>> in numerous market categories from gaming and entertainment to 
>> scientific and medical software."
>>
>> From here:
>> http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/
> 
> 
> That's a nice statement. Where does it originate from?
> 
> Ah, a paper from the group that designed OpenCL to the press. What could 
> be their major goal with such a paper? They're not possibly trying 
> primarily to get attention to that thing? Right, sure they wouldn't want 
> to hype that thing.
> 
> Also note that...
> 
> - "a wide spectrum of applications" is a very vague statement, and may 
> exclude some.
> 
> - The categories mentioned all have one thing in common: Massive number 
> crunching with little decision making.
> 
> POV-Ray does number crunching too, in a sense, but there's a lot of 
> decision making involved.
> 
>> Have you appreciated first hand that overhead making it inviable for 
>> POV-Ray?
> 
> How could I? Do you have an OpenCL implementation available for me so 
> that I could test it?
> 
> But I have read about some limitations of GPU processing in general and 
> with regard to raytracing in particular, and imagine to have enough 
> understanding of computer architecture to be able to say that data 
> exchange between CPU and GPU requires a tad more overhead than 
> inter-process communication between separate threads running on the same 
> CPU.
> 
>> "Tony Tamasi, senior vice president of technical marketing at NVIDIA 


>> powerful way to harness the enormous processing capabilities of our 
>> CUDA-based GPUs on multiple platforms." From the same link.
> 
> Another marketing blurb. Of /course/ the vice president of a big player 
> in the GPU market is advertising it as the greatest invention since 
> sliced bread: It will sell more of their chips.
> 
>> Some GPGPUs provide 64-bit Floating Point computing which is, I think, 
>> the major concern about raytracing.
> 
> It used to be one of the major ones, and particularly easy to explain, 
> though it's a limitation that is gradually disappearing. I named some 
> others in my previous post.
> 
>> Granted, this new C standard (C99) is not fully supported in any C++ 
>> implementations; Intel C++ supports it for the most part but not 
>> fully. But I think a port to C++ probably is in the making since C++ 
>> is by far more popular than C99 IMHO, so I think, since it has been 
>> released about 8 months ago, maybe there is a C++ ported OpenCL spec or 
>> maybe more by now. Many computing intensive apps. would want this for 
>> themselves.
> 
> I doubt that C++ support is to come anytime soon, given that OpenCL is 
> even more limited than C99: No function pointers for instance. How could 
> you possibly implement polymorphic objects if you don't even have 
> function pointers at your disposal?
> 
> If a standard imposes limitations which are more rigorous than what 
> you'll find on most brain-dead embedded microcontrollers, then there's a 
> hardware reason for it.
> 
>> OK, maybe is not as suitable for raytracing as it is for protein 
>> folding research, maybe the explanation why not is in the discussion 
>> about CUDA is where the answer is, but maybe is worth it because it 
>> has 64-bit Floating Point computing, which IIRC is the one and only 
>> big obstacle to avoid GPU-aided raytracing.
> 
> It used to be the Big One that used to be mentioned first whenever the 
> discussion popped up again, and possibly the only thing the POV-Ray 
> developers really cared about, historically: Without support for 
> double-precision floating point, there was no point in having any closer 
> look at GPUs. Fortunately for scientific simulations (like that protein 
> folding thing), the precision issue is improving now (probably /because/ 
> the GPU developers want to go for that scientific sim market share). 
> However, other limitations still apply, which are no issue for such use 
> cases, but a problem for POV-Ray.
> 
>> Or what I'm missing? don't want any details, only the highlights if 
>> you care to answer.
> 
> No support for recursion is one I named already.
> 
> Another one is that GPUs are highly optimized for massively parallel 
> computations where exactly the same program with exactly the same 
> control flow is run on a vast number of data sets (which is why they 
> /can/ be so fast on this type of problems in the first place), but they 
> can /only/ run programs of this type; so if program flow must be 
> expected to change from one data set to the next, each data set must be 
> run on its own, along with (for instance) 31 "empty" data sets: You lose
> 97% of your processing power. That does not leave much.
> 
> Massive parallelization /could/ be used for the primary rays in a scene. 
> However, those are not the problem anyway: You only have a few million 
> of those, and sophisticated bounding and caching typically keep the 
> workload per ray low. It's usually the secondary rays (testing for 
> shadows, following reflected and refracted rays, and some such) that eat 
> most of the time.
> 
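
The 97% figure in the quoted text can be checked with a line of arithmetic. The sketch below assumes a SIMD group of 32 lanes (one real data set plus the 31 "empty" ones mentioned above); the function name and the default width are illustrative, not taken from any particular GPU's documentation.

```python
def simd_utilization(active_lanes: int, warp_width: int = 32) -> float:
    """Fraction of a SIMD group's lanes doing useful work."""
    if not 0 <= active_lanes <= warp_width:
        raise ValueError("active_lanes must be between 0 and warp_width")
    return active_lanes / warp_width


if __name__ == "__main__":
    # Worst case from the quoted text: one real data set, 31 idle lanes.
    useful = simd_utilization(1)
    print(f"useful: {useful:.1%}, wasted: {1 - useful:.1%}")
```

With one active lane out of 32, about 3% of the group does useful work, which matches the "lose 97%" estimate.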

So, you assume that it is just a huge amount of hype and that, even if it 
works for other apps, it won't for POV-Ray. I think there is more to analyze 
than just simple generalizations. Anyway, it is a POV/TAG-Team decision; we 
can merely speculate.

>  > thanks, I think I'll try povray.programming.
> 
> I had actually and honestly hoped to discourage you with my initial 
> groaning. I guess the POV-Ray dev team is better informed about GPU 
> computing than you expect.

And you know what I expect the POV/TAG-Team to know... you assume too 
much...

Groaning is an emotional response and, as such, irrational. I haven't 
been reading this NG for a long time, and the first line of text I read 
was your groaning. Besides being rude, it explains nothing and 
clarifies just as little. I don't like emotional responses to intellectual 
matters; I find them quite out of place. They get in the way of 
logical/rational thinking and make a mess of things, which is reason enough 
for me to disregard them on sight. Discouraging or intimidating me is not 
that easy.

From: Stephen
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 05:27:07
Message: <seba85lvjseiqv1l1vuam2kuuh68if0phe@4ax.com>

On Fri, 14 Aug 2009 03:32:15 -0400, Saul Luizaga <sau### [at] netscapenet> wrote:

>And you know what I expect the POV/TAG-Team to know... 

Is there still a TAG team? I know Warp is still around and Gilles and Ken pop in
occasionally but when was the last time any of the others replied to a post?
-- 

Regards
     Stephen

From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 05:37:11
Message: <4a853047@news.povray.org>

Stephen wrote:
> On Fri, 14 Aug 2009 03:32:15 -0400, Saul Luizaga <sau### [at] netscapenet> wrote:
> 
>> And you know what I expect the POV/TAG-Team to know... 
> 
> Is there still a TAG team? I know Warp is still around and Gilles and Ken pop in
> occasionally but when was the last time any of the others replied to a post?

No idea, man. I assume there is a TAG-Team, since I don't think highly 
technical freeware like this can exist without one.

From: Warp
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 05:50:18
Message: <4a853359@news.povray.org>

Saul Luizaga <sau### [at] netscapenet> wrote:
> groaning is an emotional response and as such, irrational: I haven't 
> been  reading this NG for a long time and the first line of text I read 
> was your groaning, besides of the rudeness, it explains nothing, and 
> clarifies the same way.

  There are certain subjects which repeat themselves time after time when
some people think that they have an ingenious new idea which surely nobody
has even thought of before and thus they come here and write about it. For
the umpteenth time for regulars. Something which has already been discussed
like a million times to death. No wonder regulars are tired of explaining
the same thing over and over.

  At some point in the past, when XML was all the hype, it was a rather
regular occurrence for someone to come here and suggest that povray's
scene description language would be changed to XML-conforming. No wonder
that after some time people just start responding to it with "no, that's
just a braindead idea" rather than going once again to minute details why
the idea doesn't work.

  To be fair, though, not all such ideas are unimplementable. The most
prominent example is multithreading: In the past it was again and again
suggested, and again and again shot down as unfeasible due to all the
povray features which are not thread-friendly. In a way, both views were
right: Yes, multithreading *is* implementable in povray (as demonstrated),
and yes, it *is* a huge, huge task (as also demonstrated). Not only did it
require an almost complete refactoring of the source code, but even after
all these years there are still minor problems to be solved because of the
problems introduced by multithreading.

  Using the GPU for rendering in povray is equally unfeasible, even though
for slightly different reasons. Mostly it has to do with GPU features (or
lack thereof) and the amount of data which would have to be constantly
transferred between the graphics card and the system RAM, which would most
probably nullify any theoretical speed advantage.

-- 
                                                          - Warp

From: Invisible
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 06:36:34
Message: <4a853e32$1@news.povray.org>

Chambers wrote:

> Of course, modern GPUs now allow double precision, so we can get to the 
> other objections now.  Specifically:
> 
> 1) Recursion.  As clipka (Christian?) wrote, it is absolutely essential 
> for POV.
> 
> 2) Data parallelization versus code parallelization (this is related to 
> the first, but is not strictly the same).
> 
> The ray tracing algorithm follows drastically different code branches on 
> a single set of data, based on recursion (reflections & refractions), as 
> well as the other various computations needed (texture calculation, 
> light source occlusion, etc) which almost all need access to the entire 
> scene.

There are two problems: recursion and divergence.

When a ray hits something, zero or more secondary rays are spawned. On 
the CPU, this is usually just a recursive function call, but the GPU 
does not permit such a thing.

Also, a GPU consists of *hundreds* of cores, but they must all execute 
the same code path (but with different data). You can set the GPU up to 
process multiple rays, but as soon as some of the rays hit object A but 
others hit object B, the code paths that need to be taken diverge from 
each other, which the GPU does not permit.

The solution in both cases is to put rays into "queues", such that all 
the rays in a given queue take the same code path [for a while]. When 
you need to spawn a secondary ray, you add it to a queue rather than 
recursively tracing it. When some rays hit an object and others don't, 
you add them to different queues. The rays in each queue can then be 
processed in batches later.
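
As a rough illustration of the queue idea, here is a minimal CPU-only sketch in Python. Everything in it is hypothetical: `Ray`, `classify` and `trace_with_queues` are made-up names, and `classify` is a toy rule standing in for real intersection testing. It only shows the control structure: rays are sorted into one queue per code path, each queue is handled as a batch, and secondary rays are queued instead of being traced recursively.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ray:
    pixel: int      # which pixel the ray contributes to
    depth: int = 0  # 0 = primary ray, >0 = secondary ray


def classify(ray: Ray, max_depth: int) -> str:
    """Toy stand-in for intersection testing: names the code path a ray needs."""
    if ray.pixel % 3 == 0 and ray.depth < max_depth:
        return "mirror"   # spawns a reflected ray
    if ray.pixel % 3 == 1:
        return "diffuse"  # shaded directly, no secondary ray
    return "miss"         # background colour


def trace_with_queues(primary_rays, max_depth=2):
    """Process rays in per-code-path batches instead of recursing per ray."""
    pending = list(primary_rays)
    batches = []  # (code path, batch size) in processing order
    while pending:
        # Sort the pending rays into one queue per code path.
        queues = defaultdict(list)
        for ray in pending:
            queues[classify(ray, max_depth)].append(ray)
        pending = []
        for path, rays in queues.items():
            batches.append((path, len(rays)))
            if path == "mirror":
                # Queue the secondary rays rather than tracing them recursively.
                pending.extend(Ray(r.pixel, r.depth + 1) for r in rays)
    return batches


if __name__ == "__main__":
    for path, size in trace_with_queues(Ray(p) for p in range(12)):
        print(f"{path:8s} batch of {size}")
```

On a GPU each batch would correspond to one kernel launch over rays that all follow the same branch; here the batches are merely counted.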

The key problem is that if a queue ends up with very few rays in it, 
you're going to have a hell of a lot of idle cores while you process 
that queue. The GPU is usually clocked far slower than the CPU; it only 
"appears" fast because it has hundreds of cores working in parallel. If 
most of those cores are actually idling, you're going to have a problem. 
It may turn out not to be any faster than the CPU under unfavourable 
conditions.
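
To put a number on "very few rays", here is a deliberately crude model, assuming one ray per core and throughput proportional to busy cores times clock rate. The core counts and clock speeds below are hypothetical round figures chosen for illustration, not measurements of any real hardware, and the model ignores memory access, transfer cost and per-core efficiency.

```python
def relative_throughput(rays_in_queue, gpu_cores, gpu_clock_ghz,
                        cpu_cores, cpu_clock_ghz):
    """Crude GPU/CPU throughput ratio: busy cores times clock, nothing else."""
    busy_gpu_cores = min(rays_in_queue, gpu_cores)  # one ray per core; the rest idle
    return (busy_gpu_cores * gpu_clock_ghz) / (cpu_cores * cpu_clock_ghz)


if __name__ == "__main__":
    # Hypothetical hardware: 320 GPU cores at 0.8 GHz vs. 4 CPU cores at 3.2 GHz.
    for queue_size in (8, 16, 320):
        ratio = relative_throughput(queue_size, 320, 0.8, 4, 3.2)
        print(f"{queue_size:4d} rays in queue -> GPU/CPU ratio {ratio:.1f}")
```

Under these assumed numbers the GPU only breaks even at 16 rays per queue and is at half the CPU's throughput with 8, which is the "unfavourable conditions" case described above.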

Another possibility is to run the main renderer on the CPU, adding rays 
to queues, and sending any "sufficiently large" queues to the GPU for 
processing. I don't know if bandwidth limitations between the two would 
make this viable...
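
A minimal sketch of that hybrid scheme, under stated assumptions: `HybridDispatcher`, its `threshold` parameter and the two callbacks are hypothetical names, the "GPU" is just a function that receives a whole batch, and transfer cost between CPU and GPU is not modelled at all.

```python
class HybridDispatcher:
    """Collects rays per code path; large queues go to the GPU, leftovers to the CPU."""

    def __init__(self, threshold, gpu_batch_fn, cpu_ray_fn):
        self.threshold = threshold        # hypothetical "sufficiently large" size
        self.gpu_batch_fn = gpu_batch_fn  # called with (path, list_of_rays)
        self.cpu_ray_fn = cpu_ray_fn      # called with (path, ray)
        self.queues = {}

    def submit(self, path, ray):
        queue = self.queues.setdefault(path, [])
        queue.append(ray)
        if len(queue) >= self.threshold:
            # Hand the full batch over and start a fresh queue for this path.
            self.queues[path] = []
            self.gpu_batch_fn(path, queue)

    def drain(self):
        """Trace whatever never reached the threshold on the CPU."""
        for path, queue in self.queues.items():
            for ray in queue:
                self.cpu_ray_fn(path, ray)
        self.queues = {}


if __name__ == "__main__":
    gpu_batches, cpu_rays = [], []
    dispatcher = HybridDispatcher(
        threshold=4,
        gpu_batch_fn=lambda path, rays: gpu_batches.append((path, len(rays))),
        cpu_ray_fn=lambda path, ray: cpu_rays.append((path, ray)),
    )
    for i in range(9):
        dispatcher.submit("diffuse", i)
    dispatcher.submit("mirror", 100)
    dispatcher.drain()
    print("GPU batches:", gpu_batches)
    print("CPU rays:", cpu_rays)
```

Whether any threshold makes this worthwhile depends on exactly the bandwidth question raised above; the sketch only shows where such a threshold would sit.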

From: Daniel Bastos
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 12:52:25
Message: <4a859649$1@news.povray.org>

In article <4a84d512@news.povray.org>,
Saul Luizaga wrote:

> clipka wrote:
>> Saul Luizaga schrieb:
>> (*groans*)
>
> Way to go to start a discussion...

LOL!

From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 16:00:09
Message: <4a85c249@news.povray.org>

Invisible wrote:

I understand these problems perfectly.

> Another possibility is to run the main renderer on the CPU, adding rays 
> to queues, and sending any "sufficiently large" queues to the GPU for 
> processing. I don't know if bandwidth limitations between the two would 
> make this viable...

Exactly, that is why I asked: "Are absolutely sure there isn't a case 
where a GPU can help? maybe in the middle of a rendering/parsing?".

As for the bandwidth and memory concerns, from here: 
http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=95060&enterthread=y

Q: What are AMD's stream computing product features?

A: AMD's FireStream(TM) 9170, our latest generation stream computing GPU, 
features:
* 320 stream cores (compute units or ALUs)
* 2GB on-board GDDR3 memory
* Double precision floating point support
* PCIe 2.0 x16 interface
View AMD FireStream 9170 specifications here: 
http://ati.amd.com/products/streamprocessor/specs.html

Memory Concern:
--------------
  Maybe it would be a good idea to leave the processed data in the video 
card's local memory until it is needed in main memory.

Bandwidth Concern:
-----------------
- M4A78 PLUS MoBo 
(http://usa.asus.com/products.aspx?l1=3&l2=149&l3=758&l4=0&model=2889&modelmenu=1):
# It features dual-channel DDR2 1066 memory support and accelerates data 

(http://www.amd.com/us/products/desktop/processors/phenom-ii/Pages/phenom-ii-key-architectural-features.aspx)
# One 16-bit link at up to 4000MT/s

HyperTransport Generation 3.0 mode
# Up to 37GB/s total delivered processor-to-system bandwidth 
(HyperTransport bus + memory bus)

PCIe Card Electromechanical 2.0 Specification 
(http://www.pcisig.com/specifications/pciexpress/base2)
# Signaling

PCI Express Base 2.0 specification doubles the interconnect bit rate 
from 2.5 GT/s to 5 GT/s in a seamless and compatible manner. The 
performance boost to 5 GT/s is by far the most important feature of the 
PCI Express 2.0 specifications. It effectively increases the aggregate 
bandwidth of a 16-lane link to approximately 16 GB/s.

- Video Card: AMD FireStream 9170 (specs above)

----------------- ************** ----------------

As you can see, maybe bandwidth isn't much of an issue, since the 
transfer between the PCIe video card and main memory can be made at 
5 GT/s per lane (approximately 16 GB/s aggregate over a 16-lane link, 
according to the spec quoted above). Is this still insufficient for 
POV-Ray peak performance?
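
For reference, the step from 5 GT/s to roughly 16 GB/s can be worked out as follows. The sketch assumes the 8b/10b line encoding used by PCIe 1.x and 2.0 (8 payload bits per 10 bits on the wire) and one bit per transfer per lane; the function name is made up for illustration.

```python
def pcie_bandwidth_gb_per_s(gt_per_s, lanes, encoding_efficiency=8 / 10):
    """Usable bandwidth in ONE direction, in GB/s (1 GB = 10**9 bytes)."""
    payload_bits_per_s = gt_per_s * 1e9 * lanes * encoding_efficiency
    return payload_bits_per_s / 8 / 1e9


if __name__ == "__main__":
    one_way = pcie_bandwidth_gb_per_s(5.0, 16)  # PCIe 2.0 x16
    print(f"per direction: {one_way:.1f} GB/s, both directions: {2 * one_way:.1f} GB/s")
```

So the "approximately 16 GB/s" aggregate figure counts both directions; a one-way transfer between the card and main memory tops out at about half of that, before any protocol overhead.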

From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 16:16:27
Message: <4a85c61b@news.povray.org>

Warp wrote:
> Saul Luizaga <sau### [at] netscapenet> wrote:
>> groaning is an emotional response and as such, irrational: I haven't 
>> been  reading this NG for a long time and the first line of text I read 
>> was your groaning, besides of the rudeness, it explains nothing, and 
>> clarifies the same way.
> 
>   There are certain subjects which repeat themselves time after time when
> some people think that they have an ingenious new idea which surely nobody
> has even thought of before and thus they come here and write about it. For
> the umpteenth time for regulars. Something which has already been discussed
> like a million times to death. No wonder regulars are tired of explaining
> the same thing over and over.

Well, instead of groaning you could keep a small .txt file on your PC: 
"Already discussed, conclusions were:
1)....
2)...
3)..."
or something like that, to avoid frustration and redundancy.

I don't think my ideas are revolutionary, new, or ingenious; I'm 
just suggesting something that MAY or MAY NOT have been discussed 
before.

Also I assume everyone here knows more than me, including the 
POV/TAG-Team, so this is more of a hint than a suggestion. Sometimes 
smart people forget about simple things.

>   At some point in the past, when XML was all the hype, it was a rather
> regular occurrence for someone to come here and suggest that povray's
> scene description language would be changed to XML-conforming. No wonder
> that after some time people just start responding to it with "no, that's
> just a braindead idea" rather than going once again to minute details why
> the idea doesn't work.
> 
>   To be fair, though, not all such ideas are unimplementable. The most
> prominent example is multithreading: In the past it was again and again
> suggested, and again and again shot down as unfeasible due to all the
> povray features which are not thread-friendly. In a way, both views were
> right: Yes, multithreading *is* implementable in povray (as demonstrated),
> and yes, it *is* a huge, huge task (as also demonstrated). Not only did it
> require an almost complete refactoring of the source code, but even after
> all these years there are still minor problems to be solved because of the
> problems introduced by multithreading.

I see. I know the POV-Ray source code is HUGE and any minor change 
represents a big effort. But it seems that in this case it was a necessary one.

>   Using the GPU for rendering in povray is equally unfeasible, even though
> for slightly different reasons. Mostly it has to do with GPU features (or
> lack thereof) and the amount of data which would have to be constantly
> transferred between the graphics card and the system RAM, which would most
> probably nullify any theoretical speed advantage.
> 

Maybe there is a use for it, not as another main processor but as a 
secondary one. I wrote about it in another post.

From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 16:17:36
Message: <4a85c660@news.povray.org>

:-D

From: Saul Luizaga
Subject: Re: Suggestion: OpenCL
Date: 14 Aug 2009 19:44:28
Message: <4a85f6dc$1@news.povray.org>

I mean, the video card has 2GB of GDDR3 RAM, and if suitable threads 
are found for the GPU, all that work can be left in the video card 
memory until it is needed.

Maybe the data in the video card memory could even be used to give a 
partial rough preview as the scene is rendered. It wouldn't always be 
clear, but it could give you a hint of what the GPU is doing on the fly; 
at least I think it would be cool to see. There probably wouldn't even be 
much of a delay in displaying it, since the data is already in the video 
card. Of course, this is very optional.

Also, I was wondering: what is the bandwidth between the CPU and main 
memory at render time? Maybe this can help us calculate a rough estimate 
of the bandwidth needed for the video card.

Cheers.
