Orchid XP v8 wrote:
> But if you think about common scenes where you have walls and floors and
> so forth, it might be worth using the GPU to test ray intersections
> against these, and against bounding volumes (if you're using them). Huge
> numbers of rays need to be run through these tests, so the GPU can fire
> those off quite quickly.
But those tests are also quite simple, so would benefit the least from
the GPU. If isosurfaces could be translated efficiently into shaders,
then those would show the most benefit (and Julia fractals, of course).
> It might also be worth running the "so what the
> hell is the colour of this surface?" calculation on the GPU - what with
> texturing, normal maps, multiple light rays of different angles and
> colours, etc.
Possibly. In fact, using the GPU for GI might be an option. For
instance, we could run a single extremely high resolution pass with no
lighting or texturing, just intersections, and cache all the
intersection locations. Then, feed these intersections to the GPU, and
have it calculate lighting for those points. Feed the lighting data
back to the CPU for radiosity, and voila! Fast GI :) (Of course,
implementing it would be a b*tch, but that's beside the point).
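Just to make that pipeline concrete, here is a minimal Python sketch of the two-pass idea, with the "GPU" pass simulated by a plain function. Every name and number below is illustrative; none of this is POV-Ray code.

```python
import math

# Pass 1 (CPU): intersect rays against a ground plane y = 0 and cache
# only the hit points -- no lighting, no texturing.
def trace_intersections(rays):
    hits = []
    for origin, direction in rays:
        oy, dy = origin[1], direction[1]
        if dy < 0:                        # ray heading down toward the plane
            t = -oy / dy
            hits.append(tuple(o + t * d for o, d in zip(origin, direction)))
    return hits

# Pass 2 ("GPU", simulated): one identical Lambert kernel applied to every
# cached hit point in a single batch -- exactly the kind of uniform,
# data-parallel work a GPU is built for.
def lighting_batch(hits, light_pos):
    shades = []
    normal = (0.0, 1.0, 0.0)              # plane normal
    for p in hits:
        to_light = tuple(l - c for l, c in zip(light_pos, p))
        norm = math.sqrt(sum(c * c for c in to_light))
        to_light = tuple(c / norm for c in to_light)
        shades.append(max(0.0, sum(n * l for n, l in zip(normal, to_light))))
    return shades

rays = [((0.0, 2.0, 0.0), (x * 0.1, -1.0, 0.0)) for x in range(-5, 6)]
hits = trace_intersections(rays)
shades = lighting_batch(hits, light_pos=(0.0, 5.0, 0.0))
```

The point is only the shape of the data flow: pass 1 caches bare intersection points, pass 2 runs one trivially parallel kernel over the whole batch, and the lit points would then go back to the CPU for the radiosity step.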
> Also, let's clear this up: My understanding is that the GPU does not
> require *all* cores to run an identical kernel. IIRC, the cores are
> grouped into (fairly large) bundles, each bundle runs a single kernel,
> but different bundles can run completely different kernels. So you don't
> need a ray queue with enough rays for the entire GPU, just for a whole
> bundle of cores.
True, I think the shaders are in blocks of 4, and you have to have
groups of 32 blocks running the same shader or something like that
(which would be 128 shaders per program). I don't remember the exact
numbers, though, and in fact it's probably GPU dependent.
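Taking those half-remembered figures at face value just to show the consequence (the real values are GPU-dependent, as said): blocks of 4 shaders dispatched in groups of 32 blocks would mean a ray queue has to hold at least 128 rays before a kernel launch uses the hardware fully.

```python
# Hypothetical dispatch granularity from the figures above; real values
# vary by GPU model and no accuracy is claimed here.
def min_batch_size(shaders_per_block=4, blocks_per_kernel=32):
    return shaders_per_block * blocks_per_kernel

print(min_batch_size())  # 128 shaders per program with these figures
```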
...Chambers
clipka wrote:
> Saul Luizaga wrote:
>>> Another possibility is to run the main renderer on the CPU, adding
>>> rays to queues, and sending any "sufficiently large" queues to the
>>> GPU for processing. I don't know if bandwidth limitations between the
>>> two would make this viable...
>>
>> Exactly, that is why I asked: "Are absolutely sure there isn't a case
>> where a GPU can help? maybe in the middle of a rendering/parsing?".
>
> Note that although the approach /may/ (!) work, it is a /fundamentally/
> different approach from what POV-Ray is doing.
>
> Changing POV-Ray to use that approach would imply virtually a complete
> rewrite of the render engine.
>
> As you can see, maybe bandwidth isn't much of an issue, since the
> transfer between the PCIe video card and main memory can be made
> at 5 GT/s. Is this still insufficient for POV-Ray's peak performance?
>
> So you're looking at peak data transfer rate limits and from them can
> infer that transfer between CPU and GPU memory space is not an issue?
>
> Did you consider latency issues, or the overhead created by the OpenCL
> framework itself? How about the latency for a "function call"?
>
> If your work packages are large enough, then these are no issues. But in
> a raytracer, be prepared for rather small work packages.
You just don't like to be corrected, due to your intellectual vanity and
arrogance; leave those aside for good, they don't do anybody any good.
I searched p.o-t on the subject and found nothing. The question
was: "are you ABSOLUTELY SURE it won't work...?" First you say NO, then
it MAY (!) work. An obvious contradiction. Then you made a number of
assumptions (again) about me, all of them WRONG, which is plainly
rude. Posting on p.p was the right thing to do, since it was a
suggestion; I didn't start a discussion there, but here.
On my first post in this thread: get some perspective, dude, it was for
everyone, from those with little tech knowledge to the most advanced; of
course the most advanced readers will find the explanation/clarification
I made about OpenCL obvious.
To the subject: I don't know much about GPUs, POV-Ray internals,
programming, or OpenCL, but I do know a little about computer
electronic architecture. Some of the questions I asked are not meant for
this NG's readers, not even the TAG-Team, but for the POV-Team, so they
can make calculations about OpenCL.
I think they may have a structured and classified diagram of ALL POV-Ray
programming structures, knowing approximately how much time each
structure takes on the CPU, so from the data I provided they could see:
1) Which structures are GPU-capable.
2) The amount of time the selected GPU-capable structures would take for
a given process.
3) And this should come first: see whether GPU aid is worth considering
at all; if it isn't possible through OpenCL, make a custom framework. If
it is, count how many GPU-capable structures there are for a second
evaluation, and see if that number is worth the effort.
4) Etc.
So I'm analyzing to the best of my abilities, as is everyone here,
including you, but you're taking this too emotionally and resorting to
personal attacks trying to discredit me. I'm just trying to see all the
possibilities for GPU acceleration, have a discussion, and maybe lighten
the analysis a little for the POV-Team. But maybe it's not my place, nor
that of other non-TAG-Team people, so perhaps I should just have
addressed this to the POV-Team for their consideration only and avoided
"bothering" people here so much. But as you can see, there are still
some things to say about it...
Saul Luizaga wrote:
> You just don't like to be corrected
This sounds more like you.
Believe it or not, this subject is an official Dead Horse. Please stop
beating it, you'll only get a bloody mess.
> I made a search on p.o-t on the subject
First, p.o-t is the wrong forum.
Second, posts here expire.
> was: "are you ABSOLUTELY SURE it won't work...?"
No, you didn't even ask a question. You made a statement: "I have just
learned about it and I think this could be a great improvement for POV-ray"
This is something which, by your own words, you have just heard about.
You then quoted a whole bunch of marketing blurbs about it.
Trust me. We have not just heard about it, but have been looking at the
idea for more than *10 years* now. I haven't specifically looked at
OpenCL, but I have CUDA, and the underlying hardware hasn't changed
enough yet to overcome the inherent limitations in the system.
> To the subject: I don't know much about GPUs, POV-Ray internals,
> programming, nor OpenCL,
Then why is it so hard for you to accept that this isn't a good idea at
this time?
If you really want to help make POV-Ray better, the source code is
available. Please, dig in! Start reading the p.p group, and look at
some of the outstanding bugs! That would surely be appreciated more
than your parroting some random marketing blurbs.
...Chambers
Chambers wrote:
> Saul Luizaga wrote:
>> You just don't like to be corrected
>
> This sounds more like you.
right...
>
> Believe it or not, this subject is an official Dead Horse. Please stop
> beating it, you'll only get a bloody mess.
>
>> I made a search on p.o-t on the subject
>
> First, p.o-t is the wrong forum.
> Second, posts here expire.
>
>> was: "are you ABSOLUTELY SURE it won't work...?"
>
> No, you didn't even ask a question.
news://news.povray.org:119/4a850da7@news.povray.org, check the last
paragraph; it proves you wrong. Do you like being corrected?
>
> This is something which, by your own words, you have just heard about.
> You then quoted a whole bunch of marketing blurbs about it.
>
> Trust me. We have not just heard about it, but have been looking at the
> idea for more than *10 years* now. I haven't specifically looked at
> OpenCL, but I have CUDA, and the underlying hardware hasn't changed
> enough yet to overcome the inherent limitations in the system.
>
>> To the subject: I don't know much about GPUs, POV-Ray internals,
>> programming, nor OpenCL,
>
> Then why is it so hard for you to accept that this isn't a good idea at
> this time?
>
> If you really want to help make POV-Ray better, the source code is
> available. Please, dig in! Start reading the p.p group, and look at
> some of the outstanding bugs! That would surely be appreciated more
> than your parroting some random marketing blurbs.
>
> ....Chambers
So open minded... thanks, I'm out of here for good.
Saul Luizaga wrote:
>> No, you didn't even ask a question.
>
> news://news.povray.org:119/4a850da7@news.povray.org, check the last
> paragraph; it proves you wrong. Do you like being corrected?
Yes, I do as a matter of fact. I don't like being wrong.
...Chambers
Saul Luizaga wrote:
> I made a search on p.o-t on the subject and found nothing.
This may be because p.o-t is actually not the best place for such a
posting, as it is intended for topics /not/ related to POV-Ray.
> The question
> was: "are you ABSOLUTELY SURE it won't work...?" First you say NO. then
> MAY (!) work. Obvious contradiction.
You're quoting me out of context, and out of chronological order.
You proposed OpenCL for POV-Ray. If you go back to my first reply,
you'll find that my answer was manifold back then already - but not
contradictory (nor were my follow-ups):
(I) *NO*, OpenCL will *not* be an option for POV-Ray, because (a) it is
written in a different language (namely C++), and (b) the internal
architecture makes heavy use of constructs not available in OpenCL (most
particularly recursions).
Both of these still stand. If you have any substantial information that
makes both of these arguments moot (other than wishful thinking about
where the OpenCL project /may/ be heading some time in the future
according to your imagination), then go ahead and correct me.
There is another point in which I might be mistaken: My argument is
based on assumptions that (a) POV-Ray will not go back to C again, and
(b) the fundamental architecture will not change anytime soon. However,
given that POV-Ray has /just been/ ported from C to C++, and the feat of
adding SMP support has /already/ taken something like 3 years, I don't
expect any strong urge to introduce even more fundamental changes in the
underlying architecture.
This urge /might/ be greater if there was a clear promise of benefit.
However, even /that/ is still doubtful.
(II) Yes, it *MIGHT* be possible to run a full-fledged independent
POV-Ray thread on a GPU; however, this would require some conditions to
be fulfilled first:
(1) POV-Ray would /first/ have to be brought to the point where it can
run distributed renders on multiple machines.
The reason here is that in order to run a separate thread on the GPU,
POV-Ray would need to be enabled to run render threads in fully separate
(or at least non-synchronized) address spaces. As this is also /the/
main prerequisite for distributed rendering on multiple machines (in
this multiprocessing model, communication bandwidth and latency are
likely to pose not so much of a problem even in the network, let alone
between CPU and GPU), and network rendering is already a goal POV-Ray is
committed to, I do /not/ expect GPU support to be tackled any time
earlier: Network rendering comes with a clear promise of a fair gain in
render speed and usability, while the benefit of GPU rendering is still
carrying a big question mark.
(2) Someone would have to come up with a GPU programming framework much
better suited to POV-Ray, which eliminates all the issues in (I), i.e.
supports C++ and recursions.
(III) With the intention of elaborating on (I), I noted that yes, it
*MIGHT* be possible to include GPU support via OpenCL use in POV-Ray
even without changing the fundamental structure of the application,
*BUT* this would /very/ likely lead to worse performance than without
the GPU support, due to (a) the inability of POV-Ray to supply the GPU
with a large enough and parallelizable work package, and (b) the
inability of POV-Ray to continue while the GPU is processing that work
package.
To sum up, I am still perfectly sure that OpenCL is *not* currently an
option to speed up POV-Ray.
Plus, I am perfectly sure that OpenCL will be further revisited now and
then, and reconsidered as an option, should it be improved to the point
where it appears to make some sense.
> To the subject: I don't know much about GPUs, POV-Ray internals,
> programming, nor OpenCL, but I do know a little about Computer
> electronic architecture. Some of the questions I made are not meant for
> this NG readers not even the TAG-Team but to the POV-Team, so they can
> made calculations about OpenCL.
There's not much that can be calculated from your figures, except the
peak memory transfer rate of some - presumably - representative graphics
hardware. Program execution speed, however, also depends on the
particular compiler (and optimizer!) - which are not even available yet.
> I think they may have a structured and classified diagram of ALL POV-Ray
> programming structures, knowing approximately how much each structure
> would take in the CPU, so from the data I provided they could see:
That would help in no way, because these structures all rely on an
architecture that, as already pointed out, is fundamentally incompatible
with OpenCL-based GPU computing.
> 3) And this should be first, see if it is worth it, to consider the GPU
> aid, if not possible through OpenCL, make a custom framework. If it is,
> calculate how many GPU capable structures there are to have a second
> evaluation, and see if that number is worth the effort.
I don't think that the POV-Ray dev team should even be bothered with
this, because from all I see, current GPU architectures would appear to
either require a /monstrous/ amount of framework development to make it
work for /POV-Ray/ (again, most particularly due to the lack of support
for recursion, which apparently is a hardware limitation, not a
framework thing), or provide /very small/ gain or even a disadvantage
(due to the small size of work packages POV-Ray could supply to the GPU,
and the inability of POV-Ray to do anything else while waiting for the
results), or require a /monstrous/ amount of POV-Ray architecture changes.
I don't think the POV-Ray dev team /want/ to be bothered (once again)
with this anyway.
And even /if/ they wanted to be bothered, /I/ as a POV-Ray user wouldn't
want them to, because I think there are plenty of features which are
more important to invest time into, where a gain is clearly visible,
while the gain of GPU support is still /very/ questionable, for all the
reasons I (and others) have presented by now.
Saul Luizaga wrote:
>>> was: "are you ABSOLUTELY SURE it won't work...?"
>>
>> No, you didn't even ask a question.
>
> news://news.povray.org:119/4a850da7@news.povray.org, check the last
> paragraph; it proves you wrong. Do you like being corrected?
Are you actually aware that Chambers and clipka are two totally
different persons? That there are actually /two/ persons trying to
convince you that OpenCL will get POV-Ray nowhere (with others having
joined in occasionally)?
Maybe you should re-read all the replies you got, and have a look at who
actually wrote them, and what you replied to whom.
And Chambers is right: There wasn't a question to start with. You only
went for a more questioning style some postings into the discussion.
>> But if you think about common scenes where you have walls and floors
>> and so forth, it might be worth using the GPU to test ray
>> intersections against these, and against bounding volumes (if you're
>> using them). Huge numbers of rays need to be run through these tests,
>> so the GPU can fire those off quite quickly.
>
> But those tests are also quite simple, so would benefit the least from
> the GPU.
How do you figure that?
The benefit of the GPU isn't the complexity of the kernels it executes
(usually quite the converse), it's more the fact that it can do the same
operations many, many times over simultaneously.
> If isosurfaces could be translated efficiently into shaders,
> then those would show the most benefit (and Julia fractals, of course).
No arguments here.
>> It might also be worth running the "so what the hell is the colour of
>> this surface?" calculation on the GPU - what with texturing, normal
>> maps, multiple light rays of different angles and colours, etc.
>
> Possibly. In fact, using the GPU for GI might be an option.
Yes, there's a couple of ways you might implement this, with varying
degrees of accuracy. For example, you could feed the GPU a polygon mesh
approximating the scene. (There are already known algorithms for doing
GI on a polygon mesh.) Or you could just use the GPU to accelerate the
insane number of ray intersection tests required during GI.
>> Also, let's clear this up: My understanding is that the GPU does not
>> require *all* cores to run an identical kernel. IIRC, the cores are
>> grouped into (fairly large) bundles, each bundle runs a single kernel,
>> but different bundles can run completely different kernels. So you
>> don't need a ray queue with enough rays for the entire GPU, just for a
>> whole bundle of cores.
>
> True, I think the shaders are in blocks of 4, and you have to have
> groups of 32 blocks running the same shader or something like that
> (which would be 128 shaders per program). I don't remember the exact
> numbers, though, and in fact it's probably GPU dependent.
Yeah, I'm pretty sure the batch sizes vary by GPU model. But the main
point is, you don't have to assign *all* cores to the same kernel; just
sufficiently large bunches of them.
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Orchid XP v8 wrote:
>> But those tests are also quite simple, so would benefit the least from
>> the GPU.
>
> How do you figure that?
Because you need to look at the ratio of computation time to
transmission time. The simpler something is, the lower the ratio, and
the less worthwhile offloading is (after all, it messes with the cache,
the bus, the memory system, etc). Ideally, whatever you offload will be
computationally intensive, so the transmission time is justifiable.
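That ratio argument can be put into a toy break-even model. All the constants below are made-up placeholders (per-ray costs, a PCIe-class transfer rate, a per-dispatch latency), chosen only to show the shape of the trade-off, not to describe any real hardware or POV-Ray itself:

```python
# Toy model: offloading pays off only when the batched GPU compute time
# plus transfer overhead undercuts doing the work on the CPU.
def offload_pays_off(n_rays,
                     cpu_ns_per_ray=200.0,   # assumed CPU cost per test
                     gpu_ns_per_ray=5.0,     # assumed GPU cost per test
                     bytes_per_ray=32,
                     bandwidth_gbps=5.0,     # 1 GB/s == 1 byte/ns
                     latency_ns=10_000.0):   # fixed per-dispatch overhead
    cpu_time = n_rays * cpu_ns_per_ray
    transfer = latency_ns + n_rays * bytes_per_ray / bandwidth_gbps
    gpu_time = transfer + n_rays * gpu_ns_per_ray
    return gpu_time < cpu_time

print(offload_pays_off(10))      # False
print(offload_pays_off(10_000))  # True
```

With a fixed dispatch latency dominating small batches, only large queues of rays ever recoup the transfer cost, which is exactly the "sufficiently large bunches" problem.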
> Yeah, I'm pretty sure the batch sizes vary by GPU model. But the main
> point is, you don't have to assign *all* cores to the same kernel; just
> sufficiently large bunches of them.
The key, of course, being "sufficiently large bunches." There's a
strong likelihood that POV wouldn't be able to consistently create
batches that are sufficiently large.
...Chambers
Chambers wrote:
> The key, of course, being "sufficiently large bunches." There's a
> strong likelihood that POV wouldn't be able to consistently create
> batches that are sufficiently large.
This is the key question, yes.