Stephen wrote:
> On Fri, 14 Aug 2009 03:32:15 -0400, Saul Luizaga <sau### [at] netscape net> wrote:
>
>> And you know what I expect the POV/TAG-Team to know...
>
> Is there still a TAG team? I know Warp is still around and Gilles and Ken pop in
> occasionally but when was the last time any of the others replied to a post?
No idea, man. I assume there is a TAG-Team, since I don't think highly
technical freeware like this can exist without one.
Post a reply to this message
Saul Luizaga <sau### [at] netscape net> wrote:
> Groaning is an emotional response and, as such, irrational. I haven't
> been reading this NG for a long time, and the first line of text I read
> was your groaning. Besides being rude, it explains nothing and
> clarifies just as little.
There are certain subjects which repeat themselves time after time when
some people think that they have an ingenious new idea which surely nobody
has even thought of before and thus they come here and write about it. For
the umpteenth time for regulars. Something which has already been discussed
like a million times to death. No wonder regulars are tired of explaining
the same thing over and over.
At some point in the past, when XML was all the hype, it was a rather
regular occurrence for someone to come here and suggest that povray's
scene description language should be changed to conform to XML. No wonder
that after some time people just start responding to it with "no, that's
just a braindead idea" rather than going once again into the minute details
of why the idea doesn't work.
To be fair, though, not all such ideas are unimplementable. The most
prominent example is multithreading: In the past it was again and again
suggested, and again and again shot down as unfeasible due to all the
povray features which are not thread-friendly. In a way, both views were
right: Yes, multithreading *is* implementable in povray (as demonstrated),
and yes, it *is* a huge, huge task (as also demonstrated). Not only did it
require an almost complete refactoring of the source code, but even after
all these years there are still minor problems to be solved because of the
problems introduced by multithreading.
Using the GPU for rendering in povray is equally unfeasible, even though
for slightly different reasons. Mostly it has to do with GPU features (or
lack thereof) and the amount of data which would have to be constantly
transferred between the graphics card and the system RAM, which would most
probably nullify any theoretical speed advantage.
--
- Warp
Post a reply to this message
Chambers wrote:
> Of course, modern GPUs now allow double precision, so we can get to the
> other objections now. Specifically:
>
> 1) Recursion. As clipka (Christian?) wrote, it is absolutely essential
> for POV.
>
> 2) Data parallelization versus code parallelization (this is related to
> the first, but is not strictly the same).
>
> The ray tracing algorithm follows drastically different code branches on
> a single set of data, based on recursion (reflections & refractions), as
> well as the other various computations needed (texture calculation,
> light source occlusion, etc) which almost all need access to the entire
> scene.
There are two problems: recursion and divergence.
When a ray hits something, zero or more secondary rays are spawned. On
the CPU, this is usually just a recursive function call, but the GPU
does not permit such a thing.
Also, a GPU consists of *hundreds* of cores, but they must all execute
the same code path (but with different data). You can set the GPU up to
process multiple rays, but as soon as some of the rays hit object A but
others hit object B, the code paths that need to be taken diverge from
each other, which the GPU does not permit.
The solution in both cases is to put rays into "queues", such that all
the rays in a given queue take the same code path [for a while]. When
you need to spawn a secondary ray, you add it to a queue rather than
recursively tracing it. When some rays hit an object and others don't,
you add them to different queues. The rays in each queue can then be
processed in batches later.
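The queueing scheme described above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the `SCENE` table, the object names, and `trace_with_queues` are all hypothetical, rays carry nothing but a depth counter, and the inner loop stands in for what would be a parallel batch on the GPU.

```python
from collections import defaultdict, deque

# Hypothetical toy scene: each object maps to the objects that its
# secondary rays go on to hit (an empty list means no rays are spawned).
SCENE = {
    "mirror": ["sphere", "floor"],  # e.g. one reflected and one refracted ray
    "sphere": ["floor"],            # one reflected ray
    "floor": [],                    # diffuse surface: no secondary rays
}


def trace_with_queues(primary_hits, max_depth=5):
    """Process rays in per-object batches instead of recursing per ray."""
    queues = defaultdict(deque)     # one queue per code path (object hit)
    for obj in primary_hits:
        queues[obj].append(0)       # each ray is represented by its depth
    batches = []                    # (object, batch size) in processing order
    while any(queues.values()):
        # Take the fullest queue so that each batch is as large as possible.
        obj = max(queues, key=lambda k: len(queues[k]))
        batch = list(queues[obj])
        queues[obj].clear()
        batches.append((obj, len(batch)))
        for depth in batch:         # on a GPU this loop would run in parallel
            if depth + 1 > max_depth:
                continue
            for child in SCENE[obj]:
                # Spawn a secondary ray by queueing it, not by recursing.
                queues[child].append(depth + 1)
    return batches
```

For four primary rays hitting `["mirror", "mirror", "sphere", "floor"]`, this handles 11 rays in three batches of 2, 3 and 6 rays, where a recursive tracer would make 11 separate calls.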
The key problem is that if a queue ends up with very few rays in it,
you're going to have a hell of a lot of idle cores while you process
that queue. The GPU is usually clocked far slower than the CPU; it only
"appears" fast because it has hundreds of cores working in parallel. If
most of those cores are actually idling, you're going to have a problem.
It may turn out not to be any faster than the CPU under unfavourable
conditions.
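To put rough numbers on the idle-core problem, here is a small back-of-the-envelope model. It assumes (hypothetically) that all cores run in lockstep rounds and that every round costs the same; the function names and the core count in the example are illustrative, not measurements.

```python
def rounds_needed(num_rays, num_cores):
    """Number of lockstep rounds needed to process a queue (ceiling division)."""
    return -(-num_rays // num_cores)


def utilization(num_rays, num_cores):
    """Fraction of core-rounds that do useful work for one queue."""
    return num_rays / (rounds_needed(num_rays, num_cores) * num_cores)


def batch_time(num_rays, num_cores, time_per_round):
    """Time to drain one queue; the same whether 1 or num_cores rays are in a round."""
    return rounds_needed(num_rays, num_cores) * time_per_round
```

With 320 cores, a queue of only 10 rays keeps about 3% of the cores busy (`utilization(10, 320)` is 0.03125), yet it costs as much time as a full round of 320 rays.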
Another possibility is to run the main renderer on the CPU, adding rays
to queues, and sending any "sufficiently large" queues to the GPU for
processing. I don't know if bandwidth limitations between the two would
make this viable...
Post a reply to this message
In article <4a84d512@news.povray.org>,
Saul Luizaga wrote:
> clipka wrote:
>> Saul Luizaga schrieb:
>> (*groans*)
>
> Way to go to start a discussion...
LOL!
Post a reply to this message
Invisible wrote:
I understand these problems perfectly.
> Another possibility is to run the main renderer on the CPU, adding rays
> to queues, and sending any "sufficiently large" queues to the GPU for
> processing. I don't know if bandwidth limitations between the two would
> make this viable...
Exactly, that is why I asked: "Are you absolutely sure there isn't a case
where a GPU can help? Maybe in the middle of rendering/parsing?".
As for the bandwidth and memory concerns, from here:
http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=95060&enterthread=y
Q: What are AMD's stream computing product features?
A: AMD's FireStream™ 9170, our latest generation stream computing GPU,
features:
* 320 stream cores (compute units or ALUs)
* 2GB on-board GDDR3 memory
* Double precision floating point support
* PCIe 2.0 x16 interface
View AMD FireStream 9170 specifications here:
http://ati.amd.com/products/streamprocessor/specs.html
Memory Concern:
--------------
Maybe it would be a good idea to leave the processed data in the video
card's local memory until it is needed in main memory.
Bandwidth Concern:
-----------------
- M4A78 PLUS MoBo
(http://usa.asus.com/products.aspx?l1=3&l2=149&l3=758&l4=0&model=2889&modelmenu=1):
# It features dual-channel DDR2 1066 memory support and accelerates data
(http://www.amd.com/us/products/desktop/processors/phenom-ii/Pages/phenom-ii-key-architectural-features.aspx)
# One 16-bit link at up to 4000MT/s
HyperTransport Generation 3.0 mode
# Up to 37GB/s total delivered processor-to-system bandwidth
(HyperTransport bus + memory bus)
PCIe Card Electromechanical 2.0 Specification
(http://www.pcisig.com/specifications/pciexpress/base2)
# Signaling
PCI Express Base 2.0 specification doubles the interconnect bit rate
from 2.5 GT/s to 5 GT/s in a seamless and compatible manner. The
performance boost to 5 GT/s is by far the most important feature of the
PCI Express 2.0 specifications. It effectively increases the aggregate
bandwidth of a 16-lane link to approximately 16 GB/s.
- Video Card: AMD FireStream 9170 (specs above)
----------------- ************** ----------------
As you can see, bandwidth may not be much of an issue, since the
transfer between the PCIe video card and main memory can be made at
5 GT/s. Is this still insufficient for POV-Ray peak performance?
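For reference, the PCIe figures above can be turned into bytes per second with a little arithmetic. The sketch below assumes the 8b/10b line encoding used by PCIe 1.x and 2.0 (10 bits on the wire for every 8 bits of data); the function name is made up for illustration. Note that 5 GT/s is the signalling rate of a single lane, not the throughput of the whole link.

```python
def pcie_bandwidth_gbytes(transfer_rate_gt, lanes, encoding_efficiency=8 / 10):
    """Peak usable bandwidth per direction in GB/s.

    transfer_rate_gt: per-lane signalling rate in GT/s (1 transfer = 1 bit).
    encoding_efficiency: share of wire bits that carry data (8b/10b assumed).
    """
    data_gbits_per_lane = transfer_rate_gt * encoding_efficiency
    return data_gbits_per_lane * lanes / 8  # 8 bits per byte


per_direction = pcie_bandwidth_gbytes(5.0, 16)  # PCIe 2.0, x16 link
aggregate = 2 * per_direction                   # both directions counted together
print(f"{per_direction:.1f} GB/s per direction, {aggregate:.1f} GB/s aggregate")
```

This gives 8 GB/s in each direction for a PCIe 2.0 x16 link, which matches the roughly 16 GB/s aggregate quoted above once both directions are counted. It is a peak figure and says nothing about latency.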
Post a reply to this message
Warp wrote:
> Saul Luizaga <sau### [at] netscape net> wrote:
>> Groaning is an emotional response and, as such, irrational. I haven't
>> been reading this NG for a long time, and the first line of text I read
>> was your groaning. Besides being rude, it explains nothing and
>> clarifies just as little.
>
> There are certain subjects which repeat themselves time after time when
> some people think that they have an ingenious new idea which surely nobody
> has even thought of before and thus they come here and write about it. For
> the umpteenth time for regulars. Something which has already been discussed
> like a million times to death. No wonder regulars are tired of explaining
> the same thing over and over.
Well, instead of groaning you could keep a small .txt file on your PC:
"Already discussed, conclusions were:
1)....
2)...
3)..."
or something like that, to avoid frustration and redundancy.
I don't think my ideas are revolutionary, new, or ingenious. I'm
just suggesting something that MAY or MAY NOT have been discussed
before.
Also I assume everyone here knows more than me, including the
POV/TAG-Team, so this is more of a hint than a suggestion. Sometimes
smart people forget about simple things.
> At some point in the past, when XML was all the hype, it was a rather
> regular occurrence for someone to come here and suggest that povray's
> scene description language should be changed to conform to XML. No wonder
> that after some time people just start responding to it with "no, that's
> just a braindead idea" rather than going once again into the minute details
> of why the idea doesn't work.
>
> To be fair, though, not all such ideas are unimplementable. The most
> prominent example is multithreading: In the past it was again and again
> suggested, and again and again shot down as unfeasible due to all the
> povray features which are not thread-friendly. In a way, both views were
> right: Yes, multithreading *is* implementable in povray (as demonstrated),
> and yes, it *is* a huge, huge task (as also demonstrated). Not only did it
> require an almost complete refactoring of the source code, but even after
> all these years there are still minor problems to be solved because of the
> problems introduced by multithreading.
I see. I know the POV-Ray source code is HUGE and that even minor changes
take a big effort. But it seems that in this case the effort was necessary.
> Using the GPU for rendering in povray is equally unfeasible, even though
> for slightly different reasons. Mostly it has to do with GPU features (or
> lack thereof) and the amount of data which would have to be constantly
> transferred between the graphics card and the system RAM, which would most
> probably nullify any theoretical speed advantage.
>
Maybe there is a use for it, not as another main processor but as a
secondary one. I wrote about it in another post.
Post a reply to this message
:-D
Post a reply to this message
I mean, the video card has 2 GB of GDDR3 RAM, and if suitable threads
are found for the GPU, all that work can be left in the video card
memory until it is needed.
Maybe the data in the video card memory could even be used to give a
rough partial preview as the scene is rendered. It wouldn't always be
clear, but it could give you a hint of what the GPU is doing on the fly;
at least I think it would be cool to see. There probably wouldn't even be
much of a delay in displaying it, since it is already in the video card. Of
course, this is very optional.
Also, I was wondering: what is the bandwidth between the CPU and main
memory at render time? Maybe this can help us calculate a rough estimate
of the bandwidth needed for the video card.
Cheers.
Post a reply to this message
Saul Luizaga schrieb:
>> 1) Recursion. As clipka (Christian?) wrote, it is absolutely
>> essential for POV.
>
> I suppose this is unsolvable without an C++ ported OpenCL.
???
Recursion is not a feature specific to C++; it is also part of standard C99.
>> 2) Data parallelization versus code parallelization (this is related
>> to the first, but is not strictly the same).
>
> they say "an API for coordinating data and task-based parallel
> computation...", this doesn't help? If it could do both maybe would be
> of use for POV-Ray.
Did you actually /read/ the spec - or just the enthusiastic introduction?
Sure, it does support task-based parallel computation - why? Probably
because it also targets classic multi-core CPUs, which are ideally
suited to task-based parallel computing.
GPUs perform very poorly with task-based parallel computations, due to
their hardware architecture. A software abstraction layer won't change
that fundamental limitation.
> I see... maybe GPGPUs could be used not as full co-processors but as an
> auxiliary co-processor that is called on demand, when a GPU-compliant
> procedure needs to be processed. Are you absolutely sure there isn't a case
> where a GPU can help? Maybe in the middle of rendering/parsing?
No.
POV-Ray's internal workflow does not support asynchronous computations
(other than having multiple threads independently render parts of the
image), so only blocking "calls" to the GPU would be of any use, putting
the CPU task in waiting state in the meantime. Therefore, only portions
of the code that can be computed /significantly/ faster by the GPU, or
have any /significant/ size, would warrant "outsourcing" of
computations, otherwise parameter passing and task switching overhead
would bog down performance instead of increasing it.
But the only sections of POV-Ray code that do ask for parallelization
are RGB color and 3D vector computations, or similarly-sized problems;
these can be parallelized quite well on modern CPUs as well using SSE2
(i.e. the GPU will not be much faster), and are heavily intermixed with
conditional branching (i.e. the size of outsourceable work packages is
very small).
Post a reply to this message
Saul Luizaga schrieb:
>> Another possibility is to run the main renderer on the CPU, adding
>> rays to queues, and sending any "sufficiently large" queues to the GPU
>> for processing. I don't know if bandwidth limitations between the two
>> would make this viable...
>
> Exactly, that is why I asked: "Are you absolutely sure there isn't a case
> where a GPU can help? Maybe in the middle of rendering/parsing?".
Note that although the approach /may/ (!) work, it is a /fundamentally/
different approach from what POV-Ray is doing.
Changing POV-Ray to use that approach would imply virtually a complete
rewrite of the render engine.
> As you can see, bandwidth may not be much of an issue, since the
> transfer between the PCIe video card and main memory can be made at
> 5 GT/s. Is this still insufficient for POV-Ray peak performance?
So you're looking at peak data transfer rate limits and inferring from
them that transfer between CPU and GPU memory space is not an issue?
Did you consider latency issues, or the overhead created by the OpenCL
framework itself? How about the latency for a "function call"?
If your work packages are large enough, then these are no issues. But in
a raytracer, be prepared for rather small work packages.
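A simple model makes the point about package size concrete. It assumes each transfer costs a fixed latency plus size divided by peak bandwidth; the 10 microsecond latency and 8 GB/s bandwidth used below are illustrative assumptions, not measurements of any real system or of OpenCL.

```python
ASSUMED_LATENCY_S = 10e-6   # hypothetical fixed cost per transfer or "call"
ASSUMED_PEAK_BPS = 8e9      # hypothetical peak bandwidth in bytes per second


def transfer_time(num_bytes, latency_s=ASSUMED_LATENCY_S, peak_bps=ASSUMED_PEAK_BPS):
    """Seconds to move one work package: fixed latency plus streaming time."""
    return latency_s + num_bytes / peak_bps


def effective_bandwidth(num_bytes, latency_s=ASSUMED_LATENCY_S, peak_bps=ASSUMED_PEAK_BPS):
    """Bytes per second actually achieved for a package of the given size."""
    return num_bytes / transfer_time(num_bytes, latency_s, peak_bps)


small = effective_bandwidth(1024)              # a 1 KiB work package
large = effective_bandwidth(64 * 1024 * 1024)  # a 64 MiB work package
print(f"small: {small / ASSUMED_PEAK_BPS:.1%} of peak")
print(f"large: {large / ASSUMED_PEAK_BPS:.1%} of peak")
```

Under these assumptions the small package reaches only a percent or so of the peak rate, because nearly all of its time is latency, while the large one gets close to the peak. The package size at which latency and streaming time are equal is simply latency times bandwidth (80 kB with the numbers above).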
Post a reply to this message