POV-Ray: Newsgroups: povray.off-topic: GPU rendering


From: Chambers
Subject: Re: GPU rendering
Date: 15 Jan 2010 10:53:13
Message: <4b508f69$1@news.povray.org>

These four criteria have yet to be met:

GPU acceleration will be useful when the following conditions are met:
1) Support for sophisticated branching
2) Full double-precision accuracy
3) Large memory sets (other than textures)
4) Independent shaders running on distinct units.

...Chambers


From: nemesis
Subject: Re: GPU rendering
Date: 15 Jan 2010 11:16:10
Message: <4b5094ca@news.povray.org>

Chambers wrote:
> I can't recall POV-Ray ever being one of the fastest raytracers around. 

I can recall it being used as a useful benchmarking tool once it got 
multicore.

>  Everyone else is willing to make sacrifices concerning accuracy in 
> order to speed up rendering (for instance, only accepting triangles).

If you use so many triangles that each is smaller than a screen 
pixel, you lose no more accuracy than with any POV primitive.

In any case, you've got it wrong: the amazing speedup is not due to 
native GPU triangle handling, but to faster ray intersection calculations 
thanks to the GPU's sheer parallel vector processing. You can see that in 
smallptGPU, where you get perfect mathematical spheres much faster, no 
meshes in sight.

-- 
a game sig: http://tinyurl.com/d3rxz9


From: Invisible
Subject: Re: GPU rendering
Date: 15 Jan 2010 11:34:53
Message: <4b50992d$1@news.povray.org>

Chambers wrote:

> 1) Support for sophisticated branching

When this happens, the GPU will be exactly the same speed as the CPU.

The GPU is fast *because* it doesn't support sophisticated branching.

> 2) Full double-precision accuracy

This already exists apparently. (E.g., my GPU supports double-precision 
math.)
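
To make concrete what is at stake in the double-precision criterion, here is a small sketch in Python. It is only an illustration of single versus double rounding on the CPU, not a statement about any particular GPU; the helper name `to_float32` is made up for this example. It rounds values to single precision with the standard `struct` module and shows that adding 1 to 2**24 is lost in single precision but kept in double.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = 16777216.0  # 2**24, exactly representable in both formats

# In single precision the +1 is below the spacing between neighbouring values.
single = to_float32(to_float32(big) + 1.0)

# In double precision the +1 survives.
double = big + 1.0

print(single == big)   # the increment was rounded away
print(double == big)   # the increment was kept
```

The same kind of rounding shows up in a ray tracer whenever small offsets are added to large coordinates.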

> 3) Large memory sets (other than textures)

My GPU has access to just under 1GB of RAM. How much do you want?

> 4) Independent shaders running on distinct units.

What exactly do you mean by that?


From: Sabrina Kilian
Subject: Re: GPU rendering
Date: 15 Jan 2010 12:57:42
Message: <4b50ac96$1@news.povray.org>

nemesis wrote:
> In any case, you've got it wrong: the amazing speedup is not due to
> native GPU triangle handling, but to faster ray intersection calculations
> thanks to the GPU's sheer parallel vector processing. You can see that in
> smallptGPU, where you get perfect mathematical spheres much faster, no
> meshes in sight.
> 

Spheres are the other special, easy case. Any ray that comes within
-radius- of -center point- must intersect the sphere, with a surface
normal that is pretty easy to compute. From that you get the diffuse
term, reflection, and so on. The math, as is obvious from the very short
intersection code in smallpt, is easy and compact.
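
As a rough illustration of how compact the sphere case is, here is a minimal sketch in Python. It is not smallpt's actual code; the function name `intersect_sphere` and the plain 3-tuple vectors are assumptions made for this example, and the ray direction is assumed to be a unit vector. It applies exactly the test described above: find the point on the ray closest to the center, compare that distance with the radius, and derive the normal from the hit point.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return (t, normal) for the nearest hit, or None on a miss.

    Assumes `direction` is a unit vector. Vectors are plain 3-tuples.
    """
    # Vector from the ray origin to the sphere center.
    oc = tuple(c - o for c, o in zip(center, origin))
    # Distance along the ray to the point closest to the center.
    b = sum(a * d for a, d in zip(oc, direction))
    # Squared distance from the center to that closest point.
    dist2 = sum(a * a for a in oc) - b * b
    if dist2 > radius * radius:
        return None  # the ray passes farther than `radius` from the center
    half_chord = math.sqrt(radius * radius - dist2)
    # Prefer the near intersection; fall back to the far one if the
    # origin is inside the sphere.
    t = b - half_chord
    if t < 1e-9:
        t = b + half_chord
    if t < 1e-9:
        return None  # the sphere is entirely behind the ray
    hit = tuple(o + t * d for o, d in zip(origin, direction))
    # The normal is the unit vector from the center to the hit point.
    normal = tuple((h - c) / radius for h, c in zip(hit, center))
    return t, normal
```

For example, `intersect_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1)` hits the front of the sphere, while moving the center to `(0, 3, 5)` gives a miss.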

Now, for any given cone, with arbitrary rotation and ratio of base to
height, provide a fast general case for the ray intersection test, that
will give you the same information. Single test, just to be clear, not
separate cases for the disc bottom and cone top.

Then cube, then spline lathe, SOR, blobs . . .


From: Sabrina Kilian
Subject: Re: GPU rendering
Date: 15 Jan 2010 13:08:40
Message: <4b50af28@news.povray.org>

Invisible wrote:
> Chambers wrote:
>> 3) Large memory sets (other than textures)
> 
> My GPU has access to just under 1GB of RAM. How much do you want?
> 

They lack virtual ram in a simple sense, last I looked, but you can
manage that by having your CPU thread move things around as needed.
Messy and annoying, when the CPU gives you just under 4 gigs in a simple
32-bit OS as long as the system has the virtual ram space available. And
even more than that in 64-bit, if you ask the OS nicely. On the CPU,
when you need different parts of that 4 gigs of stuff, the OS handles
it. I suspect that, should the GPU need new data fed into memory, it
would require more than just stalling the threads while the data is
moved from ram or paged memory over to the graphics card.
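
The "CPU moves things around as needed" approach can be sketched roughly as follows. This is plain Python with no real GPU API involved: `budget` stands in for the card's memory limit and `process` stands in for an upload followed by a kernel launch, and both names (like `stream_in_chunks` itself) are hypothetical.

```python
def stream_in_chunks(items, budget, process):
    """Feed `items` to `process` in slices of at most `budget` elements.

    `budget` models how much fits on the card at once; `process` models
    copying one slice over and running the device work on it.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    results = []
    for start in range(0, len(items), budget):
        chunk = items[start:start + budget]  # what would be copied to the card
        results.extend(process(chunk))       # device work on the resident chunk
    return results
```

This only works cleanly when each chunk can be processed independently. A ray that may hit geometry in any chunk is exactly the case that makes the real thing messy.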

>> 4) Independent shaders running on distinct units.
> 
> What exactly do you mean by that?

I am guessing that he meant the ability to run multiple threads on the
GPU, instead of running 100 instances of the same thread with slightly
different starting conditions. I know that my card supports 4 different
work groups, split over 8 'processors' each.


From: nemesis
Subject: Re: GPU rendering
Date: 15 Jan 2010 13:11:11
Message: <4b50afbf$1@news.povray.org>

Sabrina Kilian wrote:
> nemesis wrote:
>> In any case, you've got it wrong: the amazing speedup is not due to
>> native GPU triangle handling, but to faster ray intersection calculations
>> thanks to the GPU's sheer parallel vector processing. You can see that in
>> smallptGPU, where you get perfect mathematical spheres much faster, no
>> meshes in sight.
>>
> 
> Spheres are the other special, easy case. Any ray that comes within
> -radius- of -center point- must intersect the sphere, with a surface
> normal that is pretty easy to compute. From that you get the diffuse
> term, reflection, and so on. The math, as is obvious from the very short
> intersection code in smallpt, is easy and compact.
> 
> Now, for any given cone, with arbitrary rotation and ratio of base to
> height, provide a fast general case for the ray intersection test, that
> will give you the same information. Single test, just to be clear, not
> separate cases for the disc bottom and cone top.
> 
> Then cube, then spline lathe, SOR, blobs . . .

It's an example showing that the speedup is independent of the native 
GPU triangle handling, nothing more.

I'm not competent enough at math to give an answer to that.

-- 
a game sig: http://tinyurl.com/d3rxz9


From: Orchid XP v8
Subject: Re: GPU rendering
Date: 15 Jan 2010 13:21:10
Message: <4b50b216$1@news.povray.org>

>>> 3) Large memory sets (other than textures)
>> My GPU has access to just under 1GB of RAM. How much do you want?
>>
> 
> They lack virtual ram in a simple sense, last I looked, but you can
> manage that by having your CPU thread move things around as needed.
> Messy and annoying, when the CPU gives you just under 4 gigs in a simple
> 32-bit OS as long as the system has the virtual ram space available. And
> even more than that in 64-bit, if you ask the OS nicely. On the CPU,
> when you need different parts of that 4 gigs of stuff, the OS handles
> it. I suspect that, should the GPU need new data fed into memory, it
> would require more than just stalling the threads while the data is
> moved from ram or paged memory over to the graphics card.

You people must render something utterly different to what I render! I 
think the most demanding scene I ever ran wanted 10MB of RAM or something...

>>> 4) Independent shaders running on distinct units.
>> What exactly do you mean by that?
> 
> I am guessing that he meant the ability to run multiple threads on the
> GPU, instead of running 100 instances of the same thread with slightly
> different starting conditions. I know that my card supports 4 different
> work groups, split over 8 'processors' each.

The information I got is that the GPU runs "bunches" of threads on its 
several-hundred cores, and all the threads in a "bunch" must be 
identical [just operating on different data], but different bunches can 
run utterly different code...
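
A toy model of that "bunch" behaviour, in Python, might look like the sketch below. It is an assumption-laden simplification, not how any particular GPU or the CUDA runtime is implemented, and `run_bunch` is a made-up name. All threads in the bunch step through the same code in lockstep; when they disagree on an if/else, the bunch walks through both branch bodies and each thread keeps only the result of its own side.

```python
def run_bunch(data, condition, then_op, else_op):
    """Simulate one lockstep bunch executing an if/else.

    Returns (results, passes), where `passes` counts how many branch
    bodies the whole bunch had to step through.
    """
    mask = [condition(x) for x in data]
    results = list(data)
    passes = 0
    if any(mask):
        # The whole bunch walks the 'then' body; only threads whose
        # condition was true keep the result.
        passes += 1
        results = [then_op(x) if m else r
                   for x, m, r in zip(data, mask, results)]
    if not all(mask):
        # Likewise for the 'else' body and the remaining threads.
        passes += 1
        results = [r if m else else_op(x)
                   for x, m, r in zip(data, mask, results)]
    return results, passes
```

With data that all takes the same branch, the bunch pays for one body. With mixed data it pays for both, which is one way to see why heavy branching erodes the GPU's advantage.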

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*


From: Sabrina Kilian
Subject: Re: GPU rendering
Date: 15 Jan 2010 13:31:05
Message: <4b50b469$1@news.povray.org>

Orchid XP v8 wrote:
> You people must render something utterly different to what I render! I
> think the most demanding scene I ever ran wanted 10MB of RAM or
> something...
> 

I was wrapping an object in wire using a sphere sweep. While it was just
a minor background detail, I had several billion spheres. In my defense,
I was also doing this just to find the point at which I ran out of memory.

>>>> 4) Independent shaders running on distinct units.
>>> What exactly do you mean by that?
>>
>> I am guessing that he meant the ability to run multiple threads on the
>> GPU, instead of running 100 instances of the same thread with slightly
>> different starting conditions. I know that my card supports 4 different
>> work groups, split over 8 'processors' each.
> 
> The information I got is that the GPU runs "bunches" of threads on its
> several-hundred cores, and all the threads in a "bunch" must be
> identical [just operating on different data], but different bunches can
> run utterly different code...
> 

In general, though, what is the smallest maximum number of "bunches"
that you can run? I know my card lists 4 work groups, I do not know if
any other cards currently being sold only allow 2, or possibly only 1,
bunch to be run at once.


From: nemesis
Subject: Re: GPU rendering
Date: 15 Jan 2010 13:54:17
Message: <4b50b9d9$1@news.povray.org>

Orchid XP v8 wrote:
> You people must render something utterly different to what I render! I 
> think the most demanding scene I ever ran wanted 10MB of RAM or 
> something...

I'm sure even a game like Crysis needs more memory than your pov RSOCPs.

-- 
a game sig: http://tinyurl.com/d3rxz9


From: Orchid XP v8
Subject: Re: GPU rendering
Date: 15 Jan 2010 13:54:20
Message: <4b50b9dc$1@news.povray.org>

>> The information I got is that the GPU runs "bunches" of threads on its
>> several-hundred cores, and all the threads in a "bunch" must be
>> identical [just operating on different data], but different bunches can
>> run utterly different code...
>>
> 
> In general, though, what is the smallest maximum number of "bunches"
> that you can run? I know my card lists 4 work groups, I do not know if
> any other cards currently being sold only allow 2, or possibly only 1,
> bunch to be run at once.

I'll check the CUDA manual. Obviously this stuff varies by GPU, but I 
was under the impression my card handles something like 16 bunches at 
once or so.

Certainly you can't currently run thousands of separate tasks at once. 
But then, my card (for example) has something like 300 hardware units. 
That's a hell of a lot more than 4 CPU cores...

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
