"clipka" <nomail@nomail> wrote:
> Presuming that having more samples than actually required is not an issue, then
> how about this one:
>
> [...]
>
> I'm also thinking about totally different approaches, although I can't see yet
> how they could be put to good use. One basic idea would be to have a single
> main pretracer thread that would find out what samples are needed, and have
> other threads actually collect them.
>
> One way this could be done would be as follows:
>
> [...]
Any serious feedback on this, anyone, before it gets swamped in posts about
mutexes?
Warp wrote:
>>However, if they do it frequently, one of the threads will always
>>be blocked. Using priorities will therefore prevent the situation
>>that thread 1 works at 100% and thread2 at 0%, and transform it
>>into thread 1 works at 50% and thread2 at 50%.
>
> Only in a 1-processor/single-core system.
Also on a dual-core system under worst-case locking (both threads
continually lock and unlock mutexes without doing anything else).
Admittedly not useful, but it is also the only scenario in which you
get thread 2 to really starve, which is what you initially worried about.
The only way to get both cores close to 100% is to minimize the time
spent holding locked mutexes, so that it becomes the exception rather
than the rule that one thread waits for the other. But that works
equally well with unprioritized mutexes.
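To illustrate the point (just a sketch, not actual POV-Ray code; the
names are made up): copy the shared data out while holding the lock,
and do the real work outside of it.

  #include <boost/thread/mutex.hpp>
  #include <vector>

  boost::mutex cache_mutex;            // guards shared_samples
  std::vector<double> shared_samples;  // some shared data (made up)

  double average_samples()
  {
      std::vector<double> local_copy;
      {
          // keep the critical section as short as possible:
          // only copy the data out, don't compute under the lock
          boost::mutex::scoped_lock lock(cache_mutex);
          local_copy = shared_samples;
      }
      // the expensive part runs unlocked, so the other thread
      // almost never has to wait for this one
      double sum = 0.0;
      for (std::size_t i = 0; i < local_copy.size(); ++i)
          sum += local_copy[i];
      return local_copy.empty() ? 0.0 : sum / local_copy.size();
  }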
clipka <nomail@nomail> wrote:
> So maybe on Posix systems boost just wraps the Pthreads mutexes, thereby adding
> some more overhead? Or is the difference much worse?
While Boost uses Pthreads on POSIX-compliant systems, according to my
tests its thin layer over them adds only a tiny amount of overhead.
OTOH I have also read somewhere that this small problem is (or is going
to be) fixed in newer versions of Boost by moving more code into inline
functions (thus effectively removing any overhead over calling Pthreads
directly).
Anyway, overall it doesn't make much of a difference. The basic
slowness of mutexes is the same.
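For the curious, such a thin layer is essentially just this (a rough
sketch of the idea, not Boost's actual source):

  #include <pthread.h>

  // With everything inline, this compiles down to direct Pthreads
  // calls, i.e. no overhead beyond pthread_mutex_* itself.
  class mutex
  {
  public:
      mutex()  { pthread_mutex_init(&m_, 0); }
      ~mutex() { pthread_mutex_destroy(&m_); }

      void lock()   { pthread_mutex_lock(&m_); }
      void unlock() { pthread_mutex_unlock(&m_); }

  private:
      pthread_mutex_t m_;
      mutex(const mutex&);             // non-copyable
      mutex& operator=(const mutex&);
  };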
--
- Warp
clipka wrote:
> I heard tell that Erlang is a good language to write distributed code, too...
It's interpreted and dynamically typed, so it's certainly not something you
want to try to write *fast* distributed code in. Erlang is distributed for
reliability, not speed. (And, FWIW, it's known to suck at floating point
ops, too.)
--
Darren New, San Diego CA, USA (PST)
The NFL should go international. I'd pay to
see the Detroit Lions vs the Roman Catholics.
clipka wrote:
> The approach of subdividing an image, and feeding each part to a separate task,
> works with regular raytracing only because even if parts of the scene interact
> through reflection, there is no data cache being built and re-used - because
> the chances of two mirrors showing exactly the same points are minimal (let
> alone that even then the incident angles would be different) and so caching
> wouldn't be worth the pain anyway.
An alternative approach would be to simply give each thread its
own cache. The threads will redo work done by other threads but it
may actually be less costly than locking stuff. It will be simple
to implement and faster than single-threaded pretrace unless
building the cache takes 100% of pretrace time. It removes
race conditions at least for individual pixels.
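Something like this, roughly (just a sketch; the std::map stands in
for whatever the real sample tree would be):

  #include <boost/thread/tss.hpp>
  #include <map>

  // made-up stand-in for the radiosity sample tree
  typedef std::map<double, double> SampleCache;

  // one cache per thread; no locking needed, because no thread
  // ever touches another thread's cache
  boost::thread_specific_ptr<SampleCache> thread_cache;

  SampleCache& my_cache()
  {
      if (thread_cache.get() == 0)
          thread_cache.reset(new SampleCache);  // created on first use
      return *thread_cache;
  }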
The problem remains that the image may look different on each
render if the assignment of image blocks to threads is dynamic
(load-balanced). This could be fixed by using a static block
assignment pattern for radiosity scenes (not too terrible, since it
is rather unlikely that one thread gets all the slow blocks).
Unfortunately, the resulting image would then still depend
on the number of threads actually used, so anyone
who wished to exactly reproduce a scene would have to use
the same thread count.
Darren New <dne### [at] sanrrcom> wrote:
> It's interpreted and dynamically typed, so it's certainly not something you
> want to try to write *fast* distributed code in. Erlang is distributed for
> reliability, not speed. (And, FWIW, it's known to suck at floating point
> ops, too.)
Interestingly though, someone actually wrote a 3D modeller in that language.
However, this explains why Wings3D is brought to its knees by larger
models... which is a pity, because the user interface is exceptionally good.
clipka wrote:
> Darren New <dne### [at] sanrrcom> wrote:
>> It's interpreted and dynamically typed, so it's certainly not something you
>> want to try to write *fast* distributed code in. Erlang is distributed for
>> reliability, not speed. (And, FWIW, it's known to suck at floating point
>> ops, too.)
>
> Interestingly though, someone actually wrote a 3D modeller in that language.
Yep. And it's not distributed, either. It all runs in one thread, too. Although
there are features for recompiling it and restarting it without actually
closing the model you're working on.
> However, this explains why Wings3D goes down to its knees when handling larger
> models...
That, and being "functional" (rather than functional without the quotes, like
Haskell): in Erlang you have to copy any data structure you want to
change, yet the language isn't functional enough to let the
compiler optimize based on the immutability of variables.
--
Darren New, San Diego CA, USA (PST)
The NFL should go international. I'd pay to
see the Detroit Lions vs the Roman Catholics.
> -----Original Message-----
> From: clipka [mailto:nomail@nomail]
> Some serious feedback on this anyone, before it gets swamped in posts
> about
> mutexes?
Yeah.
From what I understand, POV does a pre-trace to precompute a sample
tree. Then, during the actual render, POV creates any additional
samples needed.
What if there were a way, during the pretrace, to predict where
additional samples would be needed? If so, then all the samples could
be taken during the pretrace, with no additional sampling during the
final render, with no loss in quality.
...Ben Chambers
www.pacificwebguy.com
"Chambers" <ben### [at] pacificwebguycom> wrote:
> From what I understand, POV does a pre-trace to precompute a sample
> tree. Then, during the actual render, POV creates any additional
> samples needed.
>
> What if there were a way, during the pretrace, to predict where
> additional samples would be needed? If so, then all the samples could
> be taken during the pretrace, with no additional sampling during the
> final render, with no loss in quality.
Yes, this would solve the multitasking issues (talking about 100%
reproducibility of shots) during the main render.
Unfortunately, I know of only two approaches to collect all the samples needed
before the main render:
(1) Just plain stupidly cover all the surfaces in the whole scene with samples -
which will be a problem for scenes that stretch into infinity.
(2) Perform the full render twice.
The problem is that unless you really *do* the final render, you never know
which regions of the shot may ultimately be reached by some stray ray.
To give a trivial example: If you turn on focal blur, you will find that the
final render will take more radiosity samples - especially in the very first
and very last lines. That's because regions normally just outside the visible
region now need to be mixed in.
Another example is small or strongly-curved reflecting details that the
pretrace may not have encountered.
So if we want to really do a complete pretrace, we end up taking the full-detail
shot twice - once with radiosity sampling on, and once without further sampling.
What's worse: the first pass will be the more expensive one, and it seems we
would want to do it in a single task to avoid the trouble already mentioned.
So what on earth are we doing the final render for, then?
I don't really think this gets us anywhere :)
Christian Froeschlin <chr### [at] chrfrde> wrote:
> An alternative approach would be to simply give each thread its
> own cache. The threads will redo work done by other threads but it
> may actually be less costly than locking stuff. It will be simple
> to implement and faster than single-threaded pretrace unless
> building the cache takes 100% of pretrace time. It removes
> race conditions at least for individual pixels.
Yup, thought of it too.
> Unfortunately, the resulting image would then still depend
> on the number of threads actually used, so anyone
> who wished to exactly reproduce a scene would have to use
> the same thread count.
Not necessarily so. If the image is *always* divided into the same number of
blocks, *always* using separate trees for any two blocks even if they are
processed by the same thread, and *always* merging the trees in the same order,
then we do have 100% reproducibility regardless of thread count.
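Roughly like this (only a sketch with made-up names, but it shows why
the result cannot depend on the thread count):

  #include <vector>

  // made-up stand-in for a per-block radiosity sample tree
  struct SampleTree
  {
      std::vector<double> samples;
  };

  // The number of blocks is fixed by the image layout alone, never
  // by the thread count, and the per-block trees are merged strictly
  // in block-index order - so the consolidated tree is identical no
  // matter which thread processed which block.
  SampleTree consolidate(const std::vector<SampleTree>& block_trees)
  {
      SampleTree master;
      for (std::size_t i = 0; i < block_trees.size(); ++i)
          master.samples.insert(master.samples.end(),
                                block_trees[i].samples.begin(),
                                block_trees[i].samples.end());
      return master;
  }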
If in addition we have each thread render only one pretrace pass at a time, then
consolidate all collected samples into one tree before proceeding with the next
pretrace pass, we also reduce the number of surplus samples taken.
In the first pretrace pass, the probability is quite low that two tasks need
samples so close together that one of them is actually not needed.
In the second pretrace pass, however, the first pass will already have generated
quite a number of "deeper-bounce" samples to start with.
This brings me to some *GOOD* idea... hey, thanks for the inspiration!
(I'll go into detail in a separate thread... this one has accumulated too many
"spin-offs" for my taste.)