Chambers wrote:
> Yes, I'm aware of different designs for mutexes. In fact, the fastest
> way to deal with them is to design your container such that they aren't
> even needed. However, the only paper I've seen that said such a thing
> advocated using immutable state instead.
Look for papers on lock-free containers.
Thorsten
Chambers <ben### [at] pacificwebguycom> wrote:
> That is, once a variable is created it cannot be modified. In the case
> of a container, you don't actually change the container; you make a
> copy, set the reference to point to the modified data, and dump the
> original data.
How exactly does that help? You still have to mutex the copying to
avoid two threads doing it simultaneously. You also have to mutex reading
the data container because you have to avoid reading it while another
thread is copying it (i.e. effectively changing its contents, which would
mean that the first thread would get old data, even assuming it won't
access freed memory at any point).
> Would it be faster if different threads rarely accessed the same locked
> data?
AFAIK the problem with POV-Ray's radiosity is that if all threads don't
have the exact same data, you may get different lighting in different
threads (effectively, you will get squares with slightly different
lighting). If one thread adds samples to the container, other threads
must see it immediately.
That, of course, introduces another problem: if two threads want
to sample the same location, only one of them should do it. The other one
should wait for it to finish (it wouldn't make much sense for two
threads to calculate the exact same thing), and then they can both use the
same samples. This adds an additional level of complexity.
--
- Warp
Christian Froeschlin <chr### [at] chrfrde> wrote:
> I don't know the pthread library well, but are you sure its
> lock function works by polling? I'd have expected it to wait
> for a signal sent by the unlock function and Thread 1 to
> get awakened before Thread 2 can lock again.
You still need a list of threads waiting for the lock to become available.
You might have several threads waiting in line for the lock, and you can't
send a signal to all of them. You should send the signal to the one which
has been waiting the longest. IOW, you need a FIFO.
> Also, why should Thread 2 have a higher priority anyway? It
> may feel unfair for it to have to wait all the time, but if
> both threads lock the resource too frequently one of them
> is bound to be waiting all the time, so the throughput
> is not helped by fairness.
Because if you don't distribute the lock evenly, you may get uneven load,
and the benefit of multithreading will suffer.
For example, suppose you have a memory pool from which threads can
allocate and deallocate objects for whatever task they are performing.
If one thread is constantly obstructing another thread from allocating
objects from the pool (as might in theory happen in the way I described),
you will effectively have a single-threaded program rather than a
two-threaded one, because one of the threads will be idle (or seriously
underutilized) because its access to the memory pool is constantly being
blocked by the other thread.
If the second thread happens to get access to the memory pool by chance,
the situation may reverse itself (IOW, now it's the first thread that can't
get to the memory pool), but the end result is the same.
(Granted, in practice it would be quite unlikely for a situation like
this to happen even with a naive mutex, especially if the threads
allocate and deallocate objects from the memory pool relatively
infrequently. However, the more frequently they do it, the more likely
it is that they will start obstructing each other, decreasing the overall
performance of the program.)
--
- Warp
Thorsten Froehlich <tho### [at] trfde> wrote:
> Warp wrote:
> > As a curiosity, I tested the actual slowdown caused by Boost mutexes.
> Sorry, but this is a fairly baseless claim because there are no "Boost"
> mutexes. Boost provides nothing more than a very low-overhead (zero overhead
> if using an optimizing compiler) wrapper over Posix mutexes on Unix and
> Win32 mutexes on Windows. As such, "Boost mutexes" cannot be slower than the
> native operating system's mutexes because they *are* nothing more than the
> native operating system's mutexes!
One would think so, but one would be wrong.
If in the example program I mentioned I change the Boost mutex to a
Pthreads mutex, the running time of the program decreases from 47 seconds
to 41 seconds. I measured both versions several times, and the times were
completely consistent, with negligible variation (in the order of plus/minus
0.1 seconds).
The difference is small, but it exists. The "thin" wrapper in Boost does
add its own small overhead.
(I read somewhere that newer versions of Boost fix this small problem
by resorting to more inlining. It might be that I have a version of Boost
which does not inline as much.)
OpenMP mutexes are slightly faster than Pthreads mutexes, at least here.
--
- Warp
Warp wrote:
> allocate and deallocate objects from the memory pool relatively
> infrequently. However, the more frequently they do it, the more likely
> it is that they will start obstructing each other, decreasing the overall
> performance of the program
However, if they do it frequently, one of the threads will always
be blocked. Using priorities will therefore prevent the situation
that thread 1 works at 100% and thread 2 at 0%, and transform it
into thread 1 works at 50% and thread 2 at 50%. While this is
better for responsiveness if one of the threads handles GUI
events or similar, it doesn't help overall performance for
parallelized number crunching as in POV-Ray.
An exception would be if thread 1 needs the lock often but
thread 2 only rarely. But I expect both threads to be doing
essentially the same task in parallel for speedup, so they
will also have the same locking requirements.
Christian Froeschlin <chr### [at] chrfrde> wrote:
> However, if they do it frequently, one of the threads will always
> be blocked. Using priorities will therefore prevent the situation
> that thread 1 works at 100% and thread 2 at 0%, and transform it
> into thread 1 works at 50% and thread 2 at 50%.
Only in a 1-processor/single-core system.
In a dual-core system the goal is to get as close to 100% and 100% as
possible. If the threads block each other from the resource, you'll get
100% and 0% (on the very extreme), so one of the cores will just sit
idle while the other does all the work.
(Yes, it's quite unlikely to get to this extreme even with naive
mutexes, but in theory it's still possible. In practice you may still
get reduced efficiency in certain situations, and a FIFO would mostly
solve the problem.)
> An exception would be if thread 1 needs the lock often but
> thread 2 only rarely. But I expect both threads to be doing
> essentially the same task in parallel for speedup, so they
> will also have the same locking requirements.
Which is why it's important to distribute the resources to all
threads evenly.
--
- Warp
Warp <war### [at] tagpovrayorg> wrote:
> The reason why Boost is used is that it's portable to most systems.
> Pthreads can only be used in posix systems which implement it (if I'm
> not mistaken, most Windows compilers don't). OpenMP is a rather new
> standard and few compilers support it.
So maybe on POSIX systems Boost just wraps the Pthreads mutexes, thereby adding
some more overhead? Or is the difference much worse?
"Chambers" <ben### [at] pacificwebguycom> wrote:
> Would it be faster if different threads rarely accessed the same locked
> data? We could split the sample tree into several smaller trees
> (sub-trees, if you will) based on scene geography (grouping close
> samples in the same tree). Then, when performing sampling, a thread
> would be most likely to only need access to a few of the sub-trees
> (those nearest), and rarely the sub-trees further away. If you only
> have a few threads, then the chances of multiple threads needing the
> same sub-tree at the same time are minimized. Having more threads would
> add more overhead, of course, because thread collision is more likely.
Doesn't work, I guess. Think of an ordinary room: for most samples, you will need
samples from all across the room, not just a particular corner.
There may be scenes where your idea might speed up things, but in general
radiosity interaction between different parts of the scene can be expected to
be quite high.
The approach of subdividing an image and feeding each part to a separate task
works with regular raytracing only because no data cache is being built and
re-used, even when parts of the scene interact through reflection: the chances
of two mirrors showing exactly the same points are minimal (and even then the
incident angles would differ), so caching wouldn't be worth the pain anyway.
"Chambers" <ben### [at] pacificwebguycom> wrote:
> > -----Original Message-----
> > From: Warp [mailto:war### [at] tagpovrayorg]
> > Chambers <ben### [at] pacificwebguycom> wrote:
> > > Boost isn't slow. Mutexing is slow (relatively).
> >
> > Boost mutexes are actually the slowest I have tried, although not by
> > a
> > very large margin.
>
> > (interesting stuff about mutexing snipped)
Indeed. :)
> Yes, I'm aware of different designs for mutexes. In fact, the fastest
> way to deal with them is to design your container such that they aren't
> even needed. However, the only paper I've seen that said such a thing
> advocated using immutable state instead.
>
> That is, once a variable is created it cannot be modified. In the case
> of a container, you don't actually change the container; you make a
> copy, set the reference to point to the modified data, and dump the
> original data.
>
> I was so impressed that I implemented a few tests of the idea just to
> get a feel for it. It CAN be extremely fast, and threading CAN work
> effectively with it, but generally it's a pain in the a** and, if done
> wrong, you can screw up your data or get a big slowdown.
It's a pain in the ass to try to replicate functional programming in heavily
imperative languages, yes. I'm sure Haskell would deal with it much more
gracefully. ;)
"nemesis" <nam### [at] gmailcom> wrote:
> It's a pain in the ass to try to replicate functional programming in heavily
> imperative languages, yes. I'm sure Haskell would deal with it much more
> gracefully. ;)
As POV-Ray is unfortunately(???) *not* programmed in Haskell, and most likely
will not be in the near future, your point is a bit moot ;)
(I'd be a bit curious though to see how a Haskell-based raytracer would perform
overall, compared with POV 3.6 (C), or the current beta (C/C++ in every sense
of the slash)...)
I heard tell that Erlang is a good language to write distributed code, too...