On 06/09/2011 09:28 PM, Warp wrote:
> Hmm, for the compiler to be able to elide the copying and make the changes
> in-place, it would have to know for sure that those nodes that are being
> changed are not being referenced from anywhere else in the program. (Because,
> obviously, if they are referenced by something else, that something else is
> going to see the change although it shouldn't.)
>
> I'm not sure how a compiler would be able to do that in a complex program
> where nodes and containers are being passed around.
I used to think that the compiler magically performed these kinds of
optimisation somehow. I've gradually learned that this isn't the case.
Oh, sure, if you write one function that puts some data into a box, and
another function that immediately takes the data out of that box again,
the box will probably be optimised away. But for something like updating
a list / tree / whatever, it's usually not feasible to optimise out the
copying.
(There /are/ frameworks designed to spot long chains of container
manipulations and inline them into a giant nested loop. But that's not
compiler magic, that's clever user-level programming. The example I
posted contains no such magic.)
> (Of course the copying is not a big deal if each node contains eg. a
> couple of integers. However, it begins to matter if they contain for example
> large strings, which isn't all that unusual.)
Let us be clear about exactly what is copied:
If you have a heap which contains N internal nodes (since there is only
"one" leaf node, shared by all heaps in the program) and you insert a
new item into that heap, then O(log N) new heap nodes get constructed.
Each [internal] heap node consists of 4 pointers. One points to
something resembling a vtable [that's how you can tell Node and Leaf
apart], one points to the datum, and the other two point to the child nodes.
So copying an internal node amounts to copying (and possibly altering) 4
pointers. This is /regardless/ of the type of data the heap holds. Even if
it holds arbitrary-precision integers, you're still only copying 4
pointers. The integers haven't changed, and so are not copied.
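As a sketch of the structural sharing being described, here's a hypothetical Tree type with the same node shape (this is illustrative, not the actual heap code from earlier in the thread):

```haskell
-- A hypothetical binary tree, analogous to the heap nodes described
-- above: each Node is a constructor tag plus a datum and two child
-- pointers, and Leaf is a single shared value.
data Tree a = Leaf | Node a (Tree a) (Tree a)

-- Inserting rebuilds only the O(log n) nodes along one path.
-- Every subtree we don't descend into is shared with the old tree,
-- not copied -- and the data items themselves are never copied.
insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node x Leaf Leaf
insert x t@(Node y l r)
  | x < y     = Node y (insert x l) r  -- r is shared with the old tree
  | x > y     = Node y l (insert x r)  -- l is shared with the old tree
  | otherwise = t                      -- already present: share everything

-- In-order traversal, just so we can observe the contents.
toList :: Tree a -> [a]
toList Leaf         = []
toList (Node x l r) = toList l ++ [x] ++ toList r
```

Only the spine from the root down to the insertion point is rebuilt; everything hanging off to the side is the same heap object as before.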
It does seem a pity that in the [very common] case of starting from an
empty heap and repeatedly inserting new items into it, you keep
generating these new nodes and then throwing them away. In
mitigation, the GC is heavily optimised for this activity pattern.
Allocations are cheap. They happen in a per-core allocation area (so no
cache coherency issues), and each allocation is O(1). When the
allocation area fills up, a GC sweep happens. Because the allocation
area is quite small, it can be swept very quickly. The sweep identifies
garbage, frees it, and defragments the free space (so that allocation is
still O(1) in all cases).
It would be faster if you didn't have to do all this work, but it's not
as slow as you'd think.
There are two key scenarios where immutable data becomes a Big Win:
First, there's parallel programming. Immutable data is thread-safe data.
It /cannot/ change while you're examining it, so race conditions are a
non-issue. Obviously if you want to /communicate/ between threads,
that's another matter. But you don't ever have to worry about making one
thread wait to update something until after the other thread has
definitely finished using it. You won't ever see bugs where you mutate
something and then discover that another thread was actually still
reading it.
Second, backtracking search problems. You might think that copying stuff
every time you want to change it is inefficient, but if you're doing
some kind of search problem where you want to search multiple avenues,
then still having the old copy around drastically simplifies the
algorithm, and probably /increases/ efficiency. Because the alternative
is to deduce (or remember) what you did, and then manually undo it if
you need to backtrack.
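Using the standard Data.Map as a stand-in for whatever search state you're tracking (the key and value here are purely illustrative), the point is that the pre-branch version survives for free:

```haskell
import qualified Data.Map as M

-- Persistence makes backtracking trivial: extending the search state
-- yields a *new* map, while the old one survives untouched. If the
-- branch fails, "undoing" is simply using the old value again --
-- nothing to deduce, remember, or manually reverse.
explore :: M.Map String Int -> (M.Map String Int, M.Map String Int)
explore state =
  let branch = M.insert "x" 3 state  -- try one avenue
  in (state, branch)                 -- 'state' is still valid for backtracking
```

With a mutable map you'd have to remove "x" again (or snapshot the whole map) before trying the next avenue; here the snapshot is free because nothing was destroyed.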
Some guy at Google wrote a paper about translating a Python (?) system
to Haskell, and actually devoted a whole section to the problems of
trying to enforce immutability in Python, and the difficulties of doing
without it. Immutable data was cited as one of Haskell's biggest advantages.
The one place where immutability /does/ bite is arrays. Almost all
Haskell data structures are linked lists or trees of some kind, because
that way the copying is cheap. But if you update one index of a
million-element array, you've got to copy a contiguous block of a
million elements. Not fast. Immutable arrays are great as super-fast
lookup tables, but unless the data changes very infrequently, you end up
needing mutable arrays. Immutable arrays just aren't all that useful.
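A tiny stand-in for the million-element case, using the standard Data.Array:

```haskell
import Data.Array

-- A five-element array standing in for a million-element one.
original :: Array Int Int
original = listArray (0, 4) [10, 20, 30, 40, 50]

-- The (//) update operator builds an entirely new array: every element
-- is copied just to change one slot. The original is untouched -- which
-- is exactly why repeated single-element updates get so expensive.
updated :: Array Int Int
updated = original // [(2, 99)]
```

At five elements the copy is invisible; at a million elements, every single update is an O(n) job.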
[Insert rant about how Haskell's array implementation sucks...]
On 9/7/2011 2:22, Invisible wrote:
> then O(log N) new heap nodes get constructed.
Which, incidentally, is the same number of nodes you'd have to update in a
mutable heap, too.
> But you don't ever have to worry about making one thread
> wait to update something until after the other thread has definitely
> finished using it.
That's not strictly true. It's entirely possible in modern processors that
the pointer gets written and is visible to other processors before the data
it points to gets flushed from the cache into main memory.
> Immutable arrays are great as super-fast lookup tables, but unless the data
> changes very infrequently, you end up needing mutable arrays. Immutable
> arrays just aren't all that useful.
Some of the Google stuff is cool in that way. You can imagine you don't want
to update the entire file that holds the entire web every time you scan a
new document.
--
Darren New, San Diego CA, USA (PST)
How come I never get only one kudo?
On 08/09/2011 05:09 AM, Darren New wrote:
> On 9/7/2011 2:22, Invisible wrote:
> > then O(log N) new heap nodes get constructed.
>
> Which, incidentally, is the same number of nodes you'd have to update in
> a mutable heap, too.
Indeed. The only difference is that with Haskell, you *copy* these nodes
rather than simply modify them in-place.
>> But you don't ever have to worry about making one thread
>> wait to update something until after the other thread has definitely
>> finished using it.
>
> That's not strictly true. It's entirely possible in modern processors
> that the pointer gets written and is visible to other processors before
> the data it points to gets flushed from the cache into main memory.
No, that's exactly what the "cache coherence" protocol prevents.
>> Immutable arrays are great as super-fast lookup tables, but unless the
>> data
>> changes very infrequently, you end up needing mutable arrays. Immutable
>> arrays just aren't all that useful.
>
> Some of the google stuff is cool in that way. You can imagine you don't
> want to update the entire file that holds the entire web every time you
> scan a new document.
It defies my comprehension that what Google does is actually physically
possible, but anyway...
On 9/8/2011 1:14, Invisible wrote:
> No, that's exactly what the "cache coherence" protocol prevents.
Sure. I'm just saying that "it's write-once" doesn't automatically
mean it's thread-safe on a modern multi-processor. The generated code
actually has to take some care to tell the processor to flush cache lines
and such.
> It defies my comprehension that what Google does is actually physically
> possible, but anyway...
It kind of boggles my mind too. Just the complexity of the monitoring stuff
is stunning, let alone actually getting work done. :-)
--
Darren New, San Diego CA, USA (PST)
How come I never get only one kudo?
On 08/09/2011 06:24 PM, Darren New wrote:
> On 9/8/2011 1:14, Invisible wrote:
>> No, that's exactly what the "cache coherence" protocol prevents.
>
> Sure. I'm just saying that saying "it's write-once" doesn't
> automatically mean it's thread safe in a modern multi-processor. The
> generated code actually has to take some care to tell the processor to
> flush cache lines and such.
According to the documentation I read, this is automatic. As in, the
code doesn't have to take any explicit action at all to ensure that all
processors see a consistent image of main memory. It's just that if
several cores try to access the same area at once, it gets very, very slow.
>> It defies my comprehension that what Google does is actually physically
>> possible, but anyway...
>
> It kind of boggles my mind too. Just the complexity of the monitoring
> stuff is stunning, let alone actually getting work done. :-)
Apparently it's one of the monitoring and load-balancing systems they
translated to Haskell...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
On 9/8/2011 11:36, Orchid XP v8 wrote:
> According to the documentation I read, this is automatic.
It certainly depends on the CPU and architecture.
http://en.wikipedia.org/wiki/Memory_barrier
You also have to make sure you're not caching something in a register that
you've written out the pointer to.
>> It kind of boggles my mind too. Just the complexity of the monitoring
>> stuff is stunning, let alone actually getting work done. :-)
>
> Apparently it's one of the monitoring and load-balancing systems they
> translated to Haskell...
What is?
--
Darren New, San Diego CA, USA (PST)
How come I never get only one kudo?
On 09/09/2011 05:14 PM, Darren New wrote:
> On 9/8/2011 11:36, Orchid XP v8 wrote:
>> According to the documentation I read, this is automatic.
>
> It certainly depends on the CPU and architecture.
Well, yes, I was referring to the only architecture in existence that
people are likely to be writing desktop applications for. :-P
> You also have to make sure you're not caching something in a register
> that you've written out the pointer to.
This is certainly true. That's why C has "volatile", for example.
>>> It kind of boggles my mind too. Just the complexity of the monitoring
>>> stuff is stunning, let alone actually getting work done. :-)
>>
>> Apparently it's one of the monitoring and load-balancing systems they
>> translated to Haskell...
>
> What is?
I saw a paper a while back about how Google (I forget which center)
translated part of the monitoring system from Python to Haskell, and
documented what they did and didn't find beneficial about this.
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
On 9/9/2011 9:57, Orchid XP v8 wrote:
> Well, yes, I was referring to the only architecture in existence that people
> are likely to be writing desktop applications for. :-P
I think even amongst Intel and AMD x64 and x86 architectures, this varies.
I'm pretty sure it's not 100% automatic or there wouldn't be instructions
for causing it to happen.
> I saw a paper a while back about how Google (I forget which center)
> translated part of the monitoring system from Python to Haskell, and
> documented what they did and didn't find beneficial about this.
Ah. Well, there's monitoring out the wazoo here, so I'm quite certain they
didn't rewrite even a majority of it.
--
Darren New, San Diego CA, USA (PST)
How come I never get only one kudo?
On 09/09/2011 07:09 PM, Darren New wrote:
> On 9/9/2011 9:57, Orchid XP v8 wrote:
>> Well, yes, I was referring to the only architecture in existence that
>> people are likely to be writing desktop applications for. :-P
>
> I think even amongst Intel and AMD x64 and x86 architectures, this
> varies.
It probably varies on things like the Atom and other embedded stuff.
> I'm pretty sure it's not 100% automatic or there wouldn't be
> instructions for causing it to happen.
There are instructions for loading stuff without caching it, in case you
happen to know that you're only going to access it once. There are
instructions for flushing the cache, if you're about to do a context
switch or you're controlling memory-mapped hardware. But as far as I
know, you don't have to do anything special to get correct
multi-processor behaviour. (Think about it; if you did, every
multi-threaded application ever written would suddenly break when run on
a multi-chip or multi-core setup.)
>> I saw a paper a while back about how Google (I forget which center)
>> translated part of the monitoring system from Python to Haskell, and
>> documented what they did and didn't find beneficial about this.
>
> Ah. Well, there's monitoring out the wazoo here, so I'm quite certain
> they didn't rewrite even a majority of it.
Oh, I'm sure they only translated one system. And it was only Google
Netherlands (or somewhere like that), not the entire company. (Although
it's surprising that different branches would actually do things
differently...)
On 9/12/2011 1:13, Invisible wrote:
> (Think about it; if you did, every multi-threaded application ever written
> would suddenly break when run on a multi-chip or multi-core setup.)
Not really. You have no more likelihood of breaking because of lack of
memory barriers than you have of breaking because you've cached something in
a register during a task switch.
The compiler just has to write memory barrier instructions out when you
access a volatile variable. That's why the keyword is there.
--
Darren New, San Diego CA, USA (PST)
How come I never get only one kudo?