POV-Ray : Newsgroups : povray.off-topic : My hypothesis Server Time
29 Jul 2024 20:15:01 EDT (-0400)
  My hypothesis (Message 35 to 44 of 54)  
From: Invisible
Subject: Re: My hypothesis
Date: 7 Sep 2011 05:22:28
Message: <4e6737d4$1@news.povray.org>
On 06/09/2011 09:28 PM, Warp wrote:

>    Hmm, for the compiler to be able to elide the copying and make the changes
> in-place, it would have to know for sure that those nodes that are being
> changed are not being referenced from anywhere else in the program. (Because,
> obviously, if they are referenced by something else, that something else is
> going to see the change although it shouldn't.)
>
>    I'm not sure how a compiler would be able to do that in a complex program
> where nodes and containers are being passed around.

I used to think that the compiler magically performed these kinds of 
optimisation somehow. I've gradually learned that this isn't the case.

Oh, sure, if you write one function that puts some data into a box, and 
another function that immediately takes the data out of that box again, 
the box will probably be optimised away. But for something like updating 
a list / tree / whatever, it's usually not feasible to optimise out the 
copying.

(There /are/ frameworks designed to spot long chains of container 
manipulations and inline them into a giant nested loop. But that's not 
compiler magic, that's clever user-level programming. The example I 
posted contains no such magic.)

>    (Of course the copying is not a big deal if each node contains eg. a
> couple of integers. However, it begins to matter if they contain for example
> large strings, which isn't all that unusual.)

Let us be clear about exactly what is copied:

If you have a heap which contains N internal nodes (since there is only 
"one" leaf node, shared by all heaps in the program) and you insert a 
new item into that heap, then O(log N) new heap nodes get constructed. 
Each [internal] heap node consists of 4 pointers. One points to 
something resembling a vtable [that's how you can tell Node and Leaf 
apart], one points to the datum, and the other two point to the child nodes.

So copying an internal node amounts to copying (and possibly altering) 4 
pointers. This is true /regardless/ of the type of data the heap holds. Even if
it holds arbitrary-precision integers, you're still only copying 4 
pointers. The integers haven't changed, and so are not copied.
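To make the sharing concrete, here's a minimal sketch. It uses a plain (unbalanced) binary search tree rather than the heap above, purely because it's shorter; the path-copying idea is identical. All names here are illustrative:

```haskell
-- Persistent tree: inserting rebuilds only the nodes on the search
-- path (O(log n) when balanced). Every untouched subtree is shared
-- between the old and new versions, not copied.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq)

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
  | x < v     = Node (insert x l) v r   -- one new node; 'r' is shared
  | x > v     = Node l v (insert x r)   -- one new node; 'l' is shared
  | otherwise = t                       -- already present; share it all

main :: IO ()
main = do
  let t1 = foldr insert Leaf [5, 3, 8 :: Int]
      t2 = insert 4 t1    -- t1 remains fully usable afterwards
  print t1
  print t2
```

Note that `insert 4 t1` allocates new nodes only along the path to the insertion point; everything hanging off that path is the very same heap object in both `t1` and `t2`.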

It does seem a pity that in the [very common] case of starting from an 
empty heap and repeatedly inserting new items into it, you have to keep 
generating these new nodes and then throwing them away. In
mitigation, the GC is heavily optimised for this activity pattern. 
Allocations are cheap. They happen in a per-core allocation area (so no 
cache coherency issues), and each allocation is O(1). When the 
allocation area fills up, a GC sweep happens. Because the allocation 
area is quite small, it can be swept very quickly. The sweep identifies 
garbage, frees it, and defragments the free space (so that allocation is 
still O(1) in all cases).

It would be faster if you didn't have to do all this work, but it's not 
as slow as you'd think.

There are two key scenarios where immutable data becomes a Big Win:

First, there's parallel programming. Immutable data is thread-safe data. 
It /cannot/ change while you're examining it, so race conditions are a 
non-issue. Obviously if you want to /communicate/ between threads, 
that's another matter. But you don't ever have to worry about making one 
thread wait to update something until after the other thread has 
definitely finished using it. You won't ever see bugs where you mutate 
something and then discover that another thread was actually still 
reading it.
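A small sketch of that, assuming GHC's standard concurrency primitives (compile with -threaded for real parallelism). The immutable list needs no locking to read; only the *communication* of results goes through an MVar:

```haskell
import Control.Concurrent

-- Two threads reading the same immutable list concurrently. Neither
-- can observe a half-updated version, because there is nothing to
-- update: any "change" would build a new list and leave this one alone.
main :: IO ()
main = do
  let shared = [1 .. 1000] :: [Int]   -- immutable, safely shared
  acc <- newMVar (0 :: Int)
  let worker = do
        let s = sum shared            -- pure read, no locking needed
        modifyMVar_ acc (\total -> return (total + s))
  t1 <- newEmptyMVar
  t2 <- newEmptyMVar
  _ <- forkIO (worker >> putMVar t1 ())
  _ <- forkIO (worker >> putMVar t2 ())
  takeMVar t1
  takeMVar t2
  total <- takeMVar acc
  print total   -- twice 500500, i.e. 1001000
```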

Second, backtracking search problems. You might think that copying stuff 
every time you want to change it is inefficient, but if you're doing 
some kind of search problem where you want to search multiple avenues, 
then still having the old copy around drastically simplifies the 
algorithm, and probably /increases/ efficiency. Because the alternative 
is to deduce (or remember) what you did, and then manually undo it if 
you need to backtrack.
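The classic N-queens search illustrates this. Each recursive call extends its own version of the partial solution; "undoing" a bad placement is free, because the caller's version was never modified in the first place:

```haskell
-- Backtracking over persistent data: every candidate row extends
-- 'rest' with (:), which shares the old list instead of mutating it,
-- so abandoning a branch requires no explicit undo step.
queens :: Int -> [[Int]]
queens n = go n
  where
    go 0 = [[]]
    go k = [ col : rest
           | rest <- go (k - 1)
           , col  <- [1 .. n]
           , safe col rest ]
    -- 'rest' holds the columns of queens in the rows above, nearest
    -- first; check column and both diagonals.
    safe col rest =
      and [ col /= c && abs (col - c) /= d
          | (d, c) <- zip [1 ..] rest ]

main :: IO ()
main = print (length (queens 6))   -- 4 solutions on a 6x6 board
```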

Some guy at Google wrote a paper about translating a Python (?) system 
to Haskell, and actually devoted a whole section to the problems of 
trying to enforce immutability in Python, and the difficulties of doing 
without it. Immutable data was cited as one of Haskell's biggest advantages.

The one place where immutability /does/ bite is arrays. Almost all 
Haskell data structures are linked lists or trees of some kind, because 
that way the copying is cheap. But if you update a single element of a 
million-element array, you've got to copy a contiguous block of a 
million elements. Not fast. Immutable arrays are great as super-fast 
lookup tables, but unless the data changes very infrequently, you end up 
needing mutable arrays. Immutable arrays just aren't all that useful.
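For instance, with the standard Data.Array the bulk-update operator (//) has exactly this cost:

```haskell
import Data.Array

-- Updating one slot of an immutable array with (//) builds an entire
-- new array: all 10^6 elements get copied, in contrast to the O(log n)
-- node sharing a tree gives you.
main :: IO ()
main = do
  let a = listArray (0, 999999) (repeat 0) :: Array Int Int
      b = a // [(42, 1)]    -- O(n): copies the whole array
  print (a ! 42, b ! 42)    -- (0,1): the old version is untouched
```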

[Insert rant about how Haskell's array implementation sucks...]



From: Darren New
Subject: Re: My hypothesis
Date: 8 Sep 2011 00:09:42
Message: <4e684006$1@news.povray.org>
On 9/7/2011 2:22, Invisible wrote:
 > then O(log N) new heap nodes get constructed.

Which, incidentally, is the same number of nodes you'd have to update in a 
mutable heap, too.

> But you don't ever have to worry about making one thread
> wait to update something until after the other thread has definitely
> finished using it.

That's not strictly true. It's entirely possible in modern processors that 
the pointer gets written and is visible to other processors before the data 
it points to gets flushed from the cache into main memory.

> Immutable arrays are great as super-fast lookup tables, but unless the data
> changes very infrequently, you end up needing mutable arrays. Immutable
> arrays just aren't all that useful.

Some of the google stuff is cool in that way. You can imagine you don't want 
to update the entire file that holds the entire web every time you scan a 
new document.

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Invisible
Subject: Re: My hypothesis
Date: 8 Sep 2011 04:13:41
Message: <4e687935$1@news.povray.org>
On 08/09/2011 05:09 AM, Darren New wrote:
> On 9/7/2011 2:22, Invisible wrote:
>  > then O(log N) new heap nodes get constructed.
>
> Which, incidentally, is the same number of nodes you'd have to update in
> a mutable heap, too.

Indeed. The only difference is that with Haskell, you *copy* these nodes 
rather than simply modify them in-place.

>> But you don't ever have to worry about making one thread
>> wait to update something until after the other thread has definitely
>> finished using it.
>
> That's not strictly true. It's entirely possible in modern processors
> that the pointer gets written and is visible to other processors before
> the data it points to gets flushed from the cache into main memory.

No, that's exactly what the "cache coherence" protocol prevents.

>> Immutable arrays are great as super-fast lookup tables, but unless the
>> data
>> changes very infrequently, you end up needing mutable arrays. Immutable
>> arrays just aren't all that useful.
>
> Some of the google stuff is cool in that way. You can imagine you don't
> want to update the entire file that holds the entire web every time you
> scan a new document.

It defies my comprehension that what Google does is actually physically 
possible, but anyway...



From: Darren New
Subject: Re: My hypothesis
Date: 8 Sep 2011 13:24:29
Message: <4e68fa4d@news.povray.org>
On 9/8/2011 1:14, Invisible wrote:
> No, that's exactly what the "cache coherence" protocol prevents.

Sure. I'm just saying that saying "it's write-once" doesn't automatically 
mean it's thread safe in a modern multi-processor. The generated code 
actually has to take some care to tell the processor to flush cache lines 
and such.

> It defies my comprehension that what Google does is actually physically
> possible, but anyway...

It kind of boggles my mind too. Just the complexity of the monitoring stuff 
is stunning, let alone actually getting work done. :-)

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Orchid XP v8
Subject: Re: My hypothesis
Date: 8 Sep 2011 14:37:01
Message: <4e690b4d$1@news.povray.org>
On 08/09/2011 06:24 PM, Darren New wrote:
> On 9/8/2011 1:14, Invisible wrote:
>> No, that's exactly what the "cache coherence" protocol prevents.
>
> Sure. I'm just saying that saying "it's write-once" doesn't
> automatically mean it's thread safe in a modern multi-processor. The
> generated code actually has to take some care to tell the processor to
> flush cache lines and such.

According to the documentation I read, this is automatic. As in, the 
code doesn't have to take any explicit action at all to ensure that all 
processors see a consistent image of main memory. It's just that if 
several cores try to access the same area at once, it gets very, very slow.

>> It defies my comprehension that what Google does is actually physically
>> possible, but anyway...
>
> It kind of boggles my mind too. Just the complexity of the monitoring
> stuff is stunning, let alone actually getting work done. :-)

Apparently it's one of the monitoring and load-balancing systems they 
translated to Haskell...

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Darren New
Subject: Re: My hypothesis
Date: 9 Sep 2011 12:14:22
Message: <4e6a3b5e$1@news.povray.org>
On 9/8/2011 11:36, Orchid XP v8 wrote:
> According to the documentation I read, this is automatic.

It certainly depends on the CPU and architecture.

http://en.wikipedia.org/wiki/Memory_barrier

You also have to make sure you're not caching something in a register that 
you've written out the pointer to.

>> It kind of boggles my mind too. Just the complexity of the monitoring
>> stuff is stunning, let alone actually getting work done. :-)
>
> Apparently it's one of the monitoring and load-balancing systems they
> translated to Haskell...

What is?

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Orchid XP v8
Subject: Re: My hypothesis
Date: 9 Sep 2011 12:57:51
Message: <4e6a458f$1@news.povray.org>
On 09/09/2011 05:14 PM, Darren New wrote:
> On 9/8/2011 11:36, Orchid XP v8 wrote:
>> According to the documentation I read, this is automatic.
>
> It certainly depends on the CPU and architecture.

Well, yes, I was referring to the only architecture in existence that 
people are likely to be writing desktop applications for. :-P

> You also have to make sure you're not caching something in a register
> that you've written out the pointer to.

This is certainly true. That's why C has "volatile", for example.

>>> It kind of boggles my mind too. Just the complexity of the monitoring
>>> stuff is stunning, let alone actually getting work done. :-)
>>
>> Apparently it's one of the monitoring and load-balancing systems they
>> translated to Haskell...
>
> What is?

I saw a paper a while back about how Google (I forget which center) 
translated part of the monitoring system from Python to Haskell, and 
documented what they did and didn't find beneficial about this.

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Darren New
Subject: Re: My hypothesis
Date: 9 Sep 2011 14:10:00
Message: <4e6a5678@news.povray.org>
On 9/9/2011 9:57, Orchid XP v8 wrote:
> Well, yes, I was referring to the only architecture in existence that people
> are likely to be writing desktop applications for. :-P

I think even amongst Intel and AMD x64 and x86 architectures, this varies. 
I'm pretty sure it's not 100% automatic or there wouldn't be instructions 
for causing it to happen.

> I saw a paper a while back about how Google (I forget which center)
> translated part of the monitoring system from Python to Haskell, and
> documented what they did and didn't find beneficial about this.

Ah. Well, there's monitoring out the wazoo here, so I'm quite certain they 
didn't rewrite even a majority of it.

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Invisible
Subject: Re: My hypothesis
Date: 12 Sep 2011 04:13:01
Message: <4e6dbf0d@news.povray.org>
On 09/09/2011 07:09 PM, Darren New wrote:
> On 9/9/2011 9:57, Orchid XP v8 wrote:
>> Well, yes, I was referring to the only architecture in existence that
>> people are likely to be writing desktop applications for. :-P
>
> I think even amongst Intel and AMD x64 and x86 architectures, this
> varies.

It probably varies on things like the Atom and other embedded stuff.

> I'm pretty sure it's not 100% automatic or there wouldn't be
> instructions for causing it to happen.

There are instructions for loading stuff without caching it, in case you 
happen to know that you're only going to access it once. There are 
instructions for flushing the cache, if you're about to do a context 
switch or you're controlling memory-mapped hardware. But as far as I 
know, you don't have to do anything special to get correct 
multi-processor behaviour. (Think about it; if you did, every 
multi-threaded application ever written would suddenly break when run on 
a multi-chip or multi-core setup.)

>> I saw a paper a while back about how Google (I forget which center)
>> translated part of the monitoring system from Python to Haskell, and
>> documented what they did and didn't find beneficial about this.
>
> Ah. Well, there's monitoring out the wazoo here, so I'm quite certain
> they didn't rewrite even a majority of it.

Oh, I'm sure they only translated one system. And it was only Google 
Netherlands (or somewhere like that), not the entire company. (Although 
it's surprising that different branches would actually do things 
differently...)



From: Darren New
Subject: Re: My hypothesis
Date: 12 Sep 2011 18:49:48
Message: <4e6e8c8c$1@news.povray.org>
On 9/12/2011 1:13, Invisible wrote:
> (Think about it; if you did, every multi-threaded application ever written
> would suddenly break when run on a multi-chip or multi-core setup.)

Not really. You have no more likelihood of breaking because of lack of 
memory barriers than you have of breaking because you've cached something in 
a register during a task switch.

The compiler just has to write memory barrier instructions out when you 
access a volatile variable. That's why the keyword is there.

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?




Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.