  My hypothesis (Message 31 to 40 of 54)  
From: Darren New
Subject: Re: My hypothesis
Date: 6 Sep 2011 16:52:44
Message: <4e66881c$1@news.povray.org>
On 9/6/2011 13:28, Warp wrote:
>    I'm not sure how a compiler would be able to do that in a complex program
> where nodes and containers are being passed around.

I don't know about Haskell, but in some of the other languages (like 
Erlang), changes tend to be localized enough that you can do essentially a 
full-program analysis. It also helps that nothing can point to a newer 
object; that is, the old heap can never refer to the newer heap, because 
that would require modifying things that already exist.

That said, I don't really understand the details of how such an analysis is 
done myself.

>    (Of course the copying is not a big deal if each node contains eg. a
> couple of integers. However, it begins to matter if they contain for example
> large strings, which isn't all that unusual.)

Sure. Or if it's storing a big structure whose individual fields can't be 
shared for whatever reason, which doesn't seem all that unusual either.

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Warp
Subject: Re: My hypothesis
Date: 6 Sep 2011 17:11:11
Message: <4e668c6e@news.povray.org>
Darren New <dne### [at] sanrrcom> wrote:
> It also helps that nothing can point to a newer 
> object; that is, the old heap can never refer to the newer heap, because 
> that would require modifying things that already exist.

  That's true if the nodes to be modified are copied for the new heap.
However, the whole question was whether the compiler can elide this copying
for efficiency. If it mistakenly elides the copying of a node that is being
shared by some other part of the code, and makes the change in-place, that
will break the program (because that other part of the code will see the
change even though it shouldn't).

  What I'm wondering is this: it's probably far more common for the nodes
not to be shared (and thus to have a single owner), but how can the
compiler know that for sure in a complex program?

  If such a data container is used only inside a single function and its
contents are never passed anywhere else, the compiler may be able to deduce
that it can elide the copying. However, in many (if not most) situations
such data containers will live longer than a single function (after all,
that's the most common use of a data container) and be used by several
different parts of the program. It's enough for two different parts of the
program to modify (or perhaps even just read) the same data container for
in-place modification to become a correctness hazard.

-- 
                                                          - Warp



From: Darren New
Subject: Re: My hypothesis
Date: 7 Sep 2011 00:15:57
Message: <4e66effd$1@news.povray.org>
On 9/6/2011 14:11, Warp wrote:
>    What I'm wondering is this: it's probably far more common for the nodes
> not to be shared (and thus to have a single owner), but how can the
> compiler know that for sure in a complex program?

Right. Consider the "insert" function itself. It calls nothing else that 
mutates the heap except for "insert". So for the duration of the actual 
insert call, it knows the heap isn't shared.  Extend this concept a bit 
(yes, that's fuzzy, I know) and with sufficient analysis, you can figure out 
that when you do something like
    x = insert 27 original
    y = insert 56 x
    z = insert 23 y
that y is an intermediate value that can't be shared with anything else. 
(OK, maybe not in the heap case, but for a fair number of other such 
structures it's often possible to determine that. Again, I'm not familiar 
with the details of how it's done.)
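
For concreteness, here's a minimal persistent heap sketch with that same 
chain of inserts. (This is my illustration, a skew heap of my own choosing; 
the actual heap code posted earlier in the thread isn't shown in these 
messages.)

    data Heap a = Leaf | Node a (Heap a) (Heap a)

    -- Merge two heaps: the smaller root wins, and its children are
    -- swapped so the structure stays balanced on average.
    merge :: Ord a => Heap a -> Heap a -> Heap a
    merge Leaf h = h
    merge h Leaf = h
    merge h1@(Node x l1 r1) h2@(Node y _ _)
      | x <= y    = Node x (merge r1 h2) l1
      | otherwise = merge h2 h1

    insert :: Ord a => a -> Heap a -> Heap a
    insert v = merge (Node v Leaf Leaf)

    example :: Heap Int -> Heap Int
    example original =
        let x = insert 27 original
            y = insert 56 x    -- y is consumed only by the next line,
            z = insert 23 y    -- so a compiler could in principle prove it
        in  z                  -- has a single owner and reuse its nodes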

> and be used by several different parts of the program.

If you compile those separately, yes, that can be problematic. If you 
compile it all at the same time, or have more sophisticated linking (like a 
VM, say), then the range over which the compiler can deduce a lack of 
sharing is greater.

Remember, too, that various kinds of GC can tell you whether something is 
shared or not. You could conceive of a copy-on-write sort of thing that only 
does the copy if the reference count is >1.

> It's enough for two different parts of the
> program to modify (or perhaps even just read) the same data container for
> in-place modification to become a correctness hazard.

True. That's why it's better to leave it to the compiler than the 
programmer. ;-)

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Invisible
Subject: Re: My hypothesis
Date: 7 Sep 2011 05:01:49
Message: <4e6732fd$1@news.povray.org>
>> Is this supposed to be efficient?
>
> If your functional language is pure enough, your compiler can often
> optimize out the apparent inefficiency. Of course, that requires a
> Sufficiently Smart Compiler.
>
> In practice, it's less efficient, but not as obviously awful as it would
> appear.

In a way, it's a bit like SQL. In SQL, if you want to find the customer 
name for each order, you say "take the Cartesian product of the order 
table and a customer table, then go through the result and throw away 
all rows except the ones where ord_cust = custID". If both tables are of 
size N, then the first step is O(N^2) in time and space, and the second 
step is O(N^2) in time and O(N) in space.

That's nauseatingly inefficient! But fortunately, every half-sane RDBMS 
on the planet looks at that query and goes "ah-HAH! You're doing a table 
join!" and replaces it with a vastly more efficient algorithm. (Be it a 
hash-join or an index lookup or whatever.)

If SQL weren't being run by a "sufficiently smart interpreter", it would 
be too laughably inefficient to even consider using. The *only* reason 
it's so ubiquitous is that everybody has built really, *really* smart 
interpreters. And everybody /relies on/ the query you actually write 
being optimised into something very fast. Usually it works very well 
indeed.

Of course, SQL isn't a general-purpose programming language. It solves 
one problem well. (Although it's not quite the relational algebra...)

Haskell is a general-purpose programming language. You write constructs 
which /look/ horrifyingly inefficient. And then the compiler does its 
thing, and the end result is more efficient than you'd imagine. Sure, 
probably not as fast as C, but fast enough.
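
An illustrative Haskell sketch of my own (not from the original post): 
written naively, this looks like it allocates an n-element list and then a 
second, mapped list before summing.

    sumOfSquares :: Int -> Int
    sumOfSquares n = sum (map (^ 2) [1 .. n])

    -- Compiled with -O, GHC's list fusion usually rewrites the whole
    -- pipeline into a single accumulating loop with no lists at all --
    -- much as an RDBMS rewrites "Cartesian product + filter" into a join.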

Also: I love how this started out as "how easy is it to understand 
Haskell?" and has once again degenerated into "Haskell must be really 
slow". Which wasn't the question at all.

You know those C hackers? The ones who write horribly unmaintainable 
code in the name of being a few clock cycles faster? The ones who are so 
obsessed with using the minimum amount of CPU time and RAM that they 
write horrid spaghetti code?

Yeah, everybody hates those guys. You know what their problem is? They 
fail to realise that programs are judged on criteria *other* than 
run-time efficiency. Obviously nobody /wants/ slow code. But then nobody 
/wants/ buggy code, unmaintainable code, code that crashes, code that's 
impossible to understand, code that takes months to write and debug.

There's a /reason/ people code stuff in C and not assembly. Even though 
(in theory if not in practice) assembly is faster, people use C. As I 
recall, there was a long time when people wouldn't use C++ because they 
said it was "too slow", and that only C was "fast enough". (Which is 
amusing, given that the Language Shootout claims that C++ is actually 
marginally /faster/ than C...)

Haskell usually isn't as fast as C or C++. It's also a lot simpler to 
program. Especially if you want parallelism. You ask me "how do I update 
a heap without copying?" I ask you "how do you update a heap in a 
thread-safe manner?" Immutable by default => thread-safe by default. 
That's just one example of the kind of benefits that Haskell brings.

According to the Language Shootout, Haskell averages about 3x slower 
than C. Java averages only about 1.5x slower (but uses more memory than 
Haskell). Then again, people actually write applications in Ruby, Python 
and so forth. People build high-traffic websites with PHP and Perl. All 
of these come in at *hundreds of times* slower than C. And it hasn't 
dented their popularity at all.

Sometimes run-time efficiency isn't the most important thing.

(One could argue that although Perl is Turing-complete, it probably 
shouldn't be regarded as a general-purpose language. It's probably 
blisteringly fast for the tiny fraction of the language that most people 
actually use. One could also argue that the Language Shootout is a 
benchmark to see which language has the most zealous fans with the most 
spare time...)



From: Invisible
Subject: Re: My hypothesis
Date: 7 Sep 2011 05:22:28
Message: <4e6737d4$1@news.povray.org>
On 06/09/2011 09:28 PM, Warp wrote:

>    Hmm, for the compiler to be able to elide the copying and make the changes
> in-place, it would have to know for sure that those nodes that are being
> changed are not being referenced from anywhere else in the program. (Because,
> obviously, if they are referenced by something else, that something else is
> going to see the change although it shouldn't.)
>
>    I'm not sure how a compiler would be able to do that in a complex program
> where nodes and containers are being passed around.

I used to think that the compiler magically performed these kinds of 
optimisation somehow. I've gradually learned that this isn't the case.

Oh, sure, if you write one function that puts some data into a box, and 
another function that immediately takes the data out of that box again, 
the box will probably be optimised away. But for something like updating 
a list / tree / whatever, it's usually not feasible to optimise out the 
copying.

(There /are/ frameworks designed to spot long chains of container 
manipulations and inline them into a giant nested loop. But that's not 
compiler magic, that's clever user-level programming. The example I 
posted contains no such magic.)
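
For instance, taking the vector package as one example of such a framework 
(this sketch is mine, not the code from the post):

    import qualified Data.Vector.Unboxed as V

    -- Nominally this chain produces two intermediate vectors (after the
    -- filter and after the map), but the library's rewrite rules (stream
    -- fusion) combine the stages into a single loop over the input --
    -- library cleverness, not general-purpose compiler magic.
    sumOfPositiveSquares :: V.Vector Double -> Double
    sumOfPositiveSquares = V.sum . V.map (\x -> x * x) . V.filter (> 0)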

>    (Of course the copying is not a big deal if each node contains eg. a
> couple of integers. However, it begins to matter if they contain for example
> large strings, which isn't all that unusual.)

Let us be clear about exactly what is copied:

If you have a heap which contains N internal nodes (since there is only 
"one" leaf node, shared by all heaps in the program) and you insert a 
new item into that heap, then O(log N) new heap nodes get constructed. 
Each [internal] heap node consists of 4 pointers. One points to 
something resembling a vtable [that's how you can tell Node and Leaf 
apart], one points to the datum, and the other two point to the child nodes.

So copying an internal node amounts to copying (and possibly altering) 4 
pointers. This is true /regardless/ of the type of data the heap holds. 
Even if it holds arbitrary-precision integers, you're still only copying 
4 pointers. The integers haven't changed, and so are not copied.
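
For reference, a sketch of the node shape being described (the actual heap 
code from earlier in the thread isn't reproduced in these messages):

    data Heap a = Leaf                      -- the single shared "empty" object
                | Node a (Heap a) (Heap a)  -- info pointer + datum + two children

    -- Rebuilding the path from the root to the insertion point allocates
    -- O(log N) fresh Nodes; every untouched subtree and every datum is
    -- shared with the old heap rather than copied.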

It does seem a pity that in the [very common] case of starting from an 
empty heap and repeatedly inserting new items into it, you have to keep 
generating these new nodes and then throwing them away. In mitigation, 
the GC is heavily optimised for this activity pattern. 
Allocations are cheap. They happen in a per-core allocation area (so no 
cache coherency issues), and each allocation is O(1). When the 
allocation area fills up, a GC sweep happens. Because the allocation 
area is quite small, it can be swept very quickly. The sweep identifies 
garbage, frees it, and defragments the free space (so that allocation is 
still O(1) in all cases).

It would be faster if you didn't have to do all this work, but it's not 
as slow as you'd think.

There are two key scenarios where immutable data becomes a Big Win:

First, there's parallel programming. Immutable data is thread-safe data. 
It /cannot/ change while you're examining it, so race conditions are a 
non-issue. Obviously if you want to /communicate/ between threads, 
that's another matter. But you don't ever have to worry about making one 
thread wait to update something until after the other thread has 
definitely finished using it. You won't ever see bugs where you mutate 
something and then discover that another thread was actually still 
reading it.
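
A tiny sketch of this first point, using GHC's Control.Concurrent and 
Data.Map (my example, not from the thread): several threads walk the same 
map simultaneously with no locking, because nothing can change it under them.

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (replicateM)
    import qualified Data.Map as Map

    main :: IO ()
    main = do
        let shared = Map.fromList [ (k, k * k) | k <- [1 .. 100000 :: Int] ]
        dones <- replicateM 4 newEmptyMVar
        let reader done = forkIO $ do
                print (sum (Map.elems shared))  -- read-only traversal, no locks
                putMVar done ()
        mapM_ reader dones
        mapM_ takeMVar dones                    -- wait for every reader to finish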

Second, backtracking search problems. You might think that copying stuff 
every time you want to change it is inefficient, but if you're doing 
some kind of search problem where you want to search multiple avenues, 
then still having the old copy around drastically simplifies the 
algorithm, and probably /increases/ efficiency. Because the alternative 
is to deduce (or remember) what you did, and then manually undo it if 
you need to backtrack.
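
A sketch of this second point (again my example, not anything from the 
thread): a toy backtracking search that places one non-attacking rook per 
row. Each branch extends its own version of the "used columns" set, so 
abandoning a branch needs no undo step; the parent's set was never touched.

    import qualified Data.Set as Set

    rooks :: Int -> [[Int]]
    rooks n = go n Set.empty
      where
        go 0 _    = [[]]
        go k used =
            [ c : rest
            | c <- [1 .. n]
            , c `Set.notMember` used                  -- column free on this branch
            , rest <- go (k - 1) (Set.insert c used)  -- child branch gets its own set
            ]

Trying the next column is just the list comprehension moving on; there is 
no state to restore.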

Some guy at Google wrote a paper about translating a Python (?) system 
to Haskell, and actually devoted a whole section to the problems of 
trying to enforce immutability in Python, and the difficulties of doing 
without it. Immutable data was cited as one of Haskell's biggest advantages.

The one place where immutability /does/ bite is arrays. Almost all 
Haskell data structures are linked lists or trees of some kind, because 
that way the copying is cheap. But if you update one element of a 
million-element array, you've got to copy a contiguous block of a 
million elements. Not fast. Immutable arrays are great as super-fast 
lookup tables, but unless the data changes very infrequently, you end up 
needing mutable arrays. Immutable arrays just aren't all that useful.
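
To illustrate both sides, a sketch assuming the standard array packages 
(my example): the pure "update" copies the whole array, and the usual 
workaround is to mutate inside ST and freeze the result once at the end.

    import Data.Array (Array, (//))
    import Data.Array.ST (newListArray, runSTUArray, writeArray)
    import Data.Array.Unboxed (UArray)

    -- A one-element "update" that allocates a brand-new n-element array.
    bumpPure :: Array Int Int -> Array Int Int
    bumpPure arr = arr // [(0, 42)]

    -- Build the array mutably in ST, then freeze it; each write is O(1).
    buildMutably :: Int -> UArray Int Int
    buildMutably n = runSTUArray
        (do arr <- newListArray (0, n - 1) [0 .. n - 1]
            writeArray arr 0 42
            return arr)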

[Insert rant about how Haskell's array implementation sucks...]



From: Darren New
Subject: Re: My hypothesis
Date: 8 Sep 2011 00:09:42
Message: <4e684006$1@news.povray.org>
On 9/7/2011 2:22, Invisible wrote:
 > then O(log N) new heap nodes get constructed.

Which, incidentally, is the same number of nodes you'd have to update in a 
mutable heap, too.

> But you don't ever have to worry about making one thread
> wait to update something until after the other thread has definitely
> finished using it.

That's not strictly true. It's entirely possible in modern processors that 
the pointer gets written and is visible to other processors before the data 
it points to gets flushed from the cache into main memory.

> Immutable arrays are great as super-fast lookup tables, but unless the data
> changes very infrequently, you end up needing mutable arrays. Immutable
> arrays just aren't all that useful.

Some of the google stuff is cool in that way. You can imagine you don't want 
to update the entire file that holds the entire web every time you scan a 
new document.

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Invisible
Subject: Re: My hypothesis
Date: 8 Sep 2011 04:13:41
Message: <4e687935$1@news.povray.org>
On 08/09/2011 05:09 AM, Darren New wrote:
> On 9/7/2011 2:22, Invisible wrote:
>  > then O(log N) new heap nodes get constructed.
>
> Which, incidentally, is the same number of nodes you'd have to update in
> a mutable heap, too.

Indeed. The only difference is that with Haskell, you *copy* these nodes 
rather than simply modify them in-place.

>> But you don't ever have to worry about making one thread
>> wait to update something until after the other thread has definitely
>> finished using it.
>
> That's not strictly true. It's entirely possible in modern processors
> that the pointer gets written and is visible to other processors before
> the data it points to gets flushed from the cache into main memory.

No, that's exactly what the "cache coherence" protocol prevents.

>> Immutable arrays are great as super-fast lookup tables, but unless the
>> data
>> changes very infrequently, you end up needing mutable arrays. Immutable
>> arrays just aren't all that useful.
>
> Some of the google stuff is cool in that way. You can imagine you don't
> want to update the entire file that holds the entire web every time you
> scan a new document.

It defies my comprehension that what Google does is actually physically 
possible, but anyway...



From: Darren New
Subject: Re: My hypothesis
Date: 8 Sep 2011 13:24:29
Message: <4e68fa4d@news.povray.org>
On 9/8/2011 1:14, Invisible wrote:
> No, that's exactly what the "cache coherence" protocol prevents.

Sure. I'm just saying that saying "it's write-once" doesn't automatically 
mean it's thread safe in a modern multi-processor. The generated code 
actually has to take some care to tell the processor to flush cache lines 
and such.

> It defies my comprehension that what Google does is actually physically
> possible, but anyway...

It kind of boggles my mind too. Just the complexity of the monitoring stuff 
is stunning, let alone actually getting work done. :-)

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



From: Orchid XP v8
Subject: Re: My hypothesis
Date: 8 Sep 2011 14:37:01
Message: <4e690b4d$1@news.povray.org>
On 08/09/2011 06:24 PM, Darren New wrote:
> On 9/8/2011 1:14, Invisible wrote:
>> No, that's exactly what the "cache coherence" protocol prevents.
>
> Sure. I'm just saying that saying "it's write-once" doesn't
> automatically mean it's thread safe in a modern multi-processor. The
> generated code actually has to take some care to tell the processor to
> flush cache lines and such.

According to the documentation I read, this is automatic. As in, the 
code doesn't have to take any explicit action at all to ensure that all 
processors see a consistent image of main memory. It's just that if 
several cores try to access the same area at once, it gets very, very slow.

>> It defies my comprehension that what Google does is actually physically
>> possible, but anyway...
>
> It kind of boggles my mind too. Just the complexity of the monitoring
> stuff is stunning, let alone actually getting work done. :-)

Apparently it's one of the monitoring and load-balancing systems they 
translated to Haskell...

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Darren New
Subject: Re: My hypothesis
Date: 9 Sep 2011 12:14:22
Message: <4e6a3b5e$1@news.povray.org>
On 9/8/2011 11:36, Orchid XP v8 wrote:
> According to the documentation I read, this is automatic.

It certainly depends on the CPU and architecture.

http://en.wikipedia.org/wiki/Memory_barrier

You also have to make sure you're not caching something in a register that 
you've written out the pointer to.

>> It kind of boggles my mind too. Just the complexity of the monitoring
>> stuff is stunning, let alone actually getting work done. :-)
>
> Apparently it's one of the monitoring and load-balancing systems they
> translated to Haskell...

What is?

-- 
Darren New, San Diego CA, USA (PST)
   How come I never get only one kudo?



