http://tinyurl.com/3q8tkxc
Haskell is great for web development because:
1. Writing
Iterator<Foo> it = list1.iterator();
ArrayList<Foo> list2 = new ArrayList<Foo>();
while (it.hasNext()) list2.add(bar(it.next()));
is way more work than writing
list2 = map bar list1
2. No null pointers. Tightly constrained side-effects. Static typing
powerful enough to statically detect many common bugs at compile-time.
Automated run-time testing tools such as QuickCheck.
3. Thread-safe by default. Concurrency and parallelism are almost
trivial to implement. [Of course, making it actually *go faster* is
never trivial...] Very low cost for using more threads. Technologies
like STM banish race conditions, deadlock, livelock and world poverty.
4. Really easy to link to C for performance-critical sections.
5. Several Haskell web servers / application frameworks with very
competitive performance. (Of course, only the favourable benchmarks are
shown. Obviously.)
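To make point 3 concrete, here's a minimal sketch (my own example, using the standard Control.Concurrent.STM API) of an atomic transfer between two accounts - no locks, and the runtime automatically retries any transaction that conflicts:

```haskell
-- Minimal STM sketch: an atomic transfer between two accounts.
-- No locks; a conflicting transaction is simply retried by the runtime.
import Control.Concurrent.STM

transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)         -- blocks (retries) until funds exist
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  readTVarIO a >>= print            -- 70
  readTVarIO b >>= print            -- 30
```

The check call is what makes blocking composable: if the funds aren't there yet, the whole transaction retries when "from" next changes.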
http://tinyurl.com/64bfdmv
Simon P. J. enumerating the various Haskell technologies for multi-core
programming: sparks, threads, locks, transactions, parallel arrays...
http://tinyurl.com/678c9xo
Implementing Erlang's distributed processing model in Haskell. (This is
a work in progress. I doubt it's production-ready yet - or any time soon.)
Notable by its absence is hot code update. Some commentators have noted
that if you actually care about high availability, you'll be using
multiple hot-spares anyway, and so long as you don't take down all
systems at once, you can just take one offline, update it, put it back
online, and move on to the next one.
Erlang's basic idea is that you have threads running on multiple
hardware nodes, and they exchange messages. Well, that's not especially
hard. You just need to define a way to convert the data you want to
send/receive into raw binary for transit over the network.
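As a sketch of that conversion step (my illustration; the real library rolls its own Serializable machinery rather than using this package directly), the binary package can derive a wire format generically:

```haskell
{-# LANGUAGE DeriveGeneric #-}
-- Sketch of the "convert to raw binary" step using the binary package.
import Data.Binary (Binary, decode, encode)
import GHC.Generics (Generic)

data Msg = Ping | Greet String
  deriving (Show, Eq, Generic)

instance Binary Msg   -- wire format derived generically from the type

main :: IO ()
main = print (decode (encode (Greet "hi")) :: Msg)   -- Greet "hi"
```

What goes over the socket is just the lazy ByteString that encode produces; decode on the far end is the receive side.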
Slightly more tricky is that if a remote process dies, you want to be
notified about it. That just requires some sort of background process
running on each node to generate the notifications. Likewise, Erlang
lets you spawn new threads on remote nodes; that just means you need to
send a message to the corresponding background thread on the target node
asking it to spawn a thread.
Ah, but there's a snag: How do you send functions over the network?
That's really the bit they've solved here. The basic idea is that when
you ask to send a function, what actually gets sent is a representation
of the function's free variables [which need to be sendable too, of
course], followed by the address of the machine code for that function.
All of which obviously fails spectacularly if sender and receiver aren't
running the same version of the same program. ;-)
Actually, sending the machine code address requires modifications to the
compiler (i.e., a way for Haskell code to look up what the hell its
address actually *is*). Currently they've implemented a kludge using
Template Haskell to look up the function name at compile-time and send
that as a text string instead. Also currently you have to do some
legwork yourself which will eventually become automatic (e.g., working
out which functions need to be sendable, making a list of them, and
telling the compiler to generate machine code for them).
Amusingly, using Amazon EC2 you can actually chuck a Haskell program
onto 100 nodes, run it, and see how fast it goes. (Obviously no normal
human actually has access to 100 physical machines.) The graph of
runtime versus number of nodes looks very much like 1/x to me...
On 13/05/2011 12:32, Invisible wrote:
> http://tinyurl.com/678c9xo
>
> Implementing Erlang's distributed processing model in Haskell. (This is
> a work in progress. I doubt it's production-ready yet - or any time soon.)
> Erlang's basic idea is that you have threads running on multiple
> hardware nodes, and they exchange messages.
Of course, Erlang is dynamically typed. By convention, the messages
exchanged are tuples, starting with an atom which identifies the type of
message.
The Haskell implementation does it differently, with two (incompatible)
methods of exchanging messages:
1. Every process implicitly has a "mailbox" that can be used to send or
receive messages of arbitrary [sendable] type. The "send" function sends
a message; the "receive" function waits for a message of the correct type
to arrive, and then returns it. Any messages of the wrong type stay
queued FIFO until a matching receive call is executed.
2. You can explicitly create typed "channels". Each such channel can
carry only one type of data, guaranteeing that the receiving end is able
to handle any type of message that you can send. There is also a
function for reading from multiple channels at once, possibly of
different types.
To me, the whole API looks a bit ad hoc and could do with tidying up a
bit. But the fundamental stuff seems to be in place...
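Roughly, the two styles look like this (a hedged sketch; the names follow the paper and its prototype "remote" package, and may well have changed since):

```haskell
-- Hedged sketch of the two messaging styles. Names as in the prototype
-- "remote" package; not self-contained without that library.
import Remote

-- 1. Untyped mailbox: 'expect' is type-driven, so the receiver gets the
--    first queued message of whatever type the context demands.
mailboxDemo :: ProcessId -> ProcessM ()
mailboxDemo pid = do
  send pid (42 :: Int)
  send pid "hello"        -- stays queued until someone expects a String

-- 2. Typed channel: the SendPort/ReceivePort types guarantee the
--    receiving end understands everything the sender can transmit.
channelDemo :: ProcessM ()
channelDemo = do
  (sp, rp) <- newChannel
  sendChannel sp (42 :: Int)
  n <- receiveChannel rp
  say ("got " ++ show n)
```

The point of the channel version is that the types line up at compile time; the mailbox version only finds out at runtime whether anyone ever wants the message you sent.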
On 13/05/2011 12:32, Invisible wrote:
> http://tinyurl.com/64bfdmv
>
> Simon P. J. enumerating the various Haskell technologies for multi-core
> programming: sparks, threads, locks, transactions, parallel arrays...
Darren was asking me about this a while back. Apparently Haskell's
current STM guarantees the following:
- All committed transactions see a consistent image of the system.
- Deadlock is impossible.
- Livelock is impossible.
That seems to cover just about everything, but in fact there *is* one
bad thing that can still happen: starvation.
It is impossible for the system to get stuck in a condition where no
progress is made. (That would be either deadlock or livelock, which are
guaranteed not to happen.) However, it is perfectly possible for one
particular transaction to get stuck. There are two ways this can happen:
1. If a transaction waits for a condition which will never be fulfilled,
it can never complete. (Well, duh!)
2. It is possible for a long-running transaction to be eternally in
conflict with an endless stream of short transactions, such that the
short transactions complete successfully causing the long transaction to
repeatedly restart.
(This is not livelock, since the system is still making progress: All
the short transactions complete just fine. Only the long transaction
gets stuck.)
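A sketch of scenario 2 (my own example): one long transaction reading many TVars, racing an endless stream of one-TVar writers:

```haskell
-- Sketch of starvation: a long transaction vs. a stream of short ones.
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever)

main :: IO ()
main = do
  vars <- mapM (const (newTVarIO (0 :: Int))) [1 .. 1000 :: Int]
  -- Short transactions: each bumps one TVar and commits almost instantly.
  _ <- forkIO . forever $ atomically (modifyTVar' (head vars) (+ 1))
  -- Long transaction: reads every TVar. If any one of them was written
  -- while it ran, the whole read is invalid and restarts from scratch.
  -- With the writer above hammering away, this can in principle restart
  -- forever, even though the system as a whole keeps making progress.
  total <- atomically (sum <$> mapM readTVar vars)
  print total
```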
On 5/13/2011 4:32, Invisible wrote:
> 4. Really easy to link to C for performance-critical sections.
Do you have an example of what you need to do in Haskell to link to a simple
C function? Is it easy for C to call Haskell functions?
> Implementing Erlang's distributed processing model in Haskell. (This is a
> work in progress. I doubt it's production-ready yet - or any time soon.)
Yah. Python has one of these too.
> Notable by its absence is hot code update. Some commentators have noted that
> if you actually care about high availability, you'll be using multiple
> hot-spares anyway, and so long as you don't take down all systems at once,
> you can just take one offline, update it, put it back online, and move on to
> the next one.
The problem with this is that it takes *vastly* longer. You have to write all
the current data out to disk, shut down cleanly (including all the processes
running there whose code you are *not* updating), then modify the code, then
start everything up again, and read everything back in off disk.
With Erlang's hot update, the data stays in memory. If you're fixing a bug,
you don't have to save out and then reload dozens of gigabytes of in-memory
database just to change one line. Plus, you now have to write all the code
to shut down cleanly and start up cleanly again, and track which tasks are
running in the same process as you're shutting down, shut them down too, etc.
Yeah, you can do it, and that's the answer I give to people who bitch about
Windows needing to reboot to update widely-used code and such, but it really
sucks for an availability system where you have bunches of threads on the
same machine.
> All of which obviously fails spectacularly if sender and receiver aren't
> running the same version of the same program. ;-)
It also fails if the sender and receiver aren't using the same version of
the data structures. As soon as you have half your code updated to be using
floats instead of integers, now you have to have some way in your *old* code
to be able to know you're getting *new* messages and how to handle them.
That can be rough with static typing.
> 1. Every process implicitly has a "mailbox" that can be used to send or
That's the Erlang approach, except of course it's dynamically typed.
> 2. You can explicitly create typed "channels". Each such channel can
That's the NIL / Hermes approach, both of which were statically typed. The
advantage of this is channels can become first class. You can pass them
between processes and have different channels connected to different
processes. You can use it for security, too - if I don't give you a channel
to the screen but only to your window, you can only write to your window,
and it's trivial to verify that.
> (This is not livelock,
I'm not sure about that. I think livelock is defined for individual tasks,
such that no task spends an unbounded amount of time waiting for access to
the critical section. I.e., a system making progress is defined as no part
of the system failing to make progress. But I can't find anything online to
back that up.
--
Darren New, San Diego CA, USA (PST)
"Coding without comments is like
driving without turn signals."
On 13/05/2011 05:43 PM, Darren New wrote:
> On 5/13/2011 4:32, Invisible wrote:
>> 4. Really easy to link to C for performance-critical sections.
>
> Do you have an example of what you need to do in Haskell to link to a
> simple C function? Is it easy for C to call Haskell functions?
I'm fairly sure I posted one in the other group a while back. Heck, I
even got qsort() from stdlib.h to work on arbitrary Haskell data types,
just because Warp said it wasn't possible.
Actually calling a C function is trivial. Passing a Haskell function
pointer to C is trivial. What is very *non*-trivial is getting data
marshalling and memory management to work.
>> Implementing Erlang's distributed processing model in Haskell. (This is a
>> work in progress. I doubt it's production-ready yet - or any time soon.)
>
> Yah. Python has one of these too.
I'm sure lots of people are doing this. After all, the basic
abstractions are not complex.
>> Notable by its absence is hot code update. Some commentators have
>> noted that
>> if you actually care about high availability, you'll be using multiple
>> hot-spares anyway, and so long as you don't take down all systems at
>> once,
>> you can just take one offline, update it, put it back online, and move
>> on to
>> the next one.
>
> The problem with this is it takes *incredibly* longer. You have to write
> all the current data out to disk, shut down cleanly (including all the
> processes running there whose code you are *not* updating), then modify
> the code, then start everything up again, and read everything back in
> off disk.
Perhaps. Perhaps the data is already in some distributed database, so
it's already persistent regardless of whether a node dies or not. (Isn't
the system supposed to recover from node death anyway?)
> With Erlang's hot update, the data stays in memory. If you're fixing a
> bug, you don't have to save out and then reload dozens of gigabytes of
> in-memory database just to change the line. Plus, you now have to write
> all the code to shut down cleanly and start up cleanly again, and track
> which tasks are running in the same process as you're shutting down,
> shut them down too, etc.
With hot update, you *still* have to manually write a bunch of code to
do any data conversions if data structures have changed, or do invariant
checks if the data's invariants have changed. Either way, you gotta
write code.
>> All of which obviously fails spectacularly if sender and receiver aren't
>> running the same version of the same program. ;-)
>
> It also fails if the sender and receiver aren't using the same version
> of the data structures.
Yeah. As I say, the current system is entirely predicated around
everybody having the same code. As soon as that's not true, it's game over.
Obviously that needs fixing for a non-toy implementation.
> As soon as you have half your code updated to be
> using floats instead of integers, now you have to have some way in your
> *old* code to be able to know you're getting *new* messages and how to
> handle them. That can be rough with static typing.
It's already dealing with receiving messages of arbitrary type even in
the presence of static typing. All that's needed is some way to compare
types beyond "do they have the same name?"
>> 1. Every process implicitly has a "mailbox" that can be used to send or
>
> That's the Erlang approach, except of course it's dynamically typed.
>
>> 2. You can explicitly create typed "channels". Each such channel can
>
> That's the NIL / Hermes approach, both of which were statically typed.
> The advantage of this is channels can become first class. You can pass
> them between processes and have different channels connected to
> different processes. You can use it for security, too - if I don't give
> you a channel to the screen but only to your window, you can only write
> to your window, and it's trivial to verify that.
Yeah, passing channels around looks quite useful. It just seems a pity
to have more or less duplicated functionality in two incompatible
systems - mailboxes and channels. Couldn't they have implemented a
mailbox as a channel of arbitrary type or something?
>> (This is not livelock,
>
> I'm not sure about that.
Well, OK, let me rephrase: There is no situation where an STM program
gets so stuck that nothing can make progress. The worst that can happen
is that one particular transaction gets stuck. (And they're talking
about adding an API to detect / do something about this if it actually
happens.)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
On 5/13/2011 10:11, Orchid XP v8 wrote:
> I'm fairly sure I posted one in the other group a while back.
Yeah, I was just wondering if you had something convenient. I was curious
what it looked like.
> Actually calling a C function is trivial. Passing a Haskell function pointer
> to C is trivial. What is very *non*-trivial is getting data marshalling and
> memory management to work.
Well, yeah, that's why I was curious. :-)
> I'm sure lots of people are doing this. After all, the basic abstractions
> are not complex.
They actually are pretty complex if you want it to be reliable.
Consider this: you spawn a process, but before it starts up, it dies. Do
you handle it? Lots of edge cases to get right.
> Perhaps. Perhaps the data is already in some distributed database, so it's
> already persistent regardless of whether a node dies or not. (Isn't the
> system supposed to recover from node death anyway?)
Yes.
To give an example, Mnesia (Erlang's database) works by storing all the records in
a tree, with indexes being other trees. When you update a tree, it appends a
record to a data file saying what you changed - i.e., it logs the change. If
your node fails and you restart it, it replays the log to recreate the
database. Every once in a while, it takes the whole database and writes it
out to a new file, and starts a new log.
So restarting a node requires first reloading all the records, one at a
time, into a balanced tree. Then replaying all the changes since the last
time the node was saved out to disk. Then playing all the changes made since
the node went down, as copied from some other node.
On a gigabyte CSV file of a half dozen fields per line, on a 3GHz machine
with 100MBps data transfer, this takes about an hour for the first step of
just reloading the saved file. Say you have 50 machines to upgrade. You're
now talking two wall-clock days just to restart the new processors, ignoring however
long it might take to dump the stuff out.
> With hot update, you *still* have to manually write a bunch of code to do
> any data conversions if data structures have changed, or do invariant checks
> if the data's invariants have changed. Either way, you gotta write code.
Sure. But you don't have to write code to cleanly shut down *other*
processes when some unrelated process on the same machine gets changed. If I
want to upgrade firefox, I don't have to write any code in the firefox
installer to save out my current Word document, close Word, update firefox,
reopen Word, and reload my Word document.
You can also convert things slowly, using the old data in the new code until
all the old code is gone, then slowly work thru the data (perhaps updating
it each time it's touched) to the new format.
> Yeah. As I say, the current system is entirely predicated around everybody
> having the same code. As soon as that's not true, it's game over.
That, I think, is the fundamental problem with systems like Haskell, and the
fundamental reason why dynamic typing may very well win in this situation.
> It's already dealing with receiving messages of arbitrary type even in the
> presence of static typing. All that's needed is some way to compare types
> beyond "do they have the same name?"
Oh, OK.
> Yeah, passing channels around looks quite useful. It just seems a pity to
> have more or less duplicated functionality in two incompatible systems -
> mailboxes and channels. Couldn't they have implemented a mailbox as a
> channel of arbitrary type or something?
Two people designing the two systems, maybe?
> Well, OK, let me rephrase: There is no situation where an STM program gets
> so stuck that nothing can make progress. The worst that can happen is that
> one particular transaction gets stuck. (And they're talking about adding an
> API to detect / do something about this if it actually happens.)
Sure. Linux, way back when, would do the same thing with file read/write
locks. If you had four read transactions lasting 3 seconds, with a 1 second
gap, and you had them all staggered at 1-second intervals, a write lock would
never take. A hold-over from the original UNIX design where kernel calls got
woken up just based on signaling some address they're waiting on and letting
the processes fight it out. That did get fixed a couple of years later, though.
The problem is that when it happens, it kills you. :-)
--
Darren New, San Diego CA, USA (PST)
"Coding without comments is like
driving without turn signals."
Invisible <voi### [at] devnull> wrote:
> 1. Writing
> Iterator<Foo> it = list1.iterator();
> ArrayList<Foo> list2 = new ArrayList<Foo>();
> while (it.hasNext()) list2.add(bar(it.next()));
> is way more work than writing
> list2 = map bar list1
Argument from verbosity is seldom a good argument in programming.
Just because something is shorter doesn't necessarily mean it's better.
Anyways, many languages have constructs to make that shorter. Curiously,
the next C++ standard will introduce constructs that allow you to write:
for(auto& v: list1) list2.push_back(v);
--
- Warp
On 14/05/2011 08:53 AM, Warp wrote:
> Invisible<voi### [at] devnull> wrote:
>> 1. Writing
>
>> Iterator<Foo> it = list1.iterator();
>> ArrayList<Foo> list2 = new ArrayList<Foo>();
>> while (it.hasNext()) list2.add(bar(it.next()));
>
>> is way more work than writing
>
>> list2 = map bar list1
>
> Argument from verbosity is seldom a good argument in programming.
> Just because something is shorter doesn't necessarily mean it's better.
You don't think the intention of the second example is clearer at a glance?
> Anyways, many languages have constructs to make that shorter.
Some have a "foreach" construct that can make this kind of thing shorter.
Curiously, Smalltalk had a few library functions to do this kind of
thing. Then again, in Smalltalk, every block of code is an object. That
allowed Smalltalk to offer a #do: method for most container classes.
That means you can then implement #collect:, #select: and #reject: in
terms of #do:. For example,
list2 := list1 collect: [:x | x bar]
That's nearly the same level of directness as Haskell.
> Curiously,
> the next C++ standard will introduce constructs that allow you to write:
>
> for(auto& v: list1) list2.push_back(v);
And that does what? Copy the list?
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Orchid XP v8 <voi### [at] devnull> wrote:
> > Curiously,
> > the next C++ standard will introduce constructs that allow you to write:
> >
> > for(auto& v: list1) list2.push_back(v);
> And that does what? Copy the list?
I actually typoed the line. It's missing the call to bar(). It should have
been:
for(auto& v: list1) list2.push_back(bar(v));
If I understand correctly, that's what the original code and the Haskell
version are doing.
Naturally if you simply wanted to copy the list you would write a
simple "list2 = list1;" (or if you want to append instead of replace,
"list2.insert(list2.end(), list1.begin(), list1.end());", or if you want
to move the elements, "list2.splice(list2.end(), list1);".)
--
- Warp
On 13/05/2011 06:48 PM, Darren New wrote:
> On 5/13/2011 10:11, Orchid XP v8 wrote:
>> I'm fairly sure I posted one in the other group a while back.
>
> Yeah, I was just wondering if you had something convenient. I was
> curious what it looked like.
Quoting myself from 3 years ago:
module Raw where
import System.Win32.Types
import Graphics.Win32.GDI.Types
import Graphics.Win32.Message
foreign import stdcall "windows.h PostMessageW"
postMessage :: HWND -> WindowMessage -> WPARAM -> LPARAM -> IO LRESULT
This creates a Haskell function named "postMessage". When you call this
function, it really calls PostMessageW() from the windows.h header. Note
carefully that *you* are responsible for getting the type signature right!
How this actually finds the necessary code to execute, I have no idea.
What I do know is that if you want to call your own C function (rather
than something from the Win32 API), you have to tell the compiler to
link the object code in at link time as well.
To call Haskell from C, you write a similar 1-liner to generate a C stub
function which knows the compiler-specific details for calling the
*real* function code.
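The one-liner for that direction is "foreign export". A sketch (my example; the function name and module are made up):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch of exporting a Haskell function to C. GHC generates a stub
-- header (Triple_stub.h here) that C code can #include.
module Triple where

import Foreign.C.Types (CInt)

triple :: CInt -> CInt
triple x = 3 * x

-- This one line generates the C-callable stub:
foreign export ccall triple :: CInt -> CInt
```

On the C side you #include "HsFFI.h" (plus the generated stub header), call hs_init() before the first Haskell call, and hs_exit() when you're done.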
>> Actually calling a C function is trivial. Passing a Haskell function
>> pointer
>> to C is trivial. What is very *non*-trivial is getting data
>> marshalling and
>> memory management to work.
>
> Well, yeah, that's why I was curious. :-)
You cannot access Haskell data from C at all. You can access C data from
Haskell if you're careful. For anything beyond primitive data (integers,
floats, etc), it's probably easier to write functions in the native
language to get/set the fields you're interested in.
Memory management is fun. (Principally because if you get it wrong... well,
you know what happens.) There are Haskell libraries for allocating and
freeing values that the GC doesn't move around (so-called "pinned
memory"). You can attach finalisers to things to have the GC call the
free routine for you when appropriate. (This of course takes only
Haskell into account; it cannot know whether C is still using the object.)
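The allocate-plus-finaliser pattern looks roughly like this (my example, using only the standard Foreign.* modules):

```haskell
-- Sketch: allocate C-heap memory (the GC won't move it) and attach
-- free() as a finaliser. The finaliser runs when the GC decides the
-- ForeignPtr is dead on the Haskell side only - it cannot know whether
-- C still holds a copy of the pointer.
import Data.Word (Word8)
import Foreign.ForeignPtr (newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Ptr (Ptr)

main :: IO ()
main = do
  buf <- mallocBytes 4096 :: IO (Ptr Word8)   -- raw, unmanaged memory
  fp  <- newForeignPtr finalizerFree buf      -- free() runs at GC time
  withForeignPtr fp $ \p ->
    putStrLn ("buffer at " ++ show p)  -- hand p to C inside this scope;
                                       -- fp is kept alive meanwhile
```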
You can also arrange for the main() function to be implemented in C
rather than Haskell. (I believe you have to make sure main() calls some
Haskell startup functions though.) In this way, you can build a Haskell
program that calls C, or a C program that calls Haskell.
You can also build Haskell DLLs. I have no idea how that works...
Speaking of which, GHC now supports compiling each Haskell library as a
DLL, rather than linking them all statically. Unfortunately, this is
broken out-of-the-box. If you compile this way, you have to manually
find all the DLLs your program requires and move them into the search
path, because the GHC installer doesn't do this for you. Oops!
>> I'm sure lots of people are doing this. After all, the basic abstractions
>> are not complex.
>
> They actually are pretty complex if you actually want it to be reliable.
> Think this: You spawn a process, but before it starts up, it dies. Do
> you handle it? Lots of edge cases to get right.
The *abstraction* is simple. The *implementation* is not. The same is
true of many, many things. ;-)
>> Perhaps. Perhaps the data is already in some distributed database, so
>> it's
>> already persistent regardless of whether a node dies or not. (Isn't the
>> system supposed to recover from node death anyway?)
>
> So restarting a node requires first reloading all the records, one at a
> time, into a balanced tree. Then replaying all the changes since the
> last time the node was saved out to disk. Then playing all the changes
> made since the node went down, as copied from some other node.
>
> On a gigabyte CSV file of a half dozen fields per line, on a 3GHz
> machine with 100MBps data transfer, this takes about an hour for the
> first step of just reloading the saved file. Say you have 50 machines to
> upgrade. You're now talking 2 wall days just to restart the new
> processors, ignoring however long it might take to dump the stuff out.
Wouldn't converting one gigabyte of data to a new format online take
just as long as reloading it from disk?
>> With hot update, you *still* have to manually write a bunch of code to do
>> any data conversions if data structures have changed, or do invariant
>> checks
>> if the data's invariants have changed. Either way, you gotta write code.
>
> Sure. But you don't have to write code to cleanly shut down *other*
> processes when some unrelated process on the same machine gets changed.
I'm not seeing why you would need to do this with cold code update
either. You only need to shut down the processes related to the thing
you're actually changing.
> You can also convert things slowly, using the old data in the new code
> until all the old code is gone, then slowly work thru the data (perhaps
> updating it each time it's touched) to the new format.
This is an unavoidable requirement which *must* be met if you want
distributed processing. Otherwise upgrading the system requires stopping
the entire distributed network.
>> Yeah. As I say, the current system is entirely predicated around
>> everybody
>> having the same code. As soon as that's not true, it's game over.
>
> That, I think, is the fundamental problem with systems like Haskell, and
> the fundamental reason why dynamic typing may very well win in this
> situation.
No, this is the fundamental problem with a small proof-of-concept
implementation exploring whether it's even slightly possible to do this.
Obviously the current implementation does not yet provide everything
necessary for production use. Nobody is claiming it does.
>> It's already dealing with receiving messages of arbitrary type even in
>> the
>> presence of static typing. All that's needed is some way to compare types
>> beyond "do they have the same name?"
>
> Oh, OK.
Currently, every time you send a value, the implementation transmits the
name of the value's type, followed by the value itself. The receiving
end uses the type name to figure out which parser to deserialise with.
Similarly, functions are sent by name (along with their free variables).
This works great for testing that your implementation works. It breaks
horribly as soon as you have more than one version of your code. (Not to
mention two unrelated applications both of which happen to define
unrelated types having identical names...)
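In other words, something morally like this (my names, not the real implementation's):

```haskell
-- Sketch of the wire format described above: type name first, payload
-- second. The receiver can only compare names, nothing deeper.
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import Data.Typeable (Typeable, typeOf)

wireFormat :: (Binary a, Typeable a) => a -> BL.ByteString
wireFormat x = encode (show (typeOf x), x)

main :: IO ()
main = do
  let bytes = wireFormat (42 :: Int)
      (tag, n) = decode bytes :: (String, Int)
  putStrLn (tag ++ " " ++ show n)   -- Int 42
```

The receiving end decodes the tag, picks the parser registered under that name, and only then decodes the payload - so two unrelated types that merely share a name get confused, exactly as described above.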
What is needed - and the documentation makes plain that the implementors
are quite aware of this - is some way of verifying that the identifiers
you send actually refer to the same thing. I don't actually know how
Erlang manages to do this; looks like the Haskell guys are planning to
use hashes of [some canonical representation of] the type definitions or
something...
>> Yeah, passing channels around looks quite useful. It just seems a
> pity to
>> have more or less duplicated functionality in two incompatible systems -
>> mailboxes and channels. Couldn't they have implemented a mailbox as a
>> channel of arbitrary type or something?
>
> Two people designing the two systems, maybe?
Looks more like they built the mailbox system, realised that there was a
guarantee it can't provide, and then engineered a second, independent
system which does.
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*