OK, so Erlang is something of an interesting case. There are several
reasons for this:
- There are many functional programming languages. (Haskell, OCaml,
Clean, and for some reason people keep calling Lisp "functional" too.)
However, Erlang is *the only* one that could be considered "commercially
successful", as far as I can tell.
- Other languages attempt to be *concurrent*, but only Erlang is
*distributed*.
- The system is supposedly insanely reliable. People toss around "nine
9s up-time" as if this is some sort of experimentally verified *fact*.
- The language claims to do all sorts of wacky, far-out stuff like
trivial concurrency and distribution, hot-swapping running code,
detecting and correcting run-time errors, and so forth. I want to see
how it's done.
I had a look at Erlang.org, but it's difficult to find anything that
really explains the interesting parts of the system. I can find plenty
of "Hello World" programs, and "this is how you run stuff remotely", but
nothing about "this is why we did it this way", or "this is how it
works" or "this is how to use it to build non-trivial applications".
Suffice it to say, from what little I could discover, I didn't like what
I was seeing. Like most commercially successful languages, Erlang is
obtuse, complex, ugly and kludgy. Much like C, Java or anything else
wildly popular. It's abundantly clear that Erlang is about as
"functional" as Lisp (i.e., not at all). Still, people claim that
Ericsson's entire business is based on it, and lots of people are using
it, so it must have got *something* right.
Eventually I discovered a file entitled Armstrong_theses_2003.pdf. So I
spent the last two days reading this 300-page tome in the hope of
enlightenment.
Predictably, the author spends most of that page count either repeating
himself, or telling me in minute detail about things which are of
utterly no interest to me. However, I did learn some interesting things.
In a way, Erlang is a bit like Haskell: Fundamentally, it does something
extremely inefficient. And yet it seems to work just fine.
In Haskell's case, it's not allowing you to modify things in-place
(except under very controlled conditions). In Erlang's case, it's
refusing to allow processes to share state. This immediately implies
that if you want to send data from place to place, you must copy it all.
It's rather like Haskell's restriction, only much worse.
Then again, if one process is operating on a completely different node,
you are *forced* to copy data from place to place anyway. It is
unavoidable. So in a way, forcing you to do it all the time just makes
it that little bit simpler to move from local to distributed coding.
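To make the "copy everything" rule concrete, here is a minimal sketch in Python (used only because it is easy to run; this is not how Erlang implements anything). The `send` helper and the `queue.Queue` mailbox are illustrative stand-ins: the message is deep-copied on the way in, so sender and receiver never share mutable state.

```python
import copy
import queue

def send(mailbox, message):
    """Deliver a private copy of `message`, so the two sides share nothing."""
    mailbox.put(copy.deepcopy(message))

mailbox = queue.Queue()          # stands in for the receiving process's mailbox
payload = {"samples": [1, 2, 3]}

send(mailbox, payload)
payload["samples"].append(4)     # the sender keeps mutating its own copy

received = mailbox.get()
print(received)                  # the receiver still sees what was sent
```

The copy costs time and memory on every send, which is exactly the inefficiency described above, but the same `send` would work unchanged if the mailbox lived on another machine.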
One thing that rapidly becomes clear (and doesn't seem to be mentioned
anywhere else) is that for Erlang, processes are about more than just
concurrency or distribution. They are about fault isolation. A
fundamental design assumption of the Erlang system is that stuff /will/
break, and /when/ it does, we want to maximally limit the damage. Part
of the philosophy is that if one process dies, it shouldn't hurt anybody
else.
Whereas in some other system you might structure things as separate
processes for increased performance or because it's more logical to
program it that way, in Erlang you might do this for no other reason
than the fact that one task can sensibly continue if the other fails.
You might use processes purely for error-handling reasons.
Another thing that becomes painfully obvious is that writing your
program in Erlang does not somehow magically make it reliable. The
language doesn't implement any kind of automatic error recovery system,
as you might be misled to believe. Instead, Erlang provides more or
less the same kinds of error handling that any other language provides -
catch/throw, etc. You can nest exception handlers, re-throw or modify
exceptions, generate synthetic exceptions, and so on. And, as usual, any
uncaught exceptions at the "top level" just make the process die.
The difference is what happens when a process dies. Rather than just
dying, this situation is *detected*, and you can *react* to this. You
can make it so that related processes get auto-killed as well
(presumably because they can't sensibly complete without the dead
process), or you can make it send a notification to some monitor
process. The *language* itself allows notifications to be sent, but
nothing more. The *libraries* allow you to do sophisticated error
recovery, but *you* have to implement this. It doesn't happen by magic,
as many seem to suggest.
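A rough Python analogue of "death is detected and somebody gets told", assuming threads stand in for processes and a queue stands in for the monitor's mailbox. The name `spawn_monitored` and the `("EXIT", name, reason)` message shape are made up for this sketch.

```python
import queue
import threading

def spawn_monitored(name, func, monitor_box):
    """Run `func` in a thread and report how it ended to `monitor_box`."""
    def wrapper():
        try:
            func()
            monitor_box.put(("EXIT", name, "normal"))
        except Exception as exc:           # any uncaught error kills the "process"
            monitor_box.put(("EXIT", name, repr(exc)))
    thread = threading.Thread(target=wrapper, daemon=True)
    thread.start()
    return thread

def divide_by_zero():
    return 1 / 0

monitor_box = queue.Queue()
spawn_monitored("worker", divide_by_zero, monitor_box)

# The monitor only receives a notification; deciding what to do is up to you.
tag, who, reason = monitor_box.get(timeout=5)
print(tag, who, reason)
```

Note that the sketch stops at the notification. Restarting the worker, cleaning up after it, or killing its siblings would all be extra code that you write yourself.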
Very importantly, the system correctly handles things other than
software exceptions. Detecting division by zero is one thing. Detecting
that the machine in Australia that you were talking to just got hit by a
small pyroclastic flow is another. Obviously, no documentation actually
describes how this works.
So how does Erlang actually work then? I mean, you can create processes.
OK, then what? Well, as best as I can determine, each process has a
"mailbox", and you can send asynchronous messages to any process that
you have the address of. (And these messages can contain process
addresses.) This works the same way both locally and remotely.
Sending is a non-blocking operation. Receiving is a blocking operation,
and it makes use of Haskell-style pattern matching (which seems like a
rather neat fit here). You can also time out waiting for a message -
something which would be really hard to implement yourself, and which is
probably extremely important in networking code.
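The send/receive shape described here can be sketched with Python's standard `queue` module. This is only an analogue: `send` and `receive` are hypothetical helper names, and there is no pattern matching yet, just the non-blocking send and the blocking receive with a time-out.

```python
import queue

mailbox = queue.Queue()

def send(box, message):
    box.put(message)          # returns immediately; the queue is unbounded

def receive(box, timeout):
    """Block until a message arrives, or return None after `timeout` seconds."""
    try:
        return box.get(timeout=timeout)
    except queue.Empty:
        return None

send(mailbox, ("ping", 1))
first = receive(mailbox, timeout=0.1)    # a message is already waiting
second = receive(mailbox, timeout=0.1)   # nothing arrives, so we time out
```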
Message sending has the curious property that message delivery is /not/
guaranteed, but message ordering /is/ guaranteed. Like, WTF? I'm not
even sure how it's possible to tell that the messages are in order
without being able to tell if you got all of them, but whatever. Seems
very, very random to me. The thesis claims that implementing delivery
checks yourself is easy, but implementing ordering checks is very hard.
So they put the hard thing into the language, and left the easy thing up
to you if you need it. Um, OK?
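One way to read that claim, sketched in Python: if you can rely on ordering, a do-it-yourself delivery check is little more than a counter. The assumption here (mine, not the thesis's) is that the sender numbers its messages 0, 1, 2, and so on, and `missing_sequence_numbers` is a hypothetical helper.

```python
def missing_sequence_numbers(arrived):
    """Given in-order sequence numbers that arrived, list the ones that did not."""
    missing = []
    expected = 0
    for seq in arrived:
        # Order is preserved, so a jump past `expected` means a lost message,
        # not a late one that might still show up.
        missing.extend(range(expected, seq))
        expected = seq + 1
    return missing

lost = missing_sequence_numbers([0, 1, 3, 4, 6])
print(lost)
```

This only catches gaps followed by a later message. Noticing that the *last* message went missing still needs a time-out or an acknowledgement.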
One of the very trippy things about Erlang is that I can be sitting on
my PC in England, and I can just tell some Solaris server in Brazil to
spawn a new process. As expected, no word on how the hell this actually
works. I would have *thought* this means that the necessary code gets
beamed over the wire, but some documentation seems to suggest that you
have to install it yourself. And yet, you can apparently send arbitrary
functions as messages, so...?
There's also no word on authentication. I mean, how does the machine in
Brazil know that I'm authorised to be executing arbitrary code on it? My
best guess is that if you're a telecoms giant, you have a private data
network that nobody else can physically access, and you use that to
control your equipment. In other words, you're authorised if you can
actually access the system in the first place.
It's a similar story when we come to code hot-swapping. Surprise,
surprise, it turns out that you can't just magically hot-swap code, just
because it's written in Erlang. No, you have to do *actual work* to make
this possible.
The language itself allows two (and only two) copies of the same code to
be loaded at once, an "old" version and a "new" version. And it provides
a way to switch from one to the other for a specific process. And that's
*all* it gives you. Everything else has to be done, by hand, by you.
That means that you have to design your application so that there's some
sort of signal that you can send it which will make it switch to the new
version. So that takes care of your code, but what about your data?
Well, presumably your new code might have new data layouts or other
invariants, so you probably want to run some conversion function to port
the running environment to your new code.
In short, _you_ have to figure out how your new code is different from
your old code, _you_ have to write the code that converts any data
structures or establishes any new invariants, _you_ have to check that
this new code actually works properly, and _you_ have to design your
application in the first place so that you can tell it to switch to the
new code, running the conversion routine in the process.
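The amount of hand-work involved is easier to see in a sketch. This is plain Python, not Erlang's mechanism: the `("upgrade", handler, convert)` message, the two handler versions and the conversion function are all invented for illustration.

```python
def handle_v1(state, message):
    # Version 1 keeps a bare integer total as its state.
    return state + message

def handle_v2(state, message):
    # Version 2 keeps a dict, so it cannot run on version 1's state as-is.
    return {"total": state["total"] + message, "count": state["count"] + 1}

def convert_v1_to_v2(old_state):
    # Hand-written migration: nothing derives this for you.
    return {"total": old_state, "count": 0}

def run(messages):
    handler, state = handle_v1, 0
    for message in messages:
        if isinstance(message, tuple) and message[0] == "upgrade":
            _, new_handler, convert = message
            handler, state = new_handler, convert(state)   # switch code and data
        else:
            state = handler(state, message)
    return state

final = run([5, 7, ("upgrade", handle_v2, convert_v1_to_v2), 3])
print(final)
```

The loop had to be designed up front to understand the upgrade signal, and a downgrade would need its own conversion function going the other way.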
And if you want to downgrade back to the old version? _You_ have to do
all the work for that too. Really, the only help that Erlang is giving
you is the ability to easily change from one chunk of code to another.
You could probably do the self-same thing in Java, if you built your
application so that every class has a version number, and there's some
way to tell the application to load a set of new class files and execute
a predetermined method on one of them. (I suspect Java's inflexible type
system would probably whine though...)
Erlang really gives you no assistance at all beyond the minimal level of
"you can have two modules with the same name". And it's limited to just
two, by the way. If you try to load a third version, anything running
the first version is unceremoniously killed. And there's no mention of
any way to detect whether old code is still running. (Perhaps there is,
but I didn't see it mentioned.)
There was some talk of a packaging system, which sounds quite
interesting, but obviously no details are described. It talks about
being able to group modules into applications, and group applications
into releases which can have complex dependencies, build procedures and
installation processes. But... no details. So maybe it does
configuration management for you, or maybe it does very little to assist
you. I couldn't say.
The language itself has a few interesting features. Of course, being
designed as something easily parsable by Prolog, the syntax is utterly
horrid, it uses a kludgy preprocessor rather than actually supporting
named constants, and so forth. It's also dynamically typed, which some
people are presumably going to argue is somehow "necessary" because of
what it does. I can't help thinking that a powerful type system would
have made it /so much easier/ to make sure you got everything straight.
Just reading the document, I saw endless examples of data structures
whose meaning is ambiguous due to the lack of types. For example, it is
apparently impossible to tell the difference between a process that died
because of an exception, and a process which merely sent you a message
that happens to be a tuple containing the word "EXIT". The advice for
this presumably being "don't do that".
It's slightly bizarre. Erlang looks for all the world like a crude
scripting language with no safety at all and abysmal run-time
performance. And yet, the language is designed for running network
switches, possibly the most demanding high-performance hard-realtime
system imaginable, and people claim it has nine 9s up-time. The only
container types are linked lists and tuples, and yet the standard
libraries somehow include cryptography and complex network protocols.
Very odd...
So anyway, in spite of all the obvious bad things, there /are/ a few
interesting bits. You get Haskell-style pattern matching, but with a
twist. In Haskell, patterns are required to be "linear". (Don't you just
love how that single word is used to mean a million different unrelated
things?)
In Haskell, a variable is not allowed to appear twice in the same
pattern. If you want two things to be equal, you must bind them to two
different variables and then use a guard to check their equality. But
Erlang has no such limitation. It /automatically/ desugars a non-linear
pattern into a linear one with a guard. So there!
Erlang uses the rather baffling convention that mere variables start
with an upper-case letter, while proper nouns such as function names or
data points start with a lower-case letter. This makes it surprisingly
hard to read Erlang code. (Haskell, not to mention English, obviously
uses the opposite convention.)
I already mentioned that message receipt is by pattern matching. This
seems like a rather powerful way to work, especially considering that a
message that fails all patterns stays queued in the mailbox. So you can
actually use patterns to decide what order you want to process incoming
messages in. That seems quite sophisticated. And I already mentioned how
you can add a time-out as well, which is obviously a very frequent
requirement for hard-realtime systems.
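A small Python sketch of that "unmatched messages stay queued" behaviour. The mailbox is a plain list and `selective_receive` is a hypothetical helper taking a predicate instead of a real pattern; a real receive would block rather than return `None`.

```python
def selective_receive(mailbox, matches):
    """Remove and return the oldest message satisfying `matches`.

    Messages that do not match stay queued in their original order.
    Returns None if nothing matches.
    """
    for index, message in enumerate(mailbox):
        if matches(message):
            return mailbox.pop(index)
    return None

mailbox = [("log", "starting"), ("urgent", "fire"), ("log", "done")]
urgent = selective_receive(mailbox, lambda m: m[0] == "urgent")
print(urgent, mailbox)
```

The urgent message jumps the queue, and the two log messages are still waiting, in order, for whoever asks for them next.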
In Haskell, general practice is to make all functions "total". In
particular, all pattern matching should cover every case if possible. By
contrast, apparently the general practice with Erlang is to /not/ cover
cases which aren't expected to occur, and just let the thing throw an
exception on a pattern-match failure. The rationale apparently being
that the automatically-generated exception string is just as descriptive
as anything that you could write by hand yourself, and it makes the code
less cluttered.
Certainly some people have a tendency to make their functions try to
return a valid result no matter how silly the inputs are. This is almost
certainly not the way to write reliable software. Myself, I've always
thought that "***exception: Data.Map.lookup: key not found" is far more
informative than "***exception: Data/Map.hs, lines 12312-12315:
non-exhaustive patterns in case expression". In the first case, it's
instantly obvious what the problem is, and that the bug is likely to be
in the caller. In the second case... uh...?
Still, either way, if a key being missing is something that you're
anticipating, you can and should handle the exception in the caller (and
it won't matter what the text of the exception actually is). And if it's
a high-availability system, if the exception /is/ uncaught, you have a
program bug, and you probably want to log huge volumes of data about it,
/and then/ restart the thing ASAP before somebody notices it went wrong!
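The "log it, then restart it" policy fits in a few lines of Python. This is a sketch under my own assumptions: `run_with_restarts` and the `flaky` task are invented names, and a real system would also want back-off and an alert when it finally gives up.

```python
import logging

def run_with_restarts(task, max_restarts):
    """Call `task` until it returns; log each crash and restart it."""
    crashes = 0
    while True:
        try:
            return task(), crashes
        except Exception:
            crashes += 1
            logging.exception("task crashed (crash %d)", crashes)
            if crashes > max_restarts:
                raise                  # give up and let our own caller see it

attempts = []

def flaky():
    # Fails on the first two calls, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise KeyError("key not found")
    return "ok"

result, crashes = run_with_restarts(flaky, max_restarts=5)
```

`logging.exception` records the full traceback, so the exception text matters less than where it was raised.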
I read the part about "behaviours" and face-palmed. The idea is that the
standard library contains skeletons for things like "a server", and you
supply the body code that makes it "a name server" or "a unique ID
server" or whatever. Get this:
- The body code has to go in its own special module.
- The module has to export a specific set of functions with specific
names and specific numbers and types of arguments returning a specific
data structure as the result. (If only Erlang had type signatures, eh?)
- The skeleton passes an opaque "state" object between these functions,
which the functions can update.
What does that sound like to you? Yes, congratulations, you've just
invented object-oriented programming. But without dynamic binding or
inheritance. (Or static typing, for that matter.) A "module which
exports a special set of functions" is basically a *class*, and this
"threading opaque state from function to function" is basically a
stateful object.
Of course, modules don't have inheritance. That means that you can't use
an abstract class to define what methods a class is supposed to have. It
means you can't write a half-implemented class that other concrete
classes can inherit from. But it also means that you don't have to worry
about the nightmare of multiple inheritance either.
And there's another little twist: /Objects/ have state. But /functions/
passing around a mostly /immutable/ state gives the system a somewhat
functional flavour. Indeed, in one of the example programs, a function
receives some data, starts processing it, but if a certain condition
happens, it /reverts to the old state/ rather than keeping the new one.
This is jaw-droppingly hard to do if you use real objects with real
mutable state. But in a semi-functional approach, it's quite trivial.
This whole "module with special functions" monstrosity is all the more
weird because Erlang also has
1. first-class function names
2. lambda functions
Um, like, WTF? Why does the code have to go in its own module? Why can't
I just pass you a tuple of function names, or even lambda functions,
that define the callbacks? Hello??
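What I have in mind is something like the following Python sketch, where the skeleton takes its callbacks as ordinary function values. Everything here is illustrative (`run_server`, the `"init"` and `"handle_call"` keys, the limit of two names); it is not OTP's API. It also shows the state threading and the cheap "revert to the old state" trick.

```python
def run_server(callbacks, requests):
    """Generic skeleton: owns the loop, threads `state` through the callbacks."""
    state = callbacks["init"]()
    replies = []
    for request in requests:
        reply, state = callbacks["handle_call"](request, state)
        replies.append(reply)
    return replies, state

def init():
    return ()                      # immutable tuple of allocated names

def handle_call(request, state):
    new_state = state + (request,)
    if len(new_state) > 2:         # hypothetical limit, to show the rollback
        return "rejected", state   # revert: just hand back the old state
    return "ok", new_state

replies, final_state = run_server(
    {"init": init, "handle_call": handle_call},
    ["alice", "bob", "carol"],
)
print(replies, final_state)
```

Because `state` is never modified in place, rejecting a request needs no undo logic at all.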
One really neat feature of the language, which I didn't look into *too*
deeply, is the binary operators. Given that IP datagrams and so forth
are complicated assemblages of binary data fields with different
alignments and so forth, packing and unpacking binary data is kind of
crucial in networking code. Accordingly, Erlang not only allows you to
write binary literals, but to /pattern match/ on them. Eat that!
I didn't look at it too closely, but it looks quite neat. You write an
integer (or variable) followed by the number of bits. If you're packing,
the requested data gets put into the appropriate places. If you're
pattern matching, an integer obviously matches against that integer,
while a variable causes that many bits to be copied into the variable.
(I'm fuzzy on whether it becomes an integer or stays a binary...)
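For comparison, here is roughly what that "value plus number of bits" idea has to be spelled out as in Python. `pack_fields` and `unpack_fields` are hypothetical helpers written for this sketch. They treat every field as an unsigned big-endian integer, which is a simplification on my part.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs, most significant first, into bytes."""
    total_bits = sum(width for _, width in fields)
    if total_bits % 8 != 0:
        raise ValueError("fields must add up to a whole number of bytes")
    combined = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        combined = (combined << width) | value
    return combined.to_bytes(total_bits // 8, "big")

def unpack_fields(data, widths):
    """Split bytes back into unsigned integers of the given bit widths."""
    combined = int.from_bytes(data, "big")
    remaining = len(data) * 8
    values = []
    for width in widths:
        remaining -= width
        values.append((combined >> remaining) & ((1 << width) - 1))
    return values

# Start of an IPv4 header: version (4 bits), header length (4 bits),
# type of service (8 bits).
header = pack_fields([(4, 4), (5, 4), (0, 8)])
version, ihl, tos = unpack_fields(header, [4, 4, 8])
```

That is a fair amount of shifting and masking for something Erlang apparently lets you write as a single pattern.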
Not sure what happens if you need to write bytes backwards, as some
broken protocols and file formats require. Is there a directive for
that? Or do you have to do it by hand?
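For reference, this is what such a directive looks like in Python's standard `struct` module, where a prefix character selects the byte order. It says nothing about how Erlang spells it; it just shows the kind of thing I would hope to find.

```python
import struct

value = 0x1234

big = struct.pack(">H", value)       # big-endian ("network order") 16-bit integer
little = struct.pack("<H", value)    # the same integer with its bytes reversed

(decoded,) = struct.unpack("<H", little)
print(big.hex(), little.hex(), hex(decoded))
```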
Many things about Erlang still remain unexplained. I still have
absolutely no clue WTF an "atom" is. Nor have I discovered what this
"OTP" thing that gets mentioned every 3 sentences is. I have utterly no
idea how the "registered process names" thing works. Erlang is supposed
to be for very high-performance systems, and yet to send any data from
one process to another you have to /copy/ it. It seems like that can't
possibly scale, but somehow it does. Apparently there's a distributed
database system written in Erlang, but I didn't find any mention of
that. And so on...
Still, I only have 1 PC, so what do I need distributed programming for? ;-)