  Re: The mysteries of Erlang  
From: Invisible
Date: 10 Mar 2011 05:37:20
Message: <4d78a9e0@news.povray.org>
On 09/03/2011 06:22 PM, Darren New wrote:
> Having worked with Erlang, here's some comments.

In what capacity are you "working with Erlang"? As I understand it, the 
main selling points of the language are reliability and distribution. 
While I doubt anyone actually *wants* unreliable software, there can't 
be too many problems that require distributed programming.

> Erlang isn't functional. It's single-assignment. The "functional" bit in
> Erlang doesn't give you benefits like it does in Haskell because there's
> still a whole bunch of non-functional operations. However, since you can
> only assign to each variable once, it means you can't have loops and you
> can't do a sequence of operations without making up new meaningless
> names for the data at each step.

Erlang is partially functional, in that functions are apparently 
first-class, and it's got single-assignment and immutable data 
structures. On the other hand, it doesn't enforce referential 
transparency; you can put side-effects anywhere you like. The document I 
read claims that it's considered beneficial to limit side-effects to 
just a few places in the codebase, and write most of the code in a 
"pure" style - which is exactly the same as for Haskell. It's just that 
the language itself doesn't /enforce/ any kind of strong separation 
between pure and impure code.

> If it was *actually* functional, you could put your own functions in
> guards, for example.

Wait - you *can't* do this already?? o_O

Why on earth would you design it this way?

>> - The system is supposedly insanely reliable. People toss around "nine
>> 9s up-time" as if this is some sort of experimentally verified *fact*.
>
> It is. They've been running a commercial switching network with it for
> something like 10 years, and one of the six data centers lost power or
> something and all the machines in it for 20 minutes once, so they had a
> 15% downtime for 20 minutes out of ten years.

One data point is scientific fact?
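
For what it's worth, the quoted figures can be turned into an actual availability number. This is a back-of-envelope sketch in Python. It takes the "15%", "20 minutes" and "ten years" at face value, and it assumes the outage should be weighted by the share of capacity lost (my assumption, not something from the thesis):

```python
import math

MINUTES_PER_YEAR = 365.25 * 24 * 60   # assumption: average year length

def unavailability(capacity_lost, outage_minutes, years):
    """Fraction of total capacity-time lost to the outage."""
    return capacity_lost * outage_minutes / (years * MINUTES_PER_YEAR)

def count_nines(unavail):
    """Whole number of nines implied by an unavailability fraction."""
    return math.floor(-math.log10(unavail))

unavail = unavailability(capacity_lost=0.15, outage_minutes=20, years=10)
nines = count_nines(unavail)
print(f"availability = {1 - unavail:.9f} ({nines} nines)")

# Downtime budget that nine nines would allow over the same ten years.
budget_seconds = 1e-9 * 10 * MINUTES_PER_YEAR * 60
print(f"nine nines over ten years allows {budget_seconds:.3f} s of downtime")
```

By this reckoning the quoted incident alone works out at roughly six nines, and nine nines over ten years would allow only about a third of a second of downtime.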

> It's pretty straightforward. The rest is in libraries.

Yeah, as I quickly found out.

> The collection of
> libraries that make it easy is called OTP. The documentation on OTP is
> either non-existent or blows mooses, depending on what you can find.

Just like Haskell, then. :-}

>> refusing to allow processes to share state. This immediately implies
>> that if you want to send data from place to place, you must copy it all.
>
> No, not true. Since the data is immutable, you don't actually need to
> copy it if sender and receiver are on the same processor.

OK, fair enough.

(Presumably both processes have to be running within the same logical 
"node", but then again also presumably you wouldn't have more than one 
node on the same box except for test purposes anyway...)

>> One thing that rapidly becomes clear (and doesn't seem to be mentioned
>> anywhere else) is that for Erlang, processes are about more than just
>> concurrency or distribution. They are about fault isolation.
>
> Indeed, that was the fundamental goal. Not distributed computing, but
> reliable computing, however that may be achieved.

As I say, this doesn't really seem to be spelled out anywhere else that 
I could find, and it seems a rather crucial detail.

>> less the same kinds of error handling that any other language provides
>> - catch/throw, etc.
>
> Huh. I didn't even know it provided catch/throw. I've never seen that
> used, actually.

Yeah, it does. It's in there. Apparently catch turns an 
exception into a tuple beginning with 'EXIT' and followed by some stuff. 
Using throw you can make catch return an arbitrary data structure. 
(Quite how you tell the difference between an exception and a normal 
result is left to the imagination... It looks like you can use throw to 
make it look like the catch block exited normally.)
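
To make the ambiguity concrete, here is a rough Python model of how I understand the old-style catch to behave. The names `Throw` and `erl_catch` are made up for illustration, and the shape of the 'EXIT' tuple is simplified:

```python
class Throw(Exception):
    """Models Erlang's throw(Value)."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def erl_catch(fn):
    """Rough model of evaluating an expression under catch."""
    try:
        return fn()                   # a normal result passes straight through
    except Throw as t:
        return t.value                # a thrown value becomes the result
    except Exception as e:
        return ("EXIT", repr(e))      # any other failure becomes an 'EXIT' tuple

def returns_normally():
    return ("EXIT", "oops")

def throws():
    raise Throw(("EXIT", "oops"))

def crashes():
    raise ValueError("oops")

print(erl_catch(returns_normally))
print(erl_catch(throws))
print(erl_catch(crashes))
```

The first two calls give the caller exactly the same value, so it cannot tell a normal return from a throw, and a deliberately crafted tuple looks just like a crash. The newer try ... catch form in Erlang matches on the exception class (throw, error or exit), which avoids this.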

> OTP is the libraries that make it seem like magic. However, being
> typical undocumented libraries that you learn by spending 5 years as an
> employee of Ericsson, written in a language so modular you can't even
> figure out what pieces of code are part of the program and what aren't
> let alone read the code to see what it does, I've never quite figured it
> out beyond the very surface level.

I'm glad it's not just me...

> There is documentation.
>
> Each hardware box is running one interpreter, each of which is running
> one thread per CPU. Each interpreter keeps a TCP socket open to all the
> other interpreters it knows about, as well as sending periodic "are you
> there" messages. That TCP socket is also used to send application-level
> messages to remote processes. If the socket breaks, anything on the
> local system linked to that socket gets a notification that it broke.
>
> There's also a separate process running on each local machine that's
> monitoring the local Erlang interpreter that will reboot the machine if
> the local Erlang interpreter stops working. (For some definition of
> "working" which isn't clear from the docs, but clearly includes
> answering keep-alive probes.)

Right. So it just sends heartbeat messages? (I presume the bit about 
rebooting the machine is optional; you wouldn't want your PC to reboot 
just because Wings 3D crashed, for example.)

>> Message sending has the curious property that message delivery is
>> /not/ guaranteed, but message ordering /is/ guaranteed. Like, WTF?
>
> Message delivery isn't guaranteed because it's asynchronous, and the
> remote machine may crash between the time you send the message and the
> time the remote machine receives it. Ordering *is* guaranteed because if
> the remote machine crashes before it gets message #5, it won't get
> message #6 either.

You mean that if message #5 doesn't arrive, message #6 is actually 
guaranteed not to arrive?
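
If I've understood the guarantee, the receiver always sees a prefix of what one sender sent it. A toy Python model, assuming a single ordered link that simply stops carrying traffic at some point (the function name is hypothetical):

```python
def deliver(messages, link_breaks_after):
    """Model an ordered link that fails after carrying N messages.

    Everything sent before the break arrives, in order; nothing sent
    after the break arrives at all.
    """
    return messages[:link_breaks_after]

sent = [1, 2, 3, 4, 5, 6, 7]
received = deliver(sent, link_breaks_after=4)
print(received)

# The receiver sees a prefix of what was sent: no gaps and no reordering.
assert received == sent[:len(received)]
assert 6 not in received   # message #5 was lost, so #6 cannot have arrived
```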

>> very, very random to me. The thesis claims that implementing delivery
>> checks yourself is easy, but implementing ordering checks is very
>> hard. So they put the hard thing into the language, and left the easy
>> thing up to you if you need it. Um, OK?
>
> Yep. Knowing if something is out of order is hard in a single-assignment
> language.

OK...

> Given it all runs over TCP, tho, I don't think that's actually a whole
> lot of trouble.

TCP guarantees both delivery and ordering. Unless the connection 
actually breaks. So is that all Erlang is saying? That if one of the 
machines catches fire, Erlang can't guarantee that messages will be 
delivered to it?

These people are /serious/ about reliability. o_O

> Knowing the receiver got the message is as simple as adding code to send
> an ack each time you handle a message.

Network protocols 101: It's not that simple. It is possible for the 
message to arrive, but the acknowledgement itself to fail to be delivered.

Now, if you're running on TCP, so that ordering and delivery are 
guaranteed (unless there's no transport or no endpoint), the only way 
this can happen is if the transport or the endpoint dies after the 
message is received, which is probably just as bad as the message not 
being delivered...
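
The usual workaround for a lost acknowledgement is to resend until an ack gets through and have the receiver discard duplicates. A minimal Python simulation, assuming the sender numbers its messages; the class and function names are invented for the example and nothing here is Erlang's own mechanism:

```python
class Receiver:
    def __init__(self):
        self.seen = set()      # sequence numbers already handled
        self.handled = []      # payloads actually processed

    def on_message(self, seq, payload):
        if seq not in self.seen:       # ignore retransmitted duplicates
            self.seen.add(seq)
            self.handled.append(payload)
        return ("ack", seq)            # always (re)acknowledge

def send_reliably(receiver, seq, payload, ack_lost_on_attempts):
    """Resend until an ack gets back; returns the number of attempts."""
    attempt = 0
    while True:
        attempt += 1
        ack = receiver.on_message(seq, payload)   # the message itself arrives
        if attempt in ack_lost_on_attempts:
            continue                              # ...but its ack is lost
        if ack == ("ack", seq):
            return attempt

rx = Receiver()
attempts = send_reliably(rx, seq=1, payload="debit 10",
                         ack_lost_on_attempts={1, 2})
print(attempts, rx.handled)
```

The message arrives three times here but is only acted on once. Without the `seen` check the receiver would have processed it three times, which is the real cost of a lost ack.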

>> One of the very trippy things about Erlang is that I can be sitting on
>> my PC in England, and I can just tell some Solaris server in Brazil to
>> spawn a new process. As expected, no word on how the hell this
>> actually works. I would have *thought* this means that the necessary
>> code gets beamed over the wire, but some documentation seems to
>> suggest that you have to install it yourself. And yet, you can
>> apparently send arbitrary functions as messages, so...?
>
> The functions are merely references to installed code. The code itself
> has to already be in Brazil, and the function you send is "this module,
> this name, this version".

I did think for a moment that you could send a function this way. Once 
you've compiled your module, every lambda function it contains must 
already exist in the compiled code somewhere. Higher-order functions make 
the program more complicated, of course. You can't necessarily point to 
just *one* function. It might be that you call several functions to 
construct the final function object. But even that is likely to be more 
compact than the actual executable code.

Thing is, I can open the REPL and type in a function, and then ask to 
spawn that on a remote node. How does *that* work?!

> There's a document out there somewhere that lists the binary formatting
> of the contents of all the messages.

I'll bet. No doubt somebody has written libraries for other languages to 
speak the same wire protocol.

> Starting the process remotely works because the local interpreter has
> (or establishes) a TCP link to the remote machine. All the messages for
> all the processes on your PC go over the same link to the machine in
> Brazil. There isn't one connection per communicating process.

Given that Erlang supports creating millions of processes and there are 
only 65,536 possible TCP port numbers... yeah, I figured that's how it 
would work.

I notice that connecting one Erlang node to another causes the two nodes 
to connect to *all* the nodes that each endpoint knows about. I'm not 
actually sure why though. You would think that a fully interconnected 
mesh would quickly exhaust the supply of TCP ports too.
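
Then again, doing the sums suggests ports aren't the bottleneck. A quick Python sketch, assuming one connection per pair of nodes as described above:

```python
def mesh_connections(nodes):
    """Connection counts for a fully interconnected cluster of `nodes` nodes."""
    per_node = nodes - 1                  # one link to every other node
    total = nodes * (nodes - 1) // 2      # each link is shared by two nodes
    return per_node, total

PORT_NUMBERS = 2 ** 16                    # TCP port numbers are 16-bit

for n in (10, 100, 1000):
    per_node, total = mesh_connections(n)
    print(f"{n:5d} nodes: {per_node:4d} links per node, {total:7d} links in total")
```

The total grows quadratically, but any single node only ever holds one connection per peer, so it would take tens of thousands of nodes before one machine came anywhere near the port range.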

>> There's also no word on authentication. I mean, how does the machine
>> on Brazil know that I'm authorised to be executing arbitrary code on it?
>
> Nope, you're fucked.
>
>> In other words, you're authorised if you can
>> actually access the system in the first place.
>
> Yes, exactly.

Ouch.

>> It's a similar story when we come to code hot-swapping. Surprise,
>> surprise, it turns out that you can't just magically hot-swap code,
>> just because it's written in Erlang. No, you have to do *actual work*
>> to make this possible.
>
> Well, sure.

You say "sure", but the documentation sings about how reliable Erlang is 
and makes it sound like all this stuff just happens by magic, because 
Erlang is so wonderful. The truth is actually that Erlang just 
implements the minimal features required to let _you_ build complex, 
robust, distributed applications.

> The two-versions limit is because the code itself isn't GCed. They
> document that they could have GCed the code, but that would have added
> lots of overhead for something relatively rare.

Fair enough.

>> Really, the only help that Erlang is giving
>> you is the ability to easily change from one chunk of code to another.
>> You could probably do the self-same thing in Java, if you built your
>> application so that [...]
>
> You'd have to be able to pass the network connections around too, remember.

You'd have to be able to do a lot of things. It would require careful 
application design, but it doesn't look especially difficult. (To get 
the framework done, anyway. Making the application itself sounds really 
hard, in Java or Erlang.)

>> Erlang really gives you no assistance at all beyond the minimal level
>> of "you can have two modules with the same name". And it's limited to
>> just two, by the way. If you try to load a third version, anything
>> running the first version is unceremoniously killed. And there's no
>> mention of any way to detect whether old code is still running.
>> (Perhaps there is, but I didn't see it mentioned.)
>
> There is, but it's deep in the bowels of "system" stuff.
>
> The OTP system provides all the packaging and maintenance for the higher
> levels of management. Complaining that Erlang itself doesn't automate
> such stuff is like complaining that you need a Linux package manager
> because the file system only knows about files, not applications. :-)

I don't mind which level the assistance is implemented at, just so long 
as there *is* some. ;-)

>> There was some talk of a packaging system, which sounds quite
>> interesting, but obviously no details are described.
>
> It does a bunch of stuff, including giving you ways to roll things
> forward and back and such.

Yeah, that's what I figured. Handling this stuff manually for more than 
a few dozen nodes sounds like the exact opposite of "reliable".

> Realize that most of these "systems" you're talking about are actually
> implemented as Erlang code.

Hey, it worked for Smalltalk. ;-)

> So, for example, to compile code, you
> actually sit down at the REPL and invoke the compiler function.

I wish the Haskell compiler could do this...

> I agree about the syntax etc. It's quite possible a better type system
> would make it easier.

Escaped experiment, eh? ;-)

> You should also check out Hermes, which is essentially the same thing
> with an astoundingly strict and high-level type system.

Heh, there is never enough free time. ;-)

>> Just reading the document, I saw endless examples of data structures
>> whose meaning is ambiguous due to the lack of types. For example, it
>> is apparently impossible to tell the difference between a process that
>> died because of an exception, and a process which merely sent you a
>> message that happens to be a tuple containing the word "EXIT".
>
> Well, it's possible unless you asked "turn all crashes from processes
> I'm watching into EXIT messages."

The advice presumably being that workers should only receive "real" 
messages, and supervisors should only receive exceptions. Then there can 
be no ambiguity. (Until the supervisor's supervisor asks it to do 
something...)

Actually, that's a point. Each process has *one* mailbox. Not one 
mailbox for each process it's communicating with. Just *one*. So you 
have to manually figure out who each message came from - usually by 
having the sender put a reply address in manually.
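
A toy Python model of the single-mailbox arrangement, as I understand it. The `Mailbox` class is hypothetical, and the way `receive` skips non-matching messages is my reading of how selective receive works:

```python
class Mailbox:
    """One queue per process, shared by every sender."""
    def __init__(self):
        self.queue = []

    def send(self, message):
        self.queue.append(message)

    def receive(self, matches):
        """Remove and return the oldest message for which matches(msg) is true.

        Non-matching messages stay queued in their original order.
        Returns None if nothing matches (a real process would block instead).
        """
        for i, msg in enumerate(self.queue):
            if matches(msg):
                return self.queue.pop(i)
        return None

box = Mailbox()
box.send(("worker_a", "result", 1))     # each sender tags itself by hand
box.send(("worker_b", "result", 2))
box.send(("worker_a", "result", 3))

from_b = box.receive(lambda m: m[0] == "worker_b")
print(from_b, box.queue)
```

Nothing in the mailbox itself records who sent what; the only reason the receiver can pick out worker_b's message is that the sender put its own name in the tuple.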

>> The advice for this presumably being "don't do that".
>
> No, this is actually more like "We did that on purpose in case you want
> the supervisor to react as if you exited while you keep running."

To quote the document, "it's a feature, not a bug!"

>> It's slightly bizarre. Erlang looks for all the world like a crude
>> scripting language with no safety at all and abysmal run-time
>> performance. And yet, the language is designed for running network
>> switches, possibly the most demanding high-performance hard-realtime
>> system imaginable, and people claim it has nine 9s up-time. The only
>> container types are linked lists and tuples, and yet the standard
>> libraries somehow include cryptography and complex network protocols.
>> Very odd...
>
> It's possible to link C (or anything else) into Erlang. The GUI for
> Erlang launches a separate Tcl/TK process and talks over a socket to it
> to do the drawing, for example.

So you're saying that all the "interesting" stuff like the SSL 
implementation is actually just an off-the-shelf C library?

I never did understand what's so great about Tk. It looks horrid, it's 
really hard to use, and it's almost impossible to get the layout you want...

> In general, network switching isn't that hard real-time, because you
> usually have custom hardware to do the switching. A 5ESS (which can
> handle up to 800,000 phone lines) runs on two 6800s, one of which is a
> hot spare. But of course there's a large lump of hardware that's routing
> the individual bytes here and there, so the 6800 only actually gets
> involved when you connect or disconnect, basically.

Wait, a 6800? As in, the chip *before* the obsolete 68000?

Damn. If the actual switching is in hardware, what the heck do you need 
the CPU for at all?

> Erlang is the only language I've found whose syntax sucks worse than
> C++, yet it has way, way less built in than C++.

Erlang: 0wned.

> The primary problem, I think, is it started out as "let's investigate
> how we can write a million lines of code that doesn't crash." Of course,
> once you have that million lines of code and it's running reliably,
> you're not going to go back and fix anything as trivial as the syntax.

Yeah. It's an escaped experiment. And now it's too late. TOO LATE!!! >_<

Actually, one of the reasons I wanted to learn about Erlang was to see 
if I can design something that does the same thing, but isn't insane.

>> By contrast, apparently the general practice with Erlang is to /not/
>> cover cases which aren't expected to occur, and just let the thing
>> throw an exception on a pattern-match failure.
>
> Well, that's not quite true for message pattern matching. For matching a
> pattern against a value, yes. But if you don't read messages, you leave
> them in the input buffer, which then grows and grows until you take out
> the entire interpreter. Normally you'd find the place in your code where
> you *think* you handled all the messages, and you'd put in a "if I match
> anything else, crash out" branch to the pattern match.

Well, yeah, I meant for pattern matching against a value. Receiving 
messages is a bit special.

>> What does that sound like to you? Yes, congratulations, you've just
>> invented object-oriented programming.
>
> Yep. Except since it's actually a separate process, it's called an
> "actor", not an "object". :-)

That's possibly the most irritating thing about the gen_server example. 
It explains what each line of code is doing, but neglects to mention the 
fact that one half of the code is potentially running on a different 
continent from the other - which, let's face it, is the entire point of 
making *remote* procedure calls. :-P

>> Of course, modules don't have inheritance.
>
> No. You'd instead write something that invokes functions in a module
> whose name you provided when you instantiated this module. There's
> actually even syntax for this.

Yeah. It's just a tiny bit more work, that's all.

>> This whole "module with special functions" monstrosity is all the more
>> weird because Erlang also has
>>
>> 1. first-class function names
>>
>> 2. lambda functions
>>
>> Um, like, WTF? Why does the code have to go in its own module? Why
>> can't I just pass you a tuple of function names, or even lambda
>> functions, that define the callbacks? Hello??
>
> Because then you can't update the module. One of the things OTP is
> providing for you is the whole infrastructure for updating your module.
> That's why you send the "state" back to OTP and why the code has to be
> in a separate module. When OTP gets the "update" message, it invokes the
> new module with the state.

OK. So that's why the state is explicitly threaded (quite apart from 
Erlang apparently having no other way to handle running state). That 
still doesn't explain why I can't put several related behaviours into a 
single module. Perhaps I *want* to replace them all at the same time.
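
Here is how I picture the state-threading part, as a Python sketch. The generic loop owns the state and calls whatever handler it currently holds, so swapping the handler loses nothing. All the names are invented, and this is only a model of the idea, not of what OTP actually does:

```python
def counter_v1(state, msg):
    """Version 1 callback: count messages."""
    return state + 1

def counter_v2(state, msg):
    """Version 2 callback: count characters instead."""
    return state + len(msg)

def run_server(handler, state, inbox):
    """Generic loop: owns the state, delegates each message to `handler`."""
    for msg in inbox:
        if isinstance(msg, tuple) and msg[0] == "upgrade":
            handler = msg[1]            # swap the code, keep the state
        else:
            state = handler(state, msg)
    return state

final = run_server(counter_v1, 0, ["a", "bb", ("upgrade", counter_v2), "ccc"])
print(final)
```

Two messages are counted by the old code, then the upgrade arrives, and the new code carries on from the state the old code left behind.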

>> Many things about Erlang still remain unexplained. I still have
>> absolutely no clue WTF an "atom" is.
>
> An atom is an interned string. Same as a "symbol" in LISP or Smalltalk.
> The trick is that in a message, an atom is an index into the atom table,
> so it's short regardless of how long the atom is.

I never really understood what an interned string is. (And hence what 
the difference between a Smalltalk string and a symbol is.)

So you're saying that the compiler replaces all occurrences of an atom 
with a pointer to a single instance of the string? (And hence, you can 
compare atoms for equality by simple pointer equality rather than string 
comparison.)
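
If "index into the atom table" means what I think it means, the idea can be sketched in a few lines of Python. The `AtomTable` class is hypothetical and says nothing about how the real runtime lays things out:

```python
class AtomTable:
    """Maps each distinct name to a small integer, assigned on first use."""
    def __init__(self):
        self.index_of = {}
        self.names = []

    def intern(self, name):
        if name not in self.index_of:
            self.index_of[name] = len(self.names)
            self.names.append(name)
        return self.index_of[name]

atoms = AtomTable()
a = atoms.intern("a_rather_long_atom_name")
b = atoms.intern("ok")
c = atoms.intern("a_rather_long_atom_name")

print(a, b, c)
```

The same name always yields the same index, so comparing two atoms is a single integer comparison however long the name is, and only the index needs to be stored or passed around.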

>> Nor have I discovered what this "OTP" thing that gets mentioned every
>> 3 sentences is.
>
> It's the libraries that take the basic operations Erlang supplies (like
> "load new code" or "spawn link") and turn it into things like
> installable packages.

Right, OK. So what does it mean when they say "the compiler is an OTP 
application"?

>> I have utterly no idea how the "registered process names" thing works.
>
> A process can say "My mailbox is the registered process on server ABC
> for XYZ". Another process can say "Give me XYZ@ABC" and get that
> mailbox. It works by schlepping non-Erlang messages between the Erlang
> processes, just like the keep-alives do.

OK, so, I can register a PID with a particular name. Is that name local 
to just that node? Does every node in the system know about it? How do 
you avoid registering a name that some unrelated application has already 
registered? If I ask for a particular name, how does it figure out which 
node that's on? If a node goes down, does the name get taken out of 
service? How do you handle the same service being available from several 
nodes? (For load-balancing or fault tolerance, perhaps.)

>> Erlang is supposed to be for very high-performance systems,
>
> Not really.

Hmm. It certainly seems to be.

>> Apparently there's a distributed database system written in Erlang,
>> but I didn't find any mention of
>
> Mnesia. They were originally going to call it Amnesia, but someone
> pointed out that naming a database system after an illness characterized
> by forgetting things is probably a bad idea.

The name still looks wonky to me. ;-)

> It's a cute little system, but it has its flaws due to being implemented
> on top of a single-assignment language and keeping everything in memory.

Hmm. Sounds... fun.

> It's basically a distributed transactional system on top of the built-in
> database tables whose TLA name escapes me at the moment.

Right.

> Note that all these things, OTP and Mnesia, along with stuff like the
> compiler, the debugger, the profiler, the package manager, etc etc etc,
> are all documented as "here's the functions you use to do it in Erlang."
> Almost none of these things are actually command-line tools.

Nothing wrong with that. The question is how much detail the 
documentation explains. ;-)

