So today I started learning (or at least familiarising myself with)
Ocaml (Objective CAML). But this isn't going too well; I find myself
continually screaming at the computer screen "YOU'RE DOING IT WRONG!"
If you don't care about why Ocaml is wrong and why Haskell is better,
you can stop reading now.
As best as I can determine, Ocaml is some sort of hybrid of Java and
Haskell. It's "mostly functional" and has syntax resembling Haskell, but
it also has object-oriented features.
Personally I prefer the 100% pure approach of Haskell. But hey, maybe
sometimes it's better to be pragmatic and go with a hybrid approach. It
could be interesting to at least see how that works out, right?
I get the impression Ocaml is from academia too (just like Haskell), but
it's designed especially with a focus on performance. This alone
probably explains why it's strict (like C, Java, VB, C#, Pascal...)
rather than lazy (like Haskell). Making Ocaml strict rules out all sorts
of possibilities, and gives lower performance in some cases, but it
makes performance *predictable*. And that's probably the Big Reason for
this design choice. [It also makes several implementation details much
simpler.]
Problems become apparent almost immediately. I had a look at the Great
Language Shootout, where I found Ocaml getting royally OWNED by both C
and C++, mainly because both of these used all four CPU cores, while
Ocaml apparently can't. Ocaml supports multiple light-weight threads,
but because the garbage collector isn't thread-safe, all Ocaml threads
run in a single OS thread - i.e., on a single CPU core.
Haskell, of course, runs your threads on several cores, like you'd
expect. (Unless you specifically ask it not to because you're trying to
interface to external C libraries that use thread-local storage.)
Of course, even Haskell currently has a parallel but not concurrent GC,
and in particular to perform a GC cycle you have to wait for all Haskell
threads to stop. (They can only be stopped at certain points.) This can
cause large program slowdowns, where one thread is in a tight loop and
won't stop, and all the other threads are paused waiting to start a GC
cycle.
But hey, all of this isn't really to do with *Haskell*, but rather with
*implementation*. Specifically, I'm talking about GHC. Other
implementations do exist. [Most of them don't support SMP *at all*
though.] You could write another implementation, or fix GHC, and the
above drawbacks would go away. I have no idea how many Ocaml
implementations there are, but it seems plausible you could also make it
do real multithreading too.
Then I started to look at the syntax. It's *mostly* identical to
Haskell, but with a few differences. Most of the differences are just
*different*. Haskell uses one symbol, Ocaml uses a different one. No big
deal. But a few of them are quite nauseating.
In Haskell, I can write
foo = 5
In Ocaml, you must write
let foo = 5
Small bit of superfluous chatter there. Still, I guess it makes the
syntax more consistent. Slightly. But then we get to the next
abomination: In Haskell, I can say
foo = bar foo
but in Ocaml, you must say
let rec foo = bar foo
Yes, that's correct, you have to manually tell the compiler that you're
making a recursive definition. WTF? What, it isn't trivial enough for
the compiler to detect this all by itself? (You might try to argue that
it prevents you writing recursive definitions when you didn't mean to,
but this is *functional programming*! Recursion is ubiquitous!)
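For what it's worth, here is a minimal sketch of what the "rec" keyword actually changes, as far as I can tell. Without it, a name on the right-hand side refers to whatever was defined earlier, so a plain "let" can shadow an old value; with it, the name refers to the definition being made. The names x and factorial are just made up for illustration.

```ocaml
(* Without [rec], the name on the right refers to an earlier binding,
   so a plain [let] can be used to shadow a value. *)
let x = 1
let x = x + 1   (* the right-hand [x] is the previous one *)

(* With [rec], the name is in scope inside its own definition. *)
let rec factorial n = if n <= 1 then 1 else n * factorial (n - 1)

let () = Printf.printf "%d %d\n" x (factorial 5)
```

So the second x ends up as 2 rather than being an infinite loop, which is presumably the case the keyword is there to distinguish.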
Another puzzling thing is that Ocaml appears to use a layout rule
similar to Haskell [not that the tutorial I'm following bothered to
point this out], and yet it still requires explicit semicolons anyway.
(??) And there's a set of fairly complex rules for when you must and
must not write them. (?!?)
Now in Haskell - or in fact any vaguely modern programming language - if
I want to add two numbers together, I say
x + y
BASIC had this, Pascal had this, C had this, Java had this, Smalltalk
had this, and Haskell certainly has this. But not Ocaml, apparently.
Here you must use "+" for adding integers, "+." for adding reals, and
"+/" for adding arbitrary-precision integers. (And presumably some other
symbol for rational numbers, complex numbers, vectors...)
This seems pretty much inexcusable to me. I'm sure there's a technical
reason for why they did it this way, but it seems like a very basic
thing to get wrong. Some languages have overloading, some languages have
classes, and some languages just have interfaces, but all of them seem
to manage to avoid this. As far as I can tell, the above makes it pretty
much impossible to write code that works on more than one number type.
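To make the complaint concrete, here is a small sketch of the separate operators, plus the one workaround I'm aware of: passing the zero value and the addition function in by hand. The helper name sum_with is hypothetical, not something from the standard library.

```ocaml
(* Integer and floating-point addition are separate operators. *)
let int_sum = 2 + 3           (* int *)
let float_sum = 2.0 +. 3.5    (* float *)

(* Hypothetical helper: sum a list for any number type by taking the
   zero value and the addition function as explicit arguments. *)
let sum_with zero add xs = List.fold_left add zero xs

let total_ints = sum_with 0 ( + ) [1; 2; 3]
let total_floats = sum_with 0.0 ( +. ) [1.5; 2.5]
```

It works, but the caller has to supply what a Haskell type class would have supplied automatically.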
In a similar vein, in Haskell "7" is a number. Any type of number. It
can be anything. But in Ocaml, "7" is an integer, "7." is a real, and I
forget what the syntax for an arbitrary-precision integer is.
In Haskell, you can say "Set Integer" or "Set String" or "Tree Integer"
or whatever. Apparently in Ocaml you have to say this backwards:
"int tree" and so forth. I haven't yet seen any higher-kinded type
constructors, but it'll be interesting to see how that works!
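A rough sketch of the backwards type application, assuming a home-made tree type (the names tree, numbers, words and size are all mine, purely for illustration):

```ocaml
(* A hypothetical binary tree, parameterised over its element type. *)
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* The type argument is written before the type constructor. *)
let numbers : int tree = Node (Leaf, 7, Leaf)
let words : string list = ["a"; "b"]

(* Count the elements stored in a tree. *)
let rec size = function
  | Leaf -> 0
  | Node (l, _, r) -> size l + 1 + size r
```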
Haskell uses the convention that type NAMES must begin with a capital
letter, while type VARIABLES must begin with a lowercase letter. Ocaml
uses the arguably superior convention that type variables start with an
apostrophe. (I would have used a different symbol, but it's a reasonable
design choice. Somebody recently pointed out that some Unicode alphabets
have no concept of "uppercase" and "lowercase".)
Which reminds me - in Haskell, "Char" is a 32-bit Unicode character.
Ocaml makes the mistake of using an 8-bit character code. (But apparently
there are libraries to "work around the problem".)
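A quick sketch of what the 8-bit choice means in practice, as I understand it: a string is a sequence of bytes, so anything outside ASCII takes up more than one "character". The byte escapes below are the UTF-8 encoding of an accented e; the names are just for illustration.

```ocaml
(* A char is one byte; a string is a sequence of bytes. *)
let a_code = Char.code 'a'
let e_acute = "\xc3\xa9"                  (* UTF-8 encoding of e-acute *)
let byte_count = String.length e_acute    (* counts bytes, not characters *)
```

Here byte_count comes out as 2 even though a human would call that one character.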
Haskell has a thing called "tuples". A tuple is a fixed-length,
fixed-type collection. For example, you might write a parser function
that takes a string and returns (say) an integer and the remainder of
the string. To do that, you return a 2-tuple containing an integer and a
string.
The literal value of such a tuple can be written as
(5, "rest of string")
and its type is written as
(Integer, String)
Ocaml has the same syntax for writing tuples, but the type signature becomes
int * string
which seems a rather perverse choice. I can see what they're getting at
(a tuple type is a product type, whereas an enum is a sum type), but
that's still rather far-out.
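Here is a minimal sketch of the parser example from above, written with the product-type signature. The function parse_int is hypothetical, and it assumes the input starts with at least one digit (otherwise int_of_string raises an exception).

```ocaml
(* Hypothetical parser: read the leading digits of a string and return
   the number together with the remainder of the string. *)
let parse_int (s : string) : int * string =
  let n = String.length s in
  let rec digits i =
    if i < n && s.[i] >= '0' && s.[i] <= '9' then digits (i + 1) else i
  in
  let k = digits 0 in
  (int_of_string (String.sub s 0 k), String.sub s k (n - k))
```

So the value is still written with parentheses and a comma; only the type uses the star.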
(Haskell also uses "[Integer]" for a list of integers, and "[3,5,7]" for
a value of this type. In general, the type signatures mirror the value
syntax - possibly to the point of being confusing. Personally, I've
never been that fond of the list and tuple syntax. Sure, lists and
tuples are common, but would it really kill you to write "Tuple2 True 5"
instead of "(True, 5)"? It would be clearer...)
Also, Haskell's "()" type becomes "unit". Which is the name Haskell
programmers use to refer to it, but hey. It's not entirely clear to me
whether Ocaml lets values of type "unit" actually exist - and if so,
what their syntax is.
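From a bit of poking around, it seems values of type "unit" do exist, and the syntax is the same "()" as in Haskell. A small sketch (the names nothing and greet are made up):

```ocaml
(* [()] is the single value of type [unit]. *)
let nothing : unit = ()

(* Functions called only for their side effects return it. *)
let greet (name : string) : unit = print_endline ("hello, " ^ name)

let () = greet "world"
```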
Haskell has algebraic data types. Every Haskell type is nominally an
ADT. (Although in the case of "Int32", it's an enumeration type with
constructors named "0", "1", "2"...) All types follow the same uniform
structure.
Ocaml, on the other hand, has "records" which are like "{field1=5;
field2=true; field3="hi"}", and "variants" which are like Haskell ADTs.
And they're apparently not the same thing. You can see this when you
define one:
type recordX = {field1 : int; field2 : string; field3 : stuff}
type variantY = Thing of int * string * stuff | OtherThing of char * char
So "records" use named fields, but have only a single (unnamed)
constructor, while "variants" allow multiple constructors, but use a
weird syntax for defining the fields - a syntax resembling tuple types,
actually. WTF?
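A sketch of how the two kinds of type get built and taken apart, reusing the definitions above. The type "stuff" was never defined, so I'm assuming a placeholder (bool) just to make this compile; the value names are mine.

```ocaml
type stuff = bool   (* placeholder: [stuff] is left undefined in the text *)

type recordX = { field1 : int; field2 : string; field3 : stuff }
type variantY = Thing of int * string * stuff | OtherThing of char * char

let r = { field1 = 5; field2 = "hi"; field3 = true }
let v = Thing (5, "hi", true)

(* Record fields are read by name; variant fields by pattern matching. *)
let first_of_record = r.field1
let first_of_variant =
  match v with
  | Thing (n, _, _) -> n
  | OtherThing (c, _) -> Char.code c
```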
(This quite apart from the parameter of a parameterised variant being
written *before* the variant name, not after it. I wonder if you can
have multiple parameters on a variant? I haven't seen any examples
showing this. I also haven't seen any parameterised records, so I'm not
sure if that's supported. I'm also not sure how it knows from looking at
a record which type you mean...)
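As far as I can tell, both are possible: several parameters are written as a parenthesised list before the name, and records can take parameters too. A minimal sketch, with hypothetical types either and labelled:

```ocaml
(* A variant with two type parameters, written as a tuple before the name. *)
type ('a, 'b) either = Left of 'a | Right of 'b

(* A parameterised record. *)
type 'a labelled = { label : string; value : 'a }

let e : (int, string) either = Right "oops"

(* The compiler works out the record type from the field names. *)
let r = { label = "answer"; value = 42 }   (* inferred as int labelled *)

let describe = function
  | Left n -> string_of_int n
  | Right s -> s
```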
Haskell makes the rather illogical choice of using "--" as the start
marker for a comment. (Great. So I can't use that as a name then!) But
Ocaml uses the even stranger choice of "(* ... *)". Which means that you
can't write "(*)" to mean the multiplication function, you must write "(
* )" [with spaces] to prevent it parsing as a comment.
[Then again, Haskell has a glitch with the unary minus function...]
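Back on the Ocaml side, a tiny sketch of the comment clash with the multiplication operator (times and product are illustrative names):

```ocaml
(* A comment. Writing the multiplication operator without spaces
   would open another comment, so it needs spaces inside the parens. *)
let times = ( * )
let product = List.fold_left ( * ) 1 [2; 3; 4]
```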
Also, Haskell uses "++" for string (and list) concatenation. But Ocaml
inexplicably uses "^" instead. (Haskell uses both "^" and "**" to mean
exponent - but one is with an integer exponent and the other is with an
arbitrary exponent. No, I can never remember which is which.)
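For reference, a short sketch of the Ocaml operators involved. Strings are not lists in Ocaml, so list concatenation gets yet another symbol, and "**" is the floating-point power operator:

```ocaml
let greeting = "foo" ^ "bar"   (* string concatenation *)
let joined = [1; 2] @ [3]      (* list concatenation is a separate operator *)
let power = 2.0 ** 10.0        (* float exponentiation *)
```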
Things don't get any better when you start looking at the build process.
In Haskell, if I want to compile a multi-module program, I say
ghc --make MyThing
GHC will then automatically build the entire dependency tree, detect
what libraries you need, check what (if anything) needs to be
(re)compiled, compile it, and link everything to give you MyThing.exe.
Hell, if a module has changed, it'll recompile it, but then it takes a
hash of the interface file. If the module's implementation has changed
but its interface has not, the module will be recompiled, but
anything depending on it will not.
It can even spit out a makefile if you wish, automatically generated
directly from the source code. But usually, you just DON'T NEED make at
all. GHC does it all for you.
In Ocaml, things are not so easy. Apparently YOU have to manually
determine the correct order in which to compile things, and issue all
the commands in the right order. (Or just use make.) You even have to
compile and separately link things - and tell the compiler whether it's
supposed to be compiling or linking, and manually tell it what libraries
to include. *sigh*
It gets even better though. If you write a source file and compile it,
*everything* inside is public. If you don't want that, you have to
*manually* copy and paste the names of the public stuff into an
"interface file", and you apparently have to manually write in the
correct type signatures. (So much for "Ocaml does automatic type
inference".) You then have to ensure that this file is compiled at the
right moment - i.e., before the corresponding source file is compiled,
and before anything depending on it is compiled.
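The same kind of hiding can be sketched inside a single file with a module signature, which, as I understand it, plays the role an interface file plays for a whole source file. The module Counter and everything in it are hypothetical:

```ocaml
(* Hypothetical module: the signature lists what is public, with the
   types written out by hand; [secret_step] stays private. *)
module Counter : sig
  val start : int
  val next : int -> int
end = struct
  let secret_step = 1
  let start = 0
  let next n = n + secret_step
end

let two = Counter.next (Counter.next Counter.start)
```

Trying to use Counter.secret_step from outside the module is a compile error, since it isn't listed in the signature.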
Again, in Haskell you just write a list at the top of the source file
saying what things should be public. The compiler does the rest. Even if
you're manually running the compiler stages for some reason, the right
thing happens. The object file and interface description are generated
automatically in a single pass, and dependencies won't compile until
these exist.
It is *claimed* that you can compile Ocaml into a library that can then
be linked into programs written in arbitrary other languages (C or
whatever). If so, that would certainly seem like an advantage. OTOH,
Haskell can supposedly do this too - it's just that it'll be a bloody
huge library! (Because it will contain a copy of the entire Haskell RTS.)
Actually, the latest version of GHC supports [on Linux only] dynamic
linking now, so a Haskell library can be dynamically loadable, and the
RTS can be dynamically loadable, and any other libraries you're using
can be dynamically loadable. So if your C program uses 5 Haskell
libraries, you still only have to link one (dynamic) copy of the RTS
[which is fairly big].
It seems everywhere I look, Ocaml does the same things that other
languages have done, but does them wrong. The bits that are right are
identical to existing languages. The bits that are new are almost all wrong.