POV-Ray: Newsgroups: povray.off-topic: Ocaml: Ocaml

From: Invisible
Date: 5 Feb 2010 09:00:05
Message: <4b6c2465$1@news.povray.org>
So today I started learning (or at least familiarising myself with) 
Ocaml (Objective CAML). But this isn't going too well; I find myself 
continually screaming at the computer screen "YOU'RE DOING IT WRONG!"

If you don't care about why Ocaml is wrong and why Haskell is better, 
you can stop reading now.



As best as I can determine, Ocaml is some sort of hybrid of Java and 
Haskell. It's "mostly functional" and has syntax resembling Haskell, but 
it also has object-oriented features.

Personally I prefer the 100% pure approach of Haskell. But hey, maybe 
sometimes it's better to be pragmatic and go with a hybrid approach. It 
could be interesting to at least see how that works out, right?

I get the impression Ocaml is from academia too (just like Haskell), but 
it's designed especially with a focus on performance. This alone 
probably explains why it's strict (like C, Java, VB, C#, Pascal...) 
rather than lazy (like Haskell). Making Ocaml strict rules out all sorts 
of possibilities, and gives lower performance in some cases, but it 
makes performance *predictable*. And that's probably the Big Reason for 
this design choice. [It also makes several implementation details much 
simpler.]



Problems become apparent almost immediately. I had a look at the Great 
Language Shootout, where I found Ocaml getting royally OWNED by both C 
and C++, mainly because both of these used all four CPU cores, while 
Ocaml apparently can't. Ocaml supports multiple light-weight threads, 
but because the garbage collector isn't thread-safe, all Ocaml threads 
run in a single OS thread - i.e., on a single CPU core.

Haskell, of course, runs your threads on several cores, like you'd 
expect. (Unless you specifically ask it not to because you're trying to 
interface to external C libraries that use thread-local storage.)

Of course, even Haskell currently has a parallel but not concurrent GC, 
and in particular to perform a GC cycle you have to wait for all Haskell 
threads to stop. (They can only be stopped at certain points.) This can 
cause large program slowdowns, where one thread is in a tight loop and 
won't stop, and all the other threads are paused waiting to start a GC 
cycle.

But hey, all of this isn't really to do with *Haskell*, but rather with 
*implementation*. Specifically, I'm talking about GHC. Other 
implementations do exist. [Most of them don't support SMP *at all* 
though.] You could write another implementation, or fix GHC, and the 
above drawbacks would go away. I have no idea how many Ocaml 
implementations there are, but it seems plausible you could also make it 
do real multithreading too.



Then I started to look at the syntax. It's *mostly* identical to 
Haskell, but with a few differences. Most of the differences are just 
/different/. Haskell uses one symbol, Ocaml uses a different one. No big 
deal. But a few of them are quite nauseating.

In Haskell, I can write

   foo = 5

In Ocaml, you must write

   let foo = 5

Small bit of superfluous chatter there. Still, I guess it makes the 
syntax more consistent. Slightly. But then we get to the next 
abomination: In Haskell, I can say

   foo = bar foo

but in Ocaml, you must say

   let rec foo = bar foo

Yes, that's correct, you have to manually tell the compiler that you're 
making a recursive definition. WTF? What, it isn't trivial enough for 
the compiler to detect this all by itself? (You might try to argue that 
it prevents you writing recursive definitions when you didn't mean to, 
but this is *functional programming*! Recursion is ubiquitous!)
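
One practical effect of the explicit keyword, sketched here with made-up names (`f` and `fact` are illustrative only): without `rec`, the name on the right-hand side refers to whatever was bound earlier, so a definition can wrap an older one; with `rec`, it refers to the definition itself.

```ocaml
(* First definition of f. *)
let f x = x + 1

(* No "rec": the f on the right is the earlier f, so this wraps it. *)
let f x = f x * 2

(* With "rec": fact on the right is the function being defined. *)
let rec fact n = if n <= 1 then 1 else n * fact (n - 1)
```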



Another puzzling thing is that Ocaml appears to use a layout rule 
similar to Haskell [not that the tutorial I'm following bothered to 
point this out], and yet it still requires explicit semicolons anyway. 
(??) And there's a set of fairly complex rules for when you must and 
must not write them. (?!?)



Now in Haskell - or in fact any vaguely modern programming language - if 
I want to add two numbers together, I say

   x + y

BASIC had this, Pascal had this, C had this, Java had this, Smalltalk 
had this, and Haskell certainly has this. But not Ocaml, apparently. 
Here you must use "+" for adding integers, "+." for adding reals, and 
"+/" for adding arbitrary-precision integers. (And presumably some other 
symbol for rational numbers, complex numbers, vectors...)

This seems pretty much inexcusable to me. I'm sure there's a technical 
reason for why they did it this way, but it seems like a very basic 
thing to get wrong. Some languages have overloading, some languages have 
classes, and some languages just have interfaces, but all of them seem 
to manage to avoid this. As far as I can tell, the above makes it pretty 
much impossible to write code that works on more than one number type.

In a similar vein, in Haskell "7" is a number. Any type of number. It 
can be anything. But in Ocaml, "7" is an integer, "7." is a real, and I 
forget what the syntax for an arbitrary-precision integer is.
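
For reference, a minimal sketch of the operators and literals described above, plus one possible workaround for number-generic code: pass the zero value and the addition function in explicitly. The name `sum` and its labelled arguments are invented for this example; it is not a standard-library function.

```ocaml
let a = 2 + 3                     (* int addition *)
let b = 2.0 +. 3.5                (* float addition needs +. *)
let c = float_of_int a +. b       (* conversions are explicit *)

(* Hypothetical helper: generic over the element type because the
   caller supplies the zero value and the addition function. *)
let sum ~zero ~add xs = List.fold_left add zero xs

let total_int = sum ~zero:0 ~add:( + ) [1; 2; 3]
let total_float = sum ~zero:0.0 ~add:( +. ) [1.5; 2.5]
```

It works, but every caller has to thread the operations through by hand, which is exactly the chatter that overloading would avoid.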



In Haskell, you can say "Set Integer" or "Set String" or "Tree Integer" 
or whatever. Apparently in Ocaml you have to say this backwards: 
"int tree" and so forth. I haven't yet seen any higher-kinded type 
constructors, but it'll be interesting to see how that works!

Haskell uses the convention that type NAMES must begin with a capital 
letter, while type VARIABLES must begin with a lowercase letter. Ocaml 
uses the arguably superior convention that type variables start with an 
apostrophe. (I would have used a different symbol, but it's a reasonable 
design choice. Somebody recently pointed out that some Unicode alphabets 
have no concept of "uppercase" and "lowercase".)
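
A small sketch of both points, assuming a made-up `tree` type: the type variable is written with an apostrophe, and type constructors are applied postfix.

```ocaml
(* 'a is a type variable; tree is the (lowercase) type name. *)
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* The argument comes first: "int tree", "string list". *)
let t : int tree = Node (Leaf, 7, Leaf)
let names : string list = ["foo"; "bar"]

(* Works for any element type. *)
let rec size = function
  | Leaf -> 0
  | Node (l, _, r) -> size l + 1 + size r
```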

Which reminds me - in Haskell, "Char" is a 32-bit Unicode character. 
Ocaml makes the mistake of using an 8-bit character code. (But apparently 
there are libraries to "work around the problem".)

Haskell has a thing called "tuples". A tuple is a fixed-length, 
fixed-type collection. For example, you might write a parser function 
that takes a string and returns (say) an integer and the remainder of 
the string. To do that, you return a 2-tuple containing an integer and a 
string.

The literal value of such a tuple can be written as

   (5, "rest of string")

and its type is written as

   (Integer, String)

Ocaml has the same syntax for writing tuples, but the type signature becomes

   int * string

which seems a rather perverse choice. I can see what they're getting at 
(a tuple type is a product type, whereas an enum is a sum type), but 
that's still rather far-out.
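
The parser example above might look like this in Ocaml. `parse_int` is a hypothetical helper written for this illustration, and it assumes the input starts with at least one decimal digit (otherwise `int_of_string` raises an exception).

```ocaml
(* Split the leading decimal digits off a string. *)
let parse_int (s : string) : int * string =
  let len = String.length s in
  let rec scan i =
    if i < len && s.[i] >= '0' && s.[i] <= '9' then scan (i + 1) else i
  in
  let stop = scan 0 in
  (int_of_string (String.sub s 0 stop), String.sub s stop (len - stop))

(* A tuple value with its type written out, then destructuring. *)
let pair : int * string = (5, "rest of string")
let (n, rest) = parse_int "42 apples"
```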

(Haskell also uses "[Integer]" for a list of integers, and "[3,5,7]" for 
a value of this type. In general, the type signatures mirror the value 
syntax - possibly to the point of being confusing. Personally, I've 
never been that fond of the list and tuple syntax. Sure, lists and 
tuples are common, but would it really kill you to write "Tuple2 True 5" 
instead of "(True, 5)"? It would be clearer...)

Also, Haskell's "()" type becomes "unit". Which is the name Haskell 
programmers use to refer to it, but hey. It's not entirely clear to me 
whether Ocaml lets values of type "unit" actually exist - and if so, 
what their syntax is.
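
For what it's worth, a minimal sketch: the type is called `unit`, and its single value is written `()`, the same as in Haskell. The function name `greet` is made up.

```ocaml
let u : unit = ()

(* A function taking and returning unit, used for its side effect. *)
let greet () = print_endline "hi"

let () = greet ()
```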



Haskell has algebraic data types. Every Haskell type is nominally an 
ADT. (Although in the case of "Int32", it's an enumeration type with 
constructors named "0", "1", "2"...) All types follow the same uniform 
structure.

Ocaml, on the other hand, has "records" which are like "{field1=5; 
field2=true; field3="hi"}", and "variants" which are like Haskell ADTs. 
And they're apparently not the same thing. You can see this when you 
define one:

   type recordX = {field1 : int; field2 : string; field3 : stuff}

   type variantY = Thing of int * string * stuff | OtherThing of char * char

So "records" use named fields, but have only a single (unnamed) 
constructor, while "variants" allow multiple constructors, but use a 
weird syntax for defining the fields - a syntax resembling tuple types, 
actually. WTF?

(This is quite apart from the parameter of a parameterised variant being 
written *before* the variant name, not after it. I wonder if you can 
have multiple parameters on a variant? I haven't seen any examples 
showing this. I also haven't seen any parameterised records, so I'm not 
sure if that's supported. I'm also not sure how it knows from looking at 
a record which type you mean...)
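
A sketch that may answer some of these questions, using invented type names: multiple parameters go in a parenthesised list before the type name, records can be parameterised too, and a record expression is matched to a type by its field names (by default, the most recently defined type with those fields).

```ocaml
(* Two type parameters on a variant. *)
type ('k, 'v) entry = Empty | Binding of 'k * 'v

(* A parameterised record. *)
type 'a point = { x : 'a; y : 'a }

let e : (string, int) entry = Binding ("answer", 42)

(* No annotation needed: fields x and y identify the point type. *)
let p = { x = 1.0; y = 2.0 }

let describe = function
  | Empty -> "empty"
  | Binding (k, v) -> k ^ "=" ^ string_of_int v
```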



Haskell makes the rather illogical choice of using "--" as the start 
marker for a comment. (Great. So I can't use that as a name then!) But 
Ocaml uses the even stranger choice of "(* ... *)". Which means that you 
can't write "(*)" to mean the multiplication function, you must write "( 
* )" [with spaces] to prevent it parsing as a comment.

[Then again, Haskell has a glitch with the unary minus function...]

Also, Haskell uses "++" for string (and list) concatenation. But Ocaml 
inexplicably uses "^" instead. (Haskell uses both "^" and "**" to mean 
exponent - but one is with an integer exponent and the other is with an 
arbitrary exponent. No, I can never remember which is which.)
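
A quick sketch of the operators mentioned above. Note that in Ocaml `^` joins strings only; lists use `@`, and `**` is exponentiation on floats.

```ocaml
(* Multiplication as a function: the spaces inside the parentheses
   keep it from being read as a comment delimiter. *)
let product = List.fold_left ( * ) 1 [2; 3; 4]

let greeting = "foo" ^ "bar"     (* string concatenation *)
let joined = [1; 2] @ [3]        (* list concatenation *)
let power = 2.0 ** 10.0          (* float exponentiation *)
```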



Things don't get any better when you start looking at the build process.

In Haskell, if I want to compile a multi-module program, I say

   ghc --make MyThing

GHC will then automatically build the entire dependency tree, detect 
what libraries you need, check what (if anything) needs to be 
(re)compiled, compile it, and link everything to give you MyThing.exe. 
Hell, if a module has changed, it'll recompile it, but then it takes a 
hash of the interface file. If the module's implementation has changed 
but its interface has not, the module will be recompiled, but 
anything depending on it will not.

It can even spit out a makefile if you wish, automatically generated 
directly from the source code. But usually, you just DON'T NEED make at 
all. GHC does it all for you.

In Ocaml, things are not so easy. Apparently YOU have to manually 
determine the correct order in which to compile things, and issue all 
the commands in the right order. (Or just use make.) You even have to 
compile and separately link things - and tell the compiler whether it's 
supposed to be compiling or linking, and manually tell it what libraries 
to include. *sigh*

It gets even better though. If you write a source file and compile it, 
*everything* inside is public. If you don't want that, you have to 
*manually* copy and paste the names of the public stuff into an 
"interface file", and you apparently have to manually write in the 
correct type signatures. (So much for "Ocaml does automatic type 
inference".) You then have to ensure that this file is compiled at the 
right moment - i.e., before the corresponding source file is compiled, 
and before anything depending on it is compiled.
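
The same hiding mechanism can be sketched in a single file with an inline signature. The `Counter` module is invented for this example; a separate interface file plays the role of the `sig ... end` part here. Running `ocamlc -i` on a source file prints its inferred signatures, which can serve as a starting point for the interface file instead of typing them all by hand.

```ocaml
module Counter : sig
  (* Only what is listed here is visible outside the module. *)
  type t
  val make : unit -> t
  val next : t -> int
end = struct
  type t = int ref
  let make () = ref 0
  (* Not in the signature, so private to the module. *)
  let bump r = r := !r + 1
  let next r = bump r; !r
end

let counter = Counter.make ()
let first = Counter.next counter
let second = Counter.next counter
```

Because `t` is abstract in the signature, code outside the module cannot tell that it is an `int ref`, and `Counter.bump` is simply not visible.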

Again, in Haskell you just write a list at the top of the source file 
saying what things should be public. The compiler does the rest. Even if 
you're manually running the compiler stages for some reason, the right 
thing happens. The object file and interface description are generated 
automatically in a single pass, and dependencies won't compile until 
these exist.

It is *claimed* that you can compile Ocaml into a library that can then 
be linked into programs written in arbitrary other languages (C or 
whatever). If so, that would certainly seem like an advantage. OTOH, 
Haskell can supposedly do this too - it's just that it'll be a bloody 
huge library! (Because it will contain a copy of the entire Haskell RTS.)

Actually, the latest version of GHC supports [on Linux only] dynamic 
linking now, so a Haskell library can be dynamically loadable, and the 
RTS can be dynamically loadable, and any other libraries you're using 
can be dynamically loadable. So if your C program uses 5 Haskell 
libraries, you still only have to link one (dynamic) copy of the RTS 
[which is fairly big].



It seems everywhere I look, Ocaml does the same things that other 
languages have done, but does them wrong. The bits that are right are 
identical to existing languages. The bits that are new are almost all wrong.