In the category of "nobody cares, but I'm going to write about it
anyway"... A few months back I started building a Haskell to JavaScript
"compiler". (I say "compiler" since the thing was far too incomplete to
be described as a *real* compiler; more like a mere compiler back-end.)
That has lain dormant for a while now, but I've just embarked upon
writing a Haskell to C# compiler.
The hardest part about compiling Haskell is its laziness. As you know,
in Haskell nothing actually gets executed until you *do* something with
the result. In a low-level language like machine code (or, more
realistically, C) this is quite hard to arrange. But you know what? Both
JavaScript and C# actually have first-class functions that can access
all currently in-scope local variables. *That's the hard part!* And
these languages have it built-in!
Obviously, JavaScript can run in a web browser. That's why I embarked on
that project in the first place. But it turns out JavaScript has another
extremely useful property: it's not object-oriented. Or rather, it
doesn't have classes. Objects can just dynamically change structure at
run-time. (Although --- depending on your JS engine --- that can entail
a performance hit: engines tend to be optimised for the common case
where people code as if JS *were* class-based.)
This turns out to be tremendously useful for Haskell; laziness is
implemented by storing some kind of data structure representing the work
to be done, and then overwriting it in-place with the result of that
work when the time comes. Normal OO languages don't let an object
suddenly change class like that, but JavaScript is fine with it.
So much for JavaScript. When I came to C#, I made an interesting
discovery: It is possible to encode the Haskell type system into the C#
type system. Well, almost. And it looks like it might even be relatively
simple for C# to call Haskell and vice versa. But I'm getting ahead of
myself...
So how do you compile Haskell? Well, first you need to understand what
Haskell can do.
Haskell has a powerful and elaborate type system. But it turns out that
all the polymorphism is resolved at compile-time (not unlike C++
templates --- although the implementation is totally different).
Haskell has complete "type erasure" --- meaning that at run-time, the
system does not know or care what type anything is. (C and Pascal also
have this property, for example.) In summary, if we pretend that some
earlier stage of the compiler has already type-checked the program and
decided that it's legal, we can completely ignore Haskell types from now on.
So that leaves us with Haskell expressions. Now Haskell has a seemingly
vast amount of syntax to it, but actually the entire language "desugars"
down to just 6 fundamental constructs. In an expression, you can:
1. Declare new local variables.
2. Dereference any in-scope variable.
3. Declare an anonymous function.
4. Call a function with some arguments.
5. Construct a data structure.
6. Inspect and extract data from a data structure.
Some of these are self-explanatory. Some of them require a little
further explanation.
Suppose we have a Haskell declaration such as
data Point = Point Int Int
We can easily encode this in C# as
public sealed class Point
{
public readonly int X, Y;
public Point(int x, int y)
{
this.X = x;
this.Y = y;
}
}
Item #5 above means that in Haskell I can write
Point 5 7
to construct a Point value. The corresponding C# code is pretty obviously
new Point(5, 7)
So that's easy enough. Similarly, point #6 says that I can write
distance :: Point -> Int
distance (Point x y) = x*x + y*y
Again, an obvious C# analogue is
int distance(Point p)
{
return p.X*p.X + p.Y*p.Y;
}
To really understand 5 and 6, you need to understand Haskell's algebraic
data types (ADTs). [Note that ADT also stands for *abstract* data type,
which is something very different!] Consider the following declaration:
data BufferMode = Unbuffered | LineBuffered | BufferSize Int
Is it an enum? Is it a struct? No, it's a discriminated union! Roughly,
it says that if a variable is declared as type "BufferMode", then legal
values for that variable are "Unbuffered", "LineBuffered", or
"BufferSize" followed by an Int (e.g., "BufferSize 7").
We can encode that in C# in the following slightly strange manner:
public abstract class BufferMode
{...}
public sealed class Unbuffered : BufferMode
{...}
public sealed class LineBuffered : BufferMode
{...}
public sealed class BufferSize : BufferMode
{
public readonly int Field1;
...
}
In other words, we make an abstract class with nothing in it, and
concrete classes for each "shape" of value that Haskell permits. We then
declare every Haskell value of type BufferMode as a C# value of type
BufferMode.
So that covers #5, since we can do "new BufferSize(1024)" for example.
But #6 is where things get... interesting.
Most normal programming languages have an if/then/else statement. Some
of them also have a switch/case construct, which is syntax sugar for a
big, complicated if/then/elseif block.
Haskell has an if/then/else *expression* (which works like the C#
ternary operator). It also has a construct called a case-block. And
if/then/else is syntax sugar for a case-block. *Not* the other way around. For example:
to_string :: BufferMode -> String
to_string mode =
case mode of
Unbuffered -> "No buffering"
LineBuffered -> "Buffer each line"
BufferSize n -> "Buffer every " ++ show n ++ " bytes"
Take note of the final line: Here we declare a new local variable "n"
and bind its value to the Int field of BufferSize. This variable is
*only* in-scope on the final line (since otherwise the Int field doesn't
exist).
We are actually doing 3 separate things here:
* If "mode" is an unevaluated expression, evaluate it *now*!
* Take a conditional branch depending on which sort of value mode is.
* Extract any data fields in mode and bind them to variables.
A case-block is the only Haskell construct that causes execution to
actually occur (i.e., it defeats laziness). It is the only Haskell
construct for explicitly performing conditional branching. And it is the
only Haskell construct that can extract fields from data structures.
[Actually, some of the above isn't 100% true. There are a number of
primitive operations that *also* provoke execution --- most
conspicuously, any kind of I/O. You can also do conditional execution
based on the condition "what function is in f?" by just calling f with
some arguments.]
As you can see, the case-block is a fundamental Haskell primitive. If we
can implement this, we're in good shape!
One possibility is to encode each Haskell function as a C# method on the
class hierarchy. In other words, add an abstract ToString() method to
BufferMode, with concrete implementations for each of the subclasses.
That doesn't work well, though. Haskell is "factored the other way"
compared to OO languages:
* OO languages give you an extensible set of data structures with a
fixed set of operations. [You can always add a new subclass, but you
can't really add new methods.]
* Haskell gives you a fixed set of data structures, with an extensible
set of operations. [You can always add new functions that operate on
BufferMode. You can never add a new kind of BufferMode.]
These differences make sense for different use-cases. If you're building
a widget library, of *course* you always want to be able to add new
kinds of widget! But the operations available on any arbitrary widget
are more or less fixed. If, on the other hand, you're building an
abstract syntax tree, the possible tree nodes are fixed by the language
you're trying to process; if that language changes, you need to redesign
your whole program *anyway*! But the set of operations you might want to
do on that syntax is always expanding...
Fortunately, another possible encoding in C# looks like this:
string ToString(BufferMode mode)
{
{
var temp = mode as Unbuffered;
if (temp != null)
{
return "No buffering";
}
}
{
var temp = mode as LineBuffered;
if (temp != null)
{
return "Buffer each line";
}
}
{
var temp = mode as BufferSize;
if (temp != null)
{
return "Buffer every " + temp.Field1 + " bytes";
}
}
throw new Exception("Unreachable code");
}
For those of you who don't speak C# fluently:
* The explicit cast (Foo)bar attempts to convert the object pointed to
by bar into class Foo. If this fails, a run-time exception is thrown. We
could catch and ignore that, but exception handling is quite slow.
* The "as" operator in "bar as Foo" attempts to cast bar to type Foo,
but if it fails it just returns null (rather than throwing an exception).
* There is also an "is" operator. We could just do "if (mode is ...)".
But note the final case, where we actually want to *do* something with
the casted result.
* The strange extra curly brackets around each if allow us to reuse the
variable name "temp" with a different type signature each time. (Since
it's a local variable, it's only in-scope until the closing brace.)
We're doing great here so far. Oh, except for one thing: the mode might
be unevaluated. We need to make sure it gets evaluated as the first
step. So how do we store unevaluated data in C#?
Well, whichever way you slice it, it's probably going to require a
visible type. The Haskell term for something we haven't evaluated yet is
a "thunk". So we're going to have something like
public sealed class Thunk<T>
{
public T Execute() {...}
}
The idea here is that the first time we call Execute(), it executes the
enclosed action, and returns us the result. On subsequent calls to
Execute(), it just immediately returns the previously calculated result.
It's mildly tricky to make this thread-safe, but not that hard.
Basically, at the heart of a Thunk<T> is a C# Func<T> (which is a
pre-defined C# delegate type). For example,
var t = new Thunk<int>( () => DoLongComputation(3, 7, 59) );
When we do t.Execute(), it will run DoLongComputation() with the
specified arguments. So that's easy enough.
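As for the thread-safety mentioned above: here is a minimal sketch of one possible approach, using a simple lock per thunk (the locking strategy here is my own assumption; faster lock-free variants exist):

```csharp
public sealed class Thunk<T>
{
    private Func<T> _body;   // set to null once the result has been computed
    private T _result;
    private readonly object _sync = new object();

    public Thunk(Func<T> body)
    {
        this._body = body;
    }

    public T Execute()
    {
        // Fast path: already evaluated. (A truly robust version would
        // mark _body volatile; this is a sketch, not production code.)
        if (this._body == null)
            return this._result;
        lock (this._sync)
        {
            if (this._body != null)
            {
                this._result = this._body();
                this._body = null; // drop the closure so it can be collected
            }
        }
        return this._result;
    }
}
```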
Now we can see that our ToString() function needs to change a bit:
string ToString(Thunk<BufferMode> t_mode)
{
var mode = t_mode.Execute();
...
}
The rest of the code is as before. But wait! It turns out we need to
change a bit more. The definition for BufferSize should actually be
public sealed class BufferSize : BufferMode
{
public readonly Thunk<int> Field1;
...
}
Instead of the field being an int, it's actually a Thunk<int>. Because
that's how Haskell works: every data structure field is potentially
unevaluated.
Many people think of a datum as either "evaluated" or "unevaluated". But
consider the Haskell list type:
data List x = Node x (List x) | End
---
public abstract class List<T>
{
...
}
public sealed class Node<T> : List<T>
{
public readonly Thunk<T> Field1;
public readonly Thunk<List<T>> Field2;
...
}
public sealed class End<T> : List<T>
{
...
}
Now we can see that if I hand you a Thunk<List<T>>, the whole thing
could be unevaluated. Or the first node of the list could have Field1
evaluated and Field2 not evaluated. Or the other way around. You can see
now that a list can be in many, many stages of partial evaluation.
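To make those stages concrete, here is a sketch (VerySlowComputation is a hypothetical expensive function, and I'm assuming Node<T> has the obvious two-thunk constructor):

```csharp
// The node object itself is evaluated; its fields need not be.
var cheapHead = new Thunk<int>(() => 1);
var lazyTail = new Thunk<List<int>>(() => VerySlowComputation());
var xs = new Node<int>(cheapHead, lazyTail);

// Forcing the head runs the trivial lambda. The tail's computation
// never runs unless somebody calls lazyTail.Execute().
int head = xs.Field1.Execute();
```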
You may *also* have noticed how Haskell's polymorphic list type maps
beautifully to a C# generic type. List x simply becomes List<T>.
Returning to our ToString() function, we need to change the following:
{
var temp = mode as BufferSize;
if (temp != null)
{
return "Buffer every " + temp.Field1.Execute() + " bytes";
}
}
So we need to execute Field1 in case it wasn't already executed.
Indeed, even if it *was* executed, we still need to call Execute() to
fetch the result. Ideally, Execute() would overwrite the Thunk<int>
object in-place with a boxed int value. But C# doesn't
let objects suddenly change class like that. (Obviously the native
Haskell implementation is written in C and has no such qualms.)
In a way, though, it's kind of illuminating to actually *see* where the
thunks are. For your entertainment, here is the Haskell map function:
map :: (x -> y) -> List x -> List y
map f xs =
case xs of
Node y ys -> Node (f y) (map f ys)
End -> End
--
List<T2> Map<T1, T2>(Func<Thunk<T1>, T2> f, Thunk<List<T1>> t_xs)
{
var xs = t_xs.Execute();
{
var temp = xs as End<T1>;
if (temp != null)
{
return new End<T2>(); // Notice different type parameter.
}
}
{
var temp = xs as Node<T1>;
if (temp != null)
{
var z = new Thunk<T2>( () => f(temp.Field1) );
var zs = new Thunk<List<T2>>( () => Map(f, temp.Field2) );
return new Node<T2>(z, zs);
}
}
throw new Exception("Unreachable code");
}
Notice how we never actually call Execute() on Field1 or Field2. We pass
them along as potentially unevaluated thunks (indeed, enclosed in new,
bigger thunks). You can see from the original Haskell source code that
this function is recursive. However, encoded in C#, you can see that the
function never *directly* calls itself. It merely builds a new thunk for
the recursive call. If that thunk never hits Execute(), the recursive
call never happens.
And that, ladies and gentlemen, is why you can Map() an infinite list
and have it return in finite time. Indeed, that's why you can construct
an infinite list in the first place!
Now consider the length function:
length :: List x -> Int
length xs =
case xs of
Node y ys -> 1 + length ys
End -> 0
---
int Length<T>(Thunk<List<T>> t_list)
{
var list = t_list.Execute();
{
var temp = list as End<T>;
if (temp != null)
{
return 0;
}
}
{
var temp = list as Node<T>;
if (temp != null)
{
return 1 + Length(temp.Field2);
}
}
throw new Exception("Unreachable code");
}
Immediately we see a problem here. See that recursive call right at the
bottom? Yeah, Haskell has the so-called "tail-call optimisation". But
that final call is *not* a tail-call. *This* is a tail-call:
last :: List x -> x
last xs =
case xs of
End -> error "no last element"
Node y ys ->
case ys of
End -> y
Node _ _ -> last ys
Notice how in that last line, the *last* thing that happens is the
recursive call. By contrast, length doesn't call length. It calls 1 +
length. So "+" is the thing being tail-called, not "length".
Now if it's a tail-call, you can do this:
T Last<T>(Thunk<List<T>> t_xs)
{
var xs = t_xs.Execute();
{
var temp = xs as End<T>;
if (temp != null)
{
throw new Exception("no last element");
}
}
var y = (xs as Node<T>).Field1; // Cast cannot fail.
while (true) // Exit condition is further down.
{
{
var temp = xs as Node<T>;
if (temp != null)
{
y = temp.Field1;
xs = temp.Field2.Execute();
continue; // Next iteration of while loop.
}
}
{
var temp = xs as End<T>;
if (temp != null)
{
return y.Execute();
}
}
}
}
No recursion! You can do that, because a tail-call to yourself is just
like going around the loop again. (It's a tad complicated to implement
correctly, mind you. You can see the code above doesn't seem to match
the original *exactly*.)
The long and short of this is that "last" will run "in constant space",
whereas "length", as defined above, will blow the Haskell stack if
provided with a sufficiently large input.
There's a way around that:
length = work 0
work count list =
case list of
End -> count
Node x xs -> work (count+1) xs
It should be clear how this works: We initialise count=0, and every time
we see a Node we do count+1.
Trouble is, all that does is end up with count containing
1 + (1 + (1 + (1 + (1 + ...))))
When you try to Execute() that, it will *still* blow the stack. For this
reason, Haskell has a magical primitive named "seq":
work count list =
case list of
End -> count
Node x xs ->
let new_count = count + 1
in new_count `seq` work new_count xs
The magical "seq" causes new_count to Execute() right now, even though
it's not in a case-block. (I said there were primitives other than case
that do that. Well seq is such a primitive!) Now the thunk gets
unthunked on every pass around the loop, so there's never a stack overflow.
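For comparison, the seq'd accumulator version maps onto C# as a plain loop with a strict counter. This is a sketch of what the compiled code could look like:

```csharp
int Length<T>(Thunk<List<T>> t_list)
{
    int count = 0; // a strict C# int: the analogue of seq-ing new_count
    var list = t_list.Execute();
    while (true)
    {
        var node = list as Node<T>;
        if (node == null)
            return count; // must be End<T>
        count = count + 1; // evaluated immediately; no chain of (1 + ...) thunks
        list = node.Field2.Execute();
    }
}
```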
Fun fact: I love it when people post on StackOverflow.com to ask about
an actual stack overflow! There should be a badge for that or something...
OK, so we know what to do with "data" declarations, and we know what to
do with case-blocks (and data construction). That covers #5 and #6.
Items #1 and #2 are all about local variables. They're simple enough,
once you remember that the local variables thus declared are always thunks.
Oh, but did I mention? Haskell supports recursive declarations! :-P
A let-block looks like this:
let
variable1 = expression1
variable2 = expression2
variable3 = expression3
in expression0
That entire thing --- the whole lot --- is a single expression! It is
valid *anywhere* an expression is valid. The result of this expression
is expression0 (which usually mentions the other variables declared).
The mind-bending thing is that variable1, variable2 and variable3 are
in-scope for the *entire* block! Most particularly, it is 100% legal to
do this:
let numbers = Node 6 numbers
in ...
So "numbers" is a variable declared... in terms of... itself...?
We can encode this in C#, if we modify the Thunk<> class a bit:
public sealed class Thunk<T>
{
private Func<T> _body;
private T _result;
public Thunk(Func<T> body)
{
this._body = body;
}
public Thunk()
{
this._body = null;
}
public void Become(Func<T> body)
{
this._body = body;
}
public T Execute()
{
...
}
}
Then we can simply do this:
var numbers = new Thunk<List<int>>();
var node = new Node<int>(new Thunk<int>(() => 6), numbers);
numbers.Become( () => node );
We create a non-functional thunk --- a "hole" --- create the value to go
in it, and finally "tie the knot" by linking the thunk back to itself.
"node" points to "numbers", "numbers" points to "node", and we have
ourselves a circular list. (Or an infinite list, if you prefer.)
In general terms, then, for each let-block, we do
var variable1 = new Thunk<...>();
var variable2 = new Thunk<...>();
var variable3 = new Thunk<...>();
variable1.Become(...compiled expression1...);
variable2.Become(...compiled expression2...);
variable3.Become(...compiled expression3...);
return ...compiled expression0...;
So we create a hole for every variable being declared, and then fill in
each variable's expression to make a working thunk. (This
guarantees that the C# variables are definitely in-scope when we compile
the corresponding expressions.)
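As a concrete (hypothetical) example, a block like `let a = 1; b = a + 1 in b` would compile to roughly:

```csharp
static Thunk<int> Example()
{
    // let a = 1; b = a + 1 in b
    var a = new Thunk<int>();
    var b = new Thunk<int>();
    a.Become(() => 1);
    b.Become(() => a.Execute() + 1);
    return b; // expression0 is just "b" --- still an unevaluated thunk here
}
```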
Did I say something about "don't need to care about the Haskell types"?
Oh dear, look at all those variable declarations! It seems that, for C#
at least, we *do* need to care about types. Hmm...
This now just leaves us with points #3 and #4, which are all about
functions. Well, obviously C# has built-in support for functions and
calling them. Hell, it even has built-in anonymous functions now! (E.g.,
the Func<> type.)
But remember, Haskell has "curried functions". That is, (+) is a
2-argument function, and (5+) is a 1-argument function [that adds 5 to
things], and only (5+4) is an actual function call, as such.
I chose to define a little helper class:
public sealed class Function<T1, T2>
{
private readonly Func<Thunk<T1>, T2> _body;
public Function(Func<Thunk<T1>, T2> body)
{
this._body = body;
}
public Thunk<T2> Call(Thunk<T1> argument)
{
return new Thunk<T2>( () => this._body(argument) );
}
}
So a Function<T1, T2> is really kind of a shorthand for Func<Thunk<T1>,
T2>. But notice the beautiful symmetry of Call(): it takes a Thunk<T1>
and gives you back a Thunk<T2>. This is *exactly* how Haskell works! The
function may or may not ever execute the thunk being passed in.
Likewise, a let-block might declare a variable that's never used [or,
more likely, never used on a certain code-path], in which case the
result thunk is never executed either.
That leaves us with just how to represent Haskell anonymous functions.
Note that a Haskell *named* function is merely a global constant whose
value happens to be an anonymous function!
The approach I went with, for named functions, is to do this:
public static int FN_Add(Thunk<int> x, Thunk<int> y)
{
return x.Execute() + y.Execute();
}
public static Thunk<Function<int, Function<int, int>>> OB_Add =
new Thunk<Function<int, Function<int, int>>>( () =>
new Function<int, Function<int, int>>( x =>
new Function<int, int>( y => FN_Add(x, y) )
)
);
It's a tad verbose (!!), but it works. The protocol is that every
top-level Haskell constant becomes a
public static Thunk<...> OB_XXX = ...
Any top-level constant that's a *function* also gets a
public static T0 FN_XXX(Thunk<T1> arg1, Thunk<T2> arg2, Thunk<T3> arg3)
{
...compiled code...
}
In this way, if function Foobar needs to call the add function, it can
just do FN_Add(x, y). But if Foobar needs to call an unknown function
passed in as an argument [Haskell functions do this A LOT!], it'll be given a
Thunk<Function<int, Function<int, int>>>
and it'll need to do
fn.Execute().Call(x).Execute().Call(y)
to build the thunk, and another Execute() if it wants the actual, final
result.
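Putting the pieces together, calling the add function through its curried object looks like this (sketch):

```csharp
var five = new Thunk<int>(() => 5);
var four = new Thunk<int>(() => 4);
// Each Call() merely builds a thunk; nothing is added
// until the final Execute().
Thunk<int> sum = OB_Add.Execute().Call(five).Execute().Call(four);
int result = sum.Execute(); // 9
```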
Notice that every function argument is always a thunk. I added a special
constructor for creating "already evaluated thunks". Again, the native C
implementation doesn't need to care [because its objects can change type
dynamically], but in C# it's a necessary evil.
Looking at what we've got, accessing a variable or creating an object
are mere expressions in C#. But declaring local variables or doing
conditional jumps are *statements* at the C# level. (Yes, I could use
the ternary operator. No, I'm not going to. :-P ) So the Haskell
expressions end up being divided into "simple expressions" (Haskell
expressions which are also C# expressions) and "complex expressions"
(Haskell expressions which become *statements* in C#).
It is clear that a simple expression may only contain simple
subexpressions. But what of complex expressions?
Well, Haskell allows anything. But for the purposes of code generation,
I'm going to assume that the [non-existent] earlier stages of my
compiler have simplified things to a certain degree. In particular:
* A let-block can only bind variables to *simple* expressions. The
"in..." part is a complex expression, since it represents the
continuation of the flow control.
* A case-block can only have a *simple* expression as the thing to
examine. The case alternatives are again complex expressions, since
they're the next stage of flow control.
I haven't decided precisely what to do with anonymous functions yet. One
possibility is "lambda lifting": turn *every* function into a top-level
function (with a randomly-generated name, and any captured local
variables as extra arguments). But that's a shame when C# has automatic
variable capture. So probably the correct thing is to treat an anonymous
function as a simple expression (and its body as a complex expression).
This makes sense, since in C# an anonymous function is an *expression*:
Func<int, int, int, int> fn1 = (x, y, z) => x + y + z;
In an OO language, you might perhaps have complex expressions as a
subclass of simple ones or something. But in Haskell, these are two
separate types, and to include a simple expression within a complex
expression, you have to explicitly "wrap" it. It turns out that these
wrapping points are *precisely* the points where the C# code needs a
"return" statement:
case foo of Bar -> 5
{
var temp = foo as Bar;
if (temp != null)
{
return 5;
}
}
This is a fortuitous accident indeed!
On a larger scale, a Haskell program is composed of "modules". Modules
are gathered together into "packages", which are the [usual] unit of
Haskell code distribution.
Well, it seems clear that a Haskell package should become a C# assembly!
And it seems reasonable that each module should become a C# class. So
what the code does is to turn the Haskell module "Data.List" in the
package "base" into
namespace Haskell.Packages.base.Data
{
public static class List
{
...compiled code from Data.List...
}
}
In this way, the fully-qualified name of the static class is
Haskell.Packages.base.Data.List
which seems quite nice.
While we're on the subject, Haskell allows you to give some rather
perplexing names to things. For example, there's a standard Haskell
function named ">>=". Clearly C# won't be too happy about that! We need
a name mangler! And I've got just the thing: For every character which
is not a letter, digit or underscore, replace it with "x63" (or whatever
the ASCII character code for the illegal character is).
[C# also has a rule that the *first* character of an identifier cannot
ever be a digit. Happily, Haskell has *the same* rule, so there's no
need to explicitly check for this problem.]
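The mangling scheme sketched above is simple enough to write down. This is my own illustration (the function name and details are mine, not the real compiler's):

```csharp
static string Mangle(string haskellName)
{
    var sb = new System.Text.StringBuilder();
    foreach (char c in haskellName)
    {
        if (char.IsLetterOrDigit(c) || c == '_')
            sb.Append(c);
        else
            sb.Append('x').Append((int)c); // '>' becomes "x62", '=' becomes "x61"
    }
    return sb.ToString();
}
// Mangle(">>=") gives "x62x62x61".
```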
At this point, every Haskell package gets turned into a nice
self-contained C# DLL. A rather exciting possibility [which I haven't
actually investigated yet] is that Haskell linking information could be
stored as attributes!
For those who aren't fluent in C#: You can do stuff like
[TestCase(5)]
[TestCase(7)]
[TestCase(-1)]
public void TestWibble(int x)
{
...
}
The stuff in square brackets are "attributes". These get compiled into
the DLL, and can be inspected at run-time using reflection. The example
above shows how NUnit (a .NET unit test framework) figures out which
methods are unit tests, and if the tests are parameterised, what values
the test should be run with.
But attributes are *arbitrary*. You can define brand new attributes
yourself. (Although the inputs have to be *compile-time* constants, which
is intensely annoying. Most particularly, arrays are not compile-time
constants. At least, C# thinks so...)
That being the case, one might imagine a world where each compiled
Haskell function has attributes stating the original Haskell data type.
This might enable the Haskell compiler [which, I remind you, doesn't
exist] to compile code against a Haskell package for which you don't
actually have the source code, just the compiled DLL. But only if the
compiler runs under .NET, of course! (Self-hosting compiler, anyone?)
Another thing is that Haskell and C# have different ideas about
visibility. For example, Haskell allows you to have a public function
that takes or returns private types. That's a strange and rather useless
thing to do, and C# doesn't allow it.
More strangely, C# doesn't allow a public class to have a private
parent. I'm not sure why. (I would have liked to prevent subclassing,
say, BufferMode, by making it private and its subclasses public. But
that wouldn't really achieve what I want anyway, so...)
Given Haskell's supreme flexibility on visibility, it seems it might
make sense to put visibility as an attribute, rather than try to make C#
visibility match Haskell visibility.
The next fun thing is Haskell class instances. Now a Haskell "class" is
nearly identical to what C# calls an "interface".
class ToString x where
to_string :: x -> String
instance ToString Int where
to_string x = ...
instance ToString Bool where
to_string True = "True"
to_string False = "False"
You *might* think we should implement a Haskell class as a C#
interface... Uh, yeah, that's not going to work. You see, the set of
interfaces that any C# class implements is fixed. But the set of Haskell
class instances is dynamic. You can always add new instances. (E.g., I
can invent a new Haskell class, and make the primitive built-in Haskell
types implement that class!)
On top of that, Haskell class instances can live in a different module
than where the type is defined. Whether the instance is "in-scope" or
not depends on what modules you import!
This suggests that a Haskell instance should be a first-class C# object.
And, indeed, we can do something like this:
public sealed class ToString<X>
{
public readonly Thunk<Function<X, string>> OB_ToString;
public ToString(Thunk<Function<X, string>> to_string)
{
this.OB_ToString = to_string;
}
}
This gets even more interesting, since Haskell lets you say things like
"for all types X that implement ToString, the type List X also
implements ToString". This suggests that such parameterised class
instances should actually become run-time C# *functions*!
public static ToString<List<X>> IN_ToString<X>(ToString<X> instance)
This is a function that takes a class instance as input, and returns a
new, bigger class instance as output!
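Here is a sketch of what such an instance-building function could look like, assuming ToString<X> has a constructor that fills in OB_ToString (the bracketed list syntax in the output is my choice):

```csharp
public static ToString<List<X>> IN_ToString<X>(ToString<X> elem)
{
    return new ToString<List<X>>(
        new Thunk<Function<List<X>, string>>(() =>
            new Function<List<X>, string>(t_xs =>
            {
                var sb = new System.Text.StringBuilder("[");
                var xs = t_xs.Execute();
                var node = xs as Node<X>;
                while (node != null)
                {
                    // Use the *element* instance to render each item.
                    sb.Append(elem.OB_ToString.Execute().Call(node.Field1).Execute());
                    xs = node.Field2.Execute();
                    node = xs as Node<X>;
                    if (node != null) sb.Append(", ");
                }
                return sb.Append("]").ToString();
            })));
}
```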
(For the most part, if your Haskell code is compiled with known types,
the compiler should hard-code the necessary method definitions for you.
This crazy dynamic class construction malarkey is only for polymorphic
functions.)
This has the nice property that it supports Haskell methods
parameterised by their *return type*! (Something that C# does *not*
support.) The canonical example being
class FromString x where
from_string :: String -> x
It also allows easy support for the non-standard but widely used Haskell
feature "multi-parameter type classes", where methods are parameterised
over *multiple* types, not just one. (Yes, I'm talking about multiple
dispatch.) Canonical example:
class Convert x y where
convert :: x -> y
instance Convert Int Double where...
instance Convert Int String where...
This defines a generic "convert" function, which can convert values
between any pair of types for which you've manually implemented a
conversion. Note that C# will let you do
T2 Convert<T1, T2>(T1 input)
but this function does "the same thing" for *all* possible types. The
Haskell example above provides a *different* implementation for every
unique pair of types. (Including forward and reverse directions ---
e.g., converting from int to double and from double to int. Indeed,
maybe you don't *want* the reverse direction, since it's potentially
lossy?...)
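In the same dictionary-passing style, here is a sketch of what Convert could become (the names are my invention):

```csharp
public sealed class Convert<X, Y>
{
    public readonly Thunk<Function<X, Y>> OB_Convert;
    public Convert(Thunk<Function<X, Y>> convert)
    {
        this.OB_Convert = convert;
    }
}

// One instance object per (X, Y) pair declared in the Haskell source:
public static readonly Convert<int, double> IN_Convert_Int_Double =
    new Convert<int, double>(
        new Thunk<Function<int, double>>(() =>
            new Function<int, double>(x => (double)x.Execute())));
```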
On 05/03/2015 08:40 AM, scott wrote:
>>> An interesting read, so it turns out you are actually quite good at C#
>>> programming then :-)
>>
>> Well, yeah, I did just spend the last 2 years writing C# code every
>> single day. ;-)
>
> Maybe you mentioned it but I forgot, what language are you writing the
> "compiler" in then?
The compiler is written in Haskell.
It probably wouldn't be hard to write the back-end in C# either - but
the front-end would be nightmarishly horrific to write in anything less
powerful than Haskell. (And even then, it's going to be quite hard.)
If, by some miracle, the compiler ever becomes self-hosting [which I
consider extremely unlikely], then it will be possible to compile
Haskell code to C# from within C#.
Actually, it's surprising how similar the C# and JavaScript backends
have ended up being. Like, they both get driven by the same data
structures. You could realistically write a compiler that spits out one
or the other at the flick of a switch. (Assuming the front-end existed,
which it doesn't.)
> You can write C# code to generate, assemble and run
> code dynamically can't you?
You *can*, in two different ways.
There's a "code DOM" that is a fancy system for generating
syntactically-correct C# source code. (I.e., it lets you save the text to a
file, and then compile it as you would any other C# code.) I
*think* they added the ability to programmatically compile it from
within a running C# application now.
Or you can generate raw MS CIL directly, which is instantly usable.
(But it means you have to comprehend low-level CIL stuff.)
At the moment, my computer just outputs ordinary C# source code, which
VisualStudio can then compile. (Or, more likely, complain about.)
> I started work on a raytracer once where the
> user typed in the scene as pseudo-C# code and then it compiled it and ran
> it to render the scene - it wasn't a security hole, more like a security
> chasm :-)
Sounds a bit like the self-extracting POV-Ray scene I wrote a while
back... ;-)
>> ...yeah, it turns out file_offset was declared as a 32-bit integer
>> rather than a 64-bit integer. (Because C doesn't define the size of
>> things, it's just left as implementation-dependent. Yay!)
>
> Yes, at least in C# you have Int32 Int64, and the MS IDE would probably
> give you a red squiggly line under something like that if you got it wrong.
Or Haskell, which gives you Int8, Int16, Int32, Int64 (and Integer,
actually). Or Pascal. Or Smalltalk. Or Eiffel. Or, really, just about
any programming language less ancient than C.
Considering that C was invented explicitly for the purpose of writing an
operating system, it seems weird that it doesn't define bit-sizes, given
how utterly fundamental those are to all systems programming...
> I stopped using C++ a long time ago precisely for these sort of things
> that you miss and it screws everything up. Also I got so used to the
> autocomplete and help for C# in the MS IDE that the C++ one is painful
> to use now.
Yeah, as soon as you start editing C++ code, VisualStudio becomes a
paperweight. You can't look up definitions of anything [because that's
equivalent to solving the Halting Problem], you can't find usages of it
[same reason], you certainly can't auto-rename stuff, it's impossible to
determine which include file a certain identifier comes from or what
its type might be... It's just a painful experience.
Oh, and just because it compiles on Windows does *not* mean it will
compile (or link) on Linux. For a start, if you are so incautious as to
add a new source file to the project, you now have to manually edit the
makefile [several times - because the first edit will never, ever be
correct]. Then there's the fact that VS accepts
template1<template2<FUBAR>>
whereas GCC demands that there must be a space (because ">>" is an
operator)... The fun never ends!
Sometimes, I like to show people the source code to our file-analysis
program, and then the corresponding 10-line program in Haskell. ;-)