In the category of "nobody cares, but I'm going to write about it
anyway"... A few months back I started building a Haskell to JavaScript
"compiler". (I say "compiler" since the thing was far too incomplete to
be described as a *real* compiler; more like a mere compiler back-end.)
That has lain dormant for a while now, but I've just embarked upon
writing a Haskell to C# compiler.
The hardest part about compiling Haskell is its laziness. As you know,
in Haskell nothing actually gets executed until you *do* something with
the result. In a low-level language like machine code (or, more
realistically, C) this is quite hard to arrange. But you know what? Both
JavaScript and C# actually have first-class functions that can access
all currently in-scope local variables. *That's the hard part!* And
these languages have it built-in!
Obviously, JavaScript can run in a web browser. That's why I embarked on
that project in the first place. But it turns out JavaScript has another
extremely useful property: it's not object-oriented. Or rather, it
doesn't have classes. Objects can just dynamically change structure at
run-time. (Although --- depending on your JS engine --- that can entail
a performance hit: engines tend to be optimised for the common case
where people code as if JS *were* class-based.)
This turns out to be tremendously useful for Haskell; laziness is
implemented by storing some kind of data structure representing the work
to be done, and then overwriting it in-place with the result of that
work when the time comes. Normal OO languages don't let an object
suddenly change class like that, but JavaScript is fine with it.
So much for JavaScript. When I came to C#, I made an interesting
discovery: It is possible to encode the Haskell type system into the C#
type system. Well, almost. And it looks like it might even be relatively
simple for C# to call Haskell and vice versa. But I'm getting ahead of
myself...
So how do you compile Haskell? Well, first you need to understand what
Haskell can do.
Haskell has a powerful and elaborate type system. But it turns out that
all the polymorphism is resolved at compile-time (not unlike C++
templates --- although the implementation is totally different).
Haskell has complete "type erasure" --- meaning that at run-time, the
system does not know or care what type anything is. (C and Pascal also
have this property, for example.) In summary, if we pretend that some
earlier stage of the compiler has already type-checked the program and
decided that it's legal, we can completely ignore Haskell types from now on.
So that leaves us with Haskell expressions. Now Haskell has a seemingly
vast amount of syntax to it, but actually the entire language "desugars"
down to just 6 fundamental constructs. In an expression, you can:
1. Declare new local variables.
2. Dereference any in-scope variable.
3. Declare an anonymous function.
4. Call a function with some arguments.
5. Construct a data structure.
6. Inspect and extract data from a data structure.
Some of these are self-explanatory. Some of them require a little
further explanation.
Suppose we have a Haskell declaration such as
data Point = Point Int Int
We can easily encode this in C# as
public sealed class Point
{
public readonly int X, Y;
public Point(int x, int y)
{
this.X = x;
this.Y = y;
}
}
Item #5 above means that in Haskell I can write
Point 5 7
to construct a Point value. The corresponding C# code is pretty obviously
new Point(5, 7)
So that's easy enough. Similarly, point #6 says that I can write
distance :: Point -> Int
distance (Point x y) = x*x + y*y
Again, an obvious C# analogue is
int distance(Point p)
{
return p.X*p.X + p.Y*p.Y;
}
To really understand 5 and 6, you need to understand Haskell's algebraic
data types (ADTs). [Note that ADT also stands for *abstract* data type,
which is something very different!] Consider the following declaration:
data BufferMode = Unbuffered | LineBuffered | BufferSize Int
Is it an enum? Is it a struct? No, it's a discriminated union! Roughly,
it says that if a variable is declared as type "BufferMode", then legal
values for that variable are "Unbuffered", "LineBuffered", or
"BufferSize" followed by an Int (e.g., "BufferSize 7").
We can encode that in C# in the following slightly strange manner:
public abstract class BufferMode
{...}
public sealed class Unbuffered : BufferMode
{...}
public sealed class LineBuffered : BufferMode
{...}
public sealed class BufferSize : BufferMode
{
public readonly int Field1;
...
}
In other words, we make an abstract class with nothing in it, and
concrete classes for each "shape" of value that Haskell permits. We then
declare every Haskell value of type BufferMode as a C# value of type
BufferMode.
So that covers #5, since we can do "new BufferSize(1024)" for example.
But #6 is where things get... interesting.
Most normal programming languages have an if/then/else statement. Some
of them also have a switch/case construct, which is syntax sugar for a
big, complicated if/then/elseif block.
Haskell has an if/then/else *expression* (which works like the C#
ternary operator). It also has a construct called a case-block. And
if/then/else is syntax sugar for a case-block. *Not* the other way around. For example:
to_string :: BufferMode -> String
to_string mode =
case mode of
Unbuffered -> "No buffering"
LineBuffered -> "Buffer each line"
BufferSize n -> "Buffer every " ++ show n ++ " bytes"
Take note of the final line: Here we declare a new local variable "n"
and bind its value to the Int field of BufferSize. This variable is
*only* in-scope on the final line (since otherwise the Int field doesn't
exist).
We are actually doing 3 separate things here:
* If "mode" is an unevaluated expression, evaluate it *now*!
* Take a conditional branch depending on which sort of value mode is.
* Extract any data fields in mode and bind them to variables.
A case-block is the only Haskell construct that causes execution to
actually occur (i.e., it defeats laziness). It is the only Haskell
construct for explicitly performing conditional branching. And it is the
only Haskell construct that can extract fields from data structures.
[Actually, some of the above isn't 100% true. There are a number of
primitive operations that *also* provoke execution --- most
conspicuously, any kind of I/O. You can also do conditional execution
based on the condition "what function is in f?" by just calling f with
some arguments.]
As you can see, the case-block is a fundamental Haskell primitive. If we
can implement this, we're in good shape!
One possibility is to encode each Haskell function as a C# method on the
class hierarchy. In other words, add an abstract ToString() method to
BufferMode, with concrete implementations for each of the subclasses.
That doesn't work well, though. Haskell is "factored the other way"
compared to OO languages:
* OO languages give you an extensible set of data structures with a
fixed set of operations. [You can always add a new subclass, but you
can't really add new methods.]
* Haskell gives you a fixed set of data structures, with an extensible
set of operations. [You can always add new functions that operate on
BufferMode. You can never add a new kind of BufferMode.]
These differences make sense for different use-cases. If you're building
a widget library, of *course* you always want to be able to add new
kinds of widget! But the operations available on any arbitrary widget
are more or less fixed. If, on the other hand, you're building an
abstract syntax tree, the possible tree nodes are fixed by the language
you're trying to process; if that language changes, you need to redesign
your whole program *anyway*! But the set of operations you might want to
do on that syntax is always expanding...
Fortunately, another possible encoding in C# looks like this:
string ToString(BufferMode mode)
{
{
var temp = mode as Unbuffered;
if (temp != null)
{
return "No buffering";
}
}
{
var temp = mode as LineBuffered;
if (temp != null)
{
return "Buffer each line";
}
}
{
var temp = mode as BufferSize;
if (temp != null)
{
return "Buffer every " + temp.Field1 + " bytes";
}
}
throw new Exception("Unreachable code");
}
For those of you who don't speak C# fluently:
* The explicit cast (Foo)bar attempts to convert the object pointed to
by bar into class Foo. If this fails, a run-time exception is thrown. We
could catch and ignore that, but exception handling is quite slow.
* The "as" operator in "bar as Foo" attempts to cast bar to type Foo,
but if it fails it just returns null (rather than throwing an exception).
* There is also an "is" operator. We could just do "if (mode is ...)".
But note the final case, where we actually want to *do* something with
the casted result.
* The strange extra curly brackets around each if allow us to reuse the
variable name "temp" with a different type signature each time. (Since
it's a local variable, it's only in-scope until the closing brace.)
We're doing great here so far. Oh, except for one thing: the mode might
be unevaluated. We need to make sure it gets evaluated as the first
step. So how do we store unevaluated data in C#?
Well, whichever way you slice it, it's probably going to require a
visible type. The Haskell term for something we haven't evaluated yet is
a "thunk". So we're going to have something like
public sealed class Thunk<T>
{
public T Execute() {...}
}
The idea here is that the first time we call Execute(), it executes the
enclosed action, and returns us the result. On subsequent calls to
Execute(), it just immediately returns the previously calculated result.
It's mildly tricky to make this thread-safe, but not that hard.
Basically, at the heart of a Thunk<T> is a C# Func<T> (which is a
pre-defined C# delegate type). For example,
var t = new Thunk<int>( () => DoLongComputation(3, 7, 59) );
When we do t.Execute(), it will run DoLongComputation() with the
specified arguments. So that's easy enough.
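As for the thread-safety mentioned above: here is a minimal sketch of one possible approach, using a simple lock per thunk (the locking strategy here is my own assumption; faster lock-free variants exist):

```csharp
public sealed class Thunk<T>
{
    private Func<T> _body;   // set to null once the result has been computed
    private T _result;
    private readonly object _sync = new object();

    public Thunk(Func<T> body)
    {
        this._body = body;
    }

    public T Execute()
    {
        // Fast path: already evaluated. (A truly robust version would
        // mark _body volatile; this is a sketch, not production code.)
        if (this._body == null)
            return this._result;
        lock (this._sync)
        {
            if (this._body != null)
            {
                this._result = this._body();
                this._body = null; // drop the closure so it can be collected
            }
        }
        return this._result;
    }
}
```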
Now we can see that our ToString() function needs to change a bit:
string ToString(Thunk<BufferMode> t_mode)
{
var mode = t_mode.Execute();
...
}
The rest of the code is as before. But wait! It turns out we need to
change a bit more. The definition for BufferSize should actually be
public sealed class BufferSize : BufferMode
{
public readonly Thunk<int> Field1;
...
}
Instead of the field being an int, it's actually a Thunk<int>. Because
that's how Haskell works: every data structure field is potentially
unevaluated.
Many people think of a datum as either "evaluated" or "unevaluated". But
consider the Haskell list type:
data List x = Node x (List x) | End
---
public abstract class List<T>
{
...
}
public sealed class Node<T> : List<T>
{
public readonly Thunk<T> Field1;
public readonly Thunk<List<T>> Field2;
...
}
public sealed class End<T> : List<T>
{
...
}
Now we can see that if I hand you a Thunk<List<T>>, the whole thing
could be unevaluated. Or the first node of the list could have Field1
evaluated and Field2 not evaluated. Or the other way around. You can see
now that a list can be in many, many stages of partial evaluation.
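To make those stages concrete, here is a sketch (VerySlowComputation is a hypothetical expensive function, and I'm assuming Node<T> has the obvious two-thunk constructor):

```csharp
// The node object itself is evaluated; its fields need not be.
var cheapHead = new Thunk<int>(() => 1);
var lazyTail = new Thunk<List<int>>(() => VerySlowComputation());
var xs = new Node<int>(cheapHead, lazyTail);

// Forcing the head runs the trivial lambda. The tail's computation
// never runs unless somebody calls lazyTail.Execute().
int head = xs.Field1.Execute();
```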
You may *also* have noticed how Haskell's polymorphic list type maps
beautifully to a C# generic type. List x simply becomes List<T>.
Returning to our ToString() function, we need to change the following:
{
var temp = mode as BufferSize;
if (temp != null)
{
return "Buffer every " + temp.Field1.Execute() + " bytes";
}
}
So we need to execute Field1 in case it wasn't already executed.
Indeed, even if it *was* executed, we still need to call Execute() to
fetch the result. Ideally, Execute() would overwrite the Thunk<int>
object in-place with a boxed int value. But C# doesn't
let objects suddenly change class like that. (Obviously the native
Haskell implementation is written in C and has no such qualms.)
In a way, though, it's kind of illuminating to actually *see* where the
thunks are. For your entertainment, here is the Haskell map function:
map :: (x -> y) -> List x -> List y
map f xs =
case xs of
Node y ys -> Node (f y) (map f ys)
End -> End
--
List<T2> Map<T1, T2>(Func<Thunk<T1>, T2> f, Thunk<List<T1>> t_xs)
{
var xs = t_xs.Execute();
{
var temp = xs as End<T1>;
if (temp != null)
{
return new End<T2>(); // Notice different type parameter.
}
}
{
var temp = xs as Node<T1>;
if (temp != null)
{
var z = new Thunk<T2>( () => f(temp.Field1) );
var zs = new Thunk<List<T2>>( () => Map(f, temp.Field2) );
return new Node<T2>(z, zs);
}
}
throw new Exception("Unreachable code");
}
Notice how we never actually call Execute() on Field1 or Field2. We pass
them along as potentially unevaluated thunks (indeed, enclosed in new,
bigger thunks). You can see from the original Haskell source code that
this function is recursive. However, encoded in C#, you can see that the
function never *directly* calls itself. It merely builds a new thunk for
the recursive call. If that thunk never hits Execute(), the recursive
call never happens.
And that, ladies and gentlemen, is why you can Map() an infinite list
and have it return in finite time. Indeed, that's why you can construct
an infinite list in the first place!
Now consider the length function:
length :: List x -> Int
length xs =
case xs of
Node y ys -> 1 + length ys
End -> 0
---
int Length<T>(Thunk<List<T>> t_list)
{
var list = t_list.Execute();
{
var temp = list as End<T>;
if (temp != null)
{
return 0;
}
}
{
var temp = list as Node<T>;
if (temp != null)
{
return 1 + Length(temp.Field2);
}
}
throw new Exception("Unreachable code");
}
Immediately we see a problem here. See that recursive call right at the
bottom? Yeah, Haskell has the so-called "tail-call optimisation". But
that final call is *not* a tail-call. *This* is a tail-call:
last :: List x -> x
last xs =
case xs of
End -> error "no last element"
Node y ys ->
case ys of
End -> y
Node _ _ -> last ys
Notice how in that last line, the *last* thing that happens is the
recursive call. By contrast, length doesn't call length. It calls 1 +
length. So "+" is the thing being tail-called, not "length".
Now if it's a tail-call, you can do this:
T Last<T>(Thunk<List<T>> t_xs)
{
var xs = t_xs.Execute();
{
var temp = xs as End<T>;
if (temp != null)
{
throw new Exception("no last element");
}
}
var y = (xs as Node<T>).Field1; // Cast cannot fail.
while (true) // Exit condition is further down.
{
{
var temp = xs as Node<T>;
if (temp != null)
{
y = temp.Field1;
xs = temp.Field2.Execute();
continue; // Next iteration of while loop.
}
}
{
var temp = xs as End<T>;
if (temp != null)
{
return y.Execute();
}
}
}
}
No recursion! You can do that, because a tail-call to yourself is just
like going around the loop again. (It's a tad complicated to implement
correctly, mind you. You can see the code above doesn't seem to match
the original *exactly*.)
The long and short of this is that "last" will run "in constant space",
whereas "length", as defined above, will blow the Haskell stack if
provided with a sufficiently large input.
There's a way around that:
length = work 0
work count list =
case list of
End -> count
Node x xs -> work (count+1) xs
It should be clear how this works: We initialise count=0, and every time
we see a Node we do count+1.
Trouble is, all that does is end up with count containing
1 + (1 + (1 + (1 + (1 + ...))))
When you try to Execute() that, it will *still* blow the stack. For this
reason, Haskell has a magical primitive named "seq":
work count list =
case list of
End -> count
Node x xs ->
let new_count = count + 1
in new_count `seq` work new_count xs
The magical "seq" causes new_count to Execute() right now, even though
it's not in a case-block. (I said there were primitives other than case
that do that. Well seq is such a primitive!) Now the thunk gets
unthunked on every pass around the loop, so there's never a stack overflow.
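For comparison, the seq'd accumulator version maps onto C# as a plain loop with a strict counter. This is a sketch of what the compiled code could look like:

```csharp
int Length<T>(Thunk<List<T>> t_list)
{
    int count = 0; // a strict C# int: the analogue of seq-ing new_count
    var list = t_list.Execute();
    while (true)
    {
        var node = list as Node<T>;
        if (node == null)
            return count; // must be End<T>
        count = count + 1; // evaluated immediately; no chain of (1 + ...) thunks
        list = node.Field2.Execute();
    }
}
```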
Fun fact: I love it when people post on StackOverflow.com to ask about
an actual stack overflow! There should be a badge for that or something...
OK, so we know what to do with "data" declarations, and we know what to
do with case-blocks (and data construction). That covers #5 and #6.
Items #1 and #2 are all about local variables. They're simple enough,
once you remember that the local variables thus declared are always thunks.
Oh, but did I mention? Haskell supports recursive declarations! :-P
A let-block looks like this:
let
variable1 = expression1
variable2 = expression2
variable3 = expression3
in expression0
That entire thing --- the whole lot --- is a single expression! It is
valid *anywhere* an expression is valid. The result of this expression
is expression0 (which usually mentions the other variables declared).
The mind-bending thing is that variable1, variable2 and variable3 are
in-scope for the *entire* block! Most particularly, it is 100% legal to
do this:
let numbers = Node 6 numbers
in ...
So "numbers" is a variable declared... in terms of... itself...?
We can encode this in C#, if we modify the Thunk<> class a bit:
public sealed class Thunk<T>
{
private Func<T> _body;
private T _result;
public Thunk(Func<T> body)
{
this._body = body;
}
public Thunk()
{
this._body = null;
}
public void Become(Func<T> body)
{
this._body = body;
}
public T Execute()
{
...
}
}
Then we can simply do this:
var numbers = new Thunk<List<int>>();
var node = new Node<int>(new Thunk<int>(() => 6), numbers);
numbers.Become( () => node );
We create a non-functional thunk --- a "hole" --- create the value to go
in it, and finally "tie the knot" by linking the thunk back to itself.
"node" points to "numbers", "numbers" points to "node", and we have
ourselves a circular list. (Or an infinite list, if you prefer.)
In general terms, then, for each let-block, we do
var variable1 = new Thunk<...>();
var variable2 = new Thunk<...>();
var variable3 = new Thunk<...>();
variable1.Become(...compiled expression1...);
variable2.Become(...compiled expression2...);
variable3.Become(...compiled expression3...);
return ...compiled expression0...;
So we create a hole for every variable being declared, and then fill in
each variable's expression to make a working thunk. (This
guarantees that the C# variables are definitely in-scope when we compile
the corresponding expressions.)
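As a concrete (hypothetical) example, a block like `let a = 1; b = a + 1 in b` would compile to roughly:

```csharp
static Thunk<int> Example()
{
    // let a = 1; b = a + 1 in b
    var a = new Thunk<int>();
    var b = new Thunk<int>();
    a.Become(() => 1);
    b.Become(() => a.Execute() + 1);
    return b; // expression0 is just "b" --- still an unevaluated thunk here
}
```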
Did I say something about "don't need to care about the Haskell types"?
Oh dear, look at all those variable declarations! It seems that, for C#
at least, we *do* need to care about types. Hmm...
This now just leaves us with points #3 and #4, which are all about
functions. Well, obviously C# has built-in support for functions and
calling them. Hell, it even has built-in anonymous functions now! (E.g.,
the Func<> type.)
But remember, Haskell has "curried functions". That is, (+) is a
2-argument function, and (5+) is a 1-argument function [that adds 5 to
things], and only (5+4) is an actual function call, as such.
I chose to define a little helper class:
public sealed class Function<T1, T2>
{
private readonly Func<Thunk<T1>, T2> _body;
public Function(Func<Thunk<T1>, T2> body)
{
this._body = body;
}
public Thunk<T2> Call(Thunk<T1> argument)
{
return new Thunk<T2>( () => this._body(argument) );
}
}
So a Function<T1, T2> is really kind of a shorthand for Func<Thunk<T1>,
T2>. But notice the beautiful symmetry of Call(): it takes a Thunk<T1>
and gives you back a Thunk<T2>. This is *exactly* how Haskell works! The
function may or may not ever execute the thunk being passed in.
Likewise, a let-block might declare a variable that's never used [or,
more likely, never used on a certain code-path], in which case the
result thunk is never executed either.
That leaves us with just how to represent Haskell anonymous functions.
Note that a Haskell *named* function is merely a global constant whose
value happens to be an anonymous function!
The approach I went with, for named functions, is to do this:
public static int FN_Add(Thunk<int> x, Thunk<int> y)
{
return x.Execute() + y.Execute();
}
public static Thunk<Function<int, Function<int, int>>> OB_Add =
new Thunk<Function<int, Function<int, int>>>( () =>
new Function<int, Function<int, int>>( x =>
new Function<int, int>( y => FN_Add(x, y) )
)
);
It's a tad verbose (!!), but it works. The protocol is that every
top-level Haskell constant becomes a
public static Thunk<...> OB_XXX = ...
Any top-level constant that's a *function* also gets a
public static T0 FN_XXX(Thunk<T1> arg1, Thunk<T2> arg2, Thunk<T3> arg3)
{
...compiled code...
}
In this way, if function Foobar needs to call the add function, it can
just do FN_Add(x, y). But if Foobar needs to call an unknown function
passed in as an argument [Haskell functions do this A LOT!], it'll be given a
Thunk<Function<int, Function<int, int>>>
and it'll need to do
fn.Execute().Call(x).Execute().Call(y)
to build the thunk, and another Execute() if it wants the actual, final
result.
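Putting the pieces together, calling the add function through its curried object looks like this (sketch):

```csharp
var five = new Thunk<int>(() => 5);
var four = new Thunk<int>(() => 4);
// Each Call() merely builds a thunk; nothing is added
// until the final Execute().
Thunk<int> sum = OB_Add.Execute().Call(five).Execute().Call(four);
int result = sum.Execute(); // 9
```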
Notice that every function argument is always a thunk. I added a special
constructor for creating "already evaluated thunks". Again, the native C
implementation doesn't need to care [because its objects can change type
dynamically], but in C# it's a necessary evil.
Looking at what we've got, accessing a variable or creating an object
are mere expressions in C#. But declaring local variables or doing
conditional jumps are *statements* at the C# level. (Yes, I could use
the ternary operator. No, I'm not going to. :-P ) So the Haskell
expressions end up being divided into "simple expressions" (Haskell
expressions which are also C# expressions) and "complex expressions"
(Haskell expressions which become *statements* in C#).
It is clear that a simple expression may only contain simple
subexpressions. But what of complex expressions?
Well, Haskell allows anything. But for the purposes of code generation,
I'm going to assume that the [non-existent] earlier stages of my
compiler have simplified things to a certain degree. In particular:
* A let-block can only bind variables to *simple* expressions. The
"in..." part is a complex expression, since it represents the
continuation of the flow control.
* A case-block can only have a *simple* expression as the thing to
examine. The case alternatives are again complex expressions, since
they're the next stage of flow control.
I haven't decided precisely what to do with anonymous functions yet. One
possibility is "lambda lifting": turn *every* function into a top-level
function (with a randomly-generated name, and any captured local
variables as extra arguments). But that's a shame when C# has automatic
variable capture. So probably the correct thing is to treat an anonymous
function as a simple expression (and its body as a complex expression).
This makes sense, since in C# an anonymous function is an *expression*:
Func<int, int, int, int> fn1 = (x, y, z) => x + y + z;
In an OO language, you might perhaps have complex expressions as a
subclass of simple ones or something. But in Haskell, these are two
separate types, and to include a simple expression within a complex
expression, you have to explicitly "wrap" it. It turns out that these
wrapping points are *precisely* the points where the C# code needs a
"return" statement:
case foo of Bar -> 5
{
var temp = foo as Bar;
if (temp != null)
{
return 5;
}
}
This is a fortuitous accident indeed!
On a larger scale, a Haskell program is composed of "modules". Modules
are gathered together into "packages", which are the [usual] unit of
Haskell code distribution.
Well, it seems clear that a Haskell package should become a C# assembly!
And it seems reasonable that each module should become a C# class. So
what the code does is to turn the Haskell module "Data.List" in the
package "base" into
namespace Haskell.Packages.base.Data
{
public static class List
{
...compiled code from Data.List...
}
}
In this way, the fully-qualified name of the static class is
Haskell.Packages.base.Data.List
which seems quite nice.
While we're on the subject, Haskell allows you to give some rather
perplexing names to things. For example, there's a standard Haskell
function named ">>=". Clearly C# won't be too happy about that! We need
a name mangler! And I've got just the thing: For every character which
is not a letter, digit or underscore, replace it with "x63" (or whatever
the ASCII character code for the illegal character is).
[C# also has a rule that the *first* character of an identifier cannot
ever be a digit. Happily, Haskell has *the same* rule, so there's no
need to explicitly check for this problem.]
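The mangling scheme sketched above is simple enough to write down. This is my own illustration (the function name and details are mine, not the real compiler's):

```csharp
static string Mangle(string haskellName)
{
    var sb = new System.Text.StringBuilder();
    foreach (char c in haskellName)
    {
        if (char.IsLetterOrDigit(c) || c == '_')
            sb.Append(c);
        else
            sb.Append('x').Append((int)c); // '>' becomes "x62", '=' becomes "x61"
    }
    return sb.ToString();
}
// Mangle(">>=") gives "x62x62x61".
```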
At this point, every Haskell package gets turned into a nice
self-contained C# DLL. A rather exciting possibility [which I haven't
actually investigated yet] is that Haskell linking information could be
stored as attributes!
For those who aren't fluent in C#: You can do stuff like
[TestCase(5)]
[TestCase(7)]
[TestCase(-1)]
public void TestWibble(int x)
{
...
}
The stuff in square brackets are "attributes". These get compiled into
the DLL, and can be inspected at run-time using reflection. The example
above shows how NUnit (a .NET unit test framework) figures out which
methods are unit tests, and if the tests are parameterised, what values
the test should be run with.
But attributes are *arbitrary*. You can define brand new attributes
yourself. (Although the inputs have to be *compile-time* constants, which
is intensely annoying. Most particularly, arrays are not compile-time
constants. At least, C# thinks so...)
That being the case, one might imagine a world where each compiled
Haskell function has attributes stating the original Haskell data type.
This might enable the Haskell compiler [which, I remind you, doesn't
exist] to compile code against a Haskell package for which you don't
actually have the source code, just the compiled DLL. But only if the
compiler runs under .NET, of course! (Self-hosting compiler, anyone?)
Another thing is that Haskell and C# have different ideas about
visibility. For example, Haskell allows you to have a public function
that takes or returns private types. That's a strange and rather useless
thing to do, and C# doesn't allow it.
More strangely, C# doesn't allow a public class to have a private
parent. I'm not sure why. (I would have liked to prevent subclassing,
say, BufferMode, by making it private and its subclasses public. But
that wouldn't really achieve what I want anyway, so...)
Given Haskell's supreme flexibility on visibility, it seems it might
make sense to put visibility as an attribute, rather than try to make C#
visibility match Haskell visibility.
The next fun thing is Haskell class instances. Now a Haskell "class" is
nearly identical to what C# calls an "interface".
class ToString x where
to_string :: x -> String
instance ToString Int where
to_string x = ...
instance ToString Bool where
to_string True = "True"
to_string False = "False"
You *might* think we should implement a Haskell class as a C#
interface... Uh, yeah, that's not going to work. You see, the set of
interfaces that any C# class implements is fixed. But the set of Haskell
class instances is dynamic. You can always add new instances. (E.g., I
can invent a new Haskell class, and make the primitive built-in Haskell
types implement that class!)
On top of that, Haskell class instances can live in a different module
than where the type is defined. Whether the instance is "in-scope" or
not depends on what modules you import!
This suggests that a Haskell instance should be a first-class C# object.
And, indeed, we can do something like this:
public sealed class ToString<X>
{
public readonly Thunk<Function<X, string>> OB_ToString;
public ToString(Thunk<Function<X, string>> to_string)
{
this.OB_ToString = to_string;
}
}
This gets even more interesting, since Haskell lets you say things like
"for all types X that implement ToString, the type List X also
implements ToString". This suggests that such parameterised class
instances should actually become run-time C# *functions*!
public static ToString<List<X>> IN_ToString<X>(ToString<X> instance)
This is a function that takes a class instance as input, and returns a
new, bigger class instance as output!
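Here is a sketch of what such an instance-building function could look like, assuming ToString<X> has a constructor that fills in OB_ToString (the bracketed list syntax in the output is my choice):

```csharp
public static ToString<List<X>> IN_ToString<X>(ToString<X> elem)
{
    return new ToString<List<X>>(
        new Thunk<Function<List<X>, string>>(() =>
            new Function<List<X>, string>(t_xs =>
            {
                var sb = new System.Text.StringBuilder("[");
                var xs = t_xs.Execute();
                var node = xs as Node<X>;
                while (node != null)
                {
                    // Use the *element* instance to render each item.
                    sb.Append(elem.OB_ToString.Execute().Call(node.Field1).Execute());
                    xs = node.Field2.Execute();
                    node = xs as Node<X>;
                    if (node != null) sb.Append(", ");
                }
                return sb.Append("]").ToString();
            })));
}
```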
(For the most part, if your Haskell code is compiled with known types,
the compiler should hard-code the necessary method definitions for you.
This crazy dynamic class construction malarkey is only for polymorphic
functions.)
This has the nice property that it supports Haskell methods
parameterised by their *return type*! (Something that C# does *not*
support.) The canonical example being
class FromString x where
from_string :: String -> x
It also allows easy support for the non-standard but widely used Haskell
feature "multi-parameter type classes", where methods are parameterised
over *multiple* types, not just one. (Yes, I'm talking about multiple
dispatch.) Canonical example:
class Convert x y where
convert :: x -> y
instance Convert Int Double where...
instance Convert Int String where...
This defines a generic "convert" function, which can convert values
between any pair of types for which you've manually implemented a
conversion. Note that C# will let you do
T2 Convert<T1, T2>(T1 input)
but this function does "the same thing" for *all* possible types. The
Haskell example above provides a *different* implementation for every
unique pair of types. (Including forward and reverse directions ---
e.g., converting from int to double and from double to int. Indeed,
maybe you don't *want* the reverse direction, since it's potentially
lossy?...)
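In the same dictionary-passing style, here is a sketch of what Convert could become (the names are my invention):

```csharp
public sealed class Convert<X, Y>
{
    public readonly Thunk<Function<X, Y>> OB_Convert;
    public Convert(Thunk<Function<X, Y>> convert)
    {
        this.OB_Convert = convert;
    }
}

// One instance object per (X, Y) pair declared in the Haskell source:
public static readonly Convert<int, double> IN_Convert_Int_Double =
    new Convert<int, double>(
        new Thunk<Function<int, double>>(() =>
            new Function<int, double>(x => (double)x.Execute())));
```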
On 05/03/2015 08:40 AM, scott wrote:
>>> An interesting read, so it turns out you are actually quite good at C#
>>> programming then :-)
>>
>> Well, yeah, I did just spend the last 2 years writing C# code every
>> single day. ;-)
>
> Maybe you mentioned it but I forgot, what language are you writing the
> "compiler" in then?
The compiler is written in Haskell.
It probably wouldn't be hard to write the back-end in C# either - but
the front-end would be nightmarishly horrific to write in anything less
powerful than Haskell. (And even then, it's going to be quite hard.)
If, by some miracle, the compiler ever becomes self-hosting [which I
consider extremely unlikely], then it will be possible to compile
Haskell code to C# from within C#.
Actually, it's surprising how similar the C# and JavaScript backends
have ended up being. Like, they both get driven by the same data
structures. You could realistically write a compiler that spits out one
or the other at the flick of a switch. (Assuming the front-end existed,
which it doesn't.)
> You can write C# code to generate, assemble and run
> code dynamically can't you?
You *can*, in two different ways.
There's a "code DOM" that is a fancy system for generating
syntactically-correct C# source code. (I.e., it lets you save the text to a
file, and then compile it as you would any other C# code.) I
*think* they added the ability to programmatically compile it from
within a running C# application now.
Or you can generate raw MS CIL directly, which is instantly usable.
(But it means you have to comprehend low-level CIL stuff.)
At the moment, my computer just outputs ordinary C# source code, which
VisualStudio can then compile. (Or, more likely, complain about.)
> I started work on a raytracer once where the
> user typed in the scene as pseudo-C# code and then it compiled it and ran
> it to render the scene - it wasn't a security hole, more like a security
> chasm :-)
Sounds a bit like the self-extracting POV-Ray scene I wrote a while
back... ;-)
>> ...yeah, it turns out file_offset was declared as a 32-bit integer
>> rather than a 64-bit integer. (Because C doesn't define the size of
>> things, it's just left as implementation-dependent. Yay!)
>
> Yes, at least in C# you have Int32 Int64, and the MS IDE would probably
> give you a red squiggly line under something like that if you got it wrong.
Or Haskell, which gives you Int8, Int16, Int32, Int64 (and Integer,
actually). Or Pascal. Or Smalltalk. Or Eiffel. Or, really, just about
any programming language less ancient than C.
Considering that C was invented explicitly for the purpose of writing an
operating system, it seems weird that it doesn't define bit-sizes, given
how utterly fundamental those are to all systems programming...
> I stopped using C++ a long time ago precisely for these sort of things
> that you miss and it screws everything up. Also I got so used to the
> autocomplete and help for C# in the MS IDE that the C++ one is painful
> to use now.
Yeah, as soon as you start editing C++ code, VisualStudio becomes a
paperweight. You can't look up definitions of anything [because that's
equivalent to solving the Halting Problem], you can't find usages of it
[same reason], you certainly can't auto-rename stuff, it's impossible to
determine which include file a certain identifier comes from or what
its type might be... It's just a painful experience.
Oh, and just because it compiles on Windows does *not* mean it will
compile (or link) on Linux. For a start, if you are so incautious as to
add a new source file to the project, you now have to manually edit the
makefile [several times - because the first edit will never, ever be
correct]. Then there's the fact that VS accepts
template1<template2<FUBAR>>
whereas GCC demands that there must be a space (because ">>" is an
operator)... The fun never ends!
Sometimes, I like to show people the source code to our file-analysis
program, and then the corresponding 10-line program in Haskell. ;-)