|
 |
OK, so I know nobody actually cares or anything, but I just "finished"
work on a program that takes a Haskell expression, plus the type
signatures for any free variables mentioned, and figures out what the
type of the expression and all its subexpressions is.
Sounds pretty easy, right? Well let me tell you, it's taken me the
entire week to get it to work properly!
The final program is not especially large or complex, but *getting* to
this point has been a long hard journey. Taking a nontrivial program and
rearranging it while you try out different designs is quite a lot of
work. If I had known in my head exactly how I was going to code this
before I started typing, I could have finished it significantly sooner!
The first major task is to write an expression parser. This is
*drastically* less trivial than it sounds! (Currently my parser
understands only a fraction of Haskell's full syntax. The bare minimum
to parse useful expressions.) And the expression trees are so verbose
that you really *need* a pretty-printer too. Ideally with propper
bracket handling. (You have *no idea* how damned long it took to get the
brackets to work correctly...)
Next, you quickly find that various program passes manfunction if
variable names are not unique. Like, if somebody writes "(\x -> x) (\x
-> x + 1)", those are two different, unrelated local variables which
both just happen to be called "x". So you need to somehow make all the
variable names unique, while correctly respecting the language scoping
rules. This is surprisingly easy to do wrong. (And maddeningly tricky to
track down once you notice it's wrong - if you notice at all!)
Then you're going to need a type unification engine. The basic algorithm
is simple, but there are a large number of minor practical details which
all have to be implemented correctly or you get gibberish. Good luck
testing it! Debugging is even more fun...
Next you need to take the expression and decompose it into a set of
subexpressions and note down their interrelationships. From these, you
can compute a set of type relations which the unification engine then
applies. If unification succeeds, you get a list of type signatures for
every subexpression. If unification fails, the expression is ill-typed.
Or your type checker is buggy!
Ah, but wait! The exact way you decompose the expression and apply the
unification process dictate what result you get. In the process of
crawling through this stuff, I discovered things I never even realised
about Haskell's type system. For example...
(foo 1, foo 'x')
If "foo" is a top-level variable, this expression might be well-typed.
("Might" depending on what type signature foo has, obviously.) However,
if "foo" is a local variable, then the above expression is *guaranteed*
to be ill-typed.
Stop and think about that for a moment: Where a variable is bound
determins whether this expression is well-typed or not. That should tell
you that something pretty subtle is going on here!
I never realised this, but look:
\ foo -> (foo 1, foo 'x')
There is no type which can be assigned to "foo" which will make this
expression acceptable. However, consider this:
foo :: x -> x
bar = (foo 1, foo 'x')
This *is* well-typed. It seems that for each invocation of "foo", the
"x" type variable is allowed to take on a different value. However, in
\ foo -> (foo 1)
the expression's type is going to be "(Int -> y) -> y". Basically here
"foo" is required to have one specific type, and it must be the same
each time it's used.
There's actually a non-standard extension that allows you to write
(\ foo -> (foo 1, foo 'x')) :: forall x. (x -> y) -> (y, y)
The above will now type-check. I think what the magical "forall" keyword
is actually doing is generating a new type variable inside the type
checker each time "foo" is mentioned...
Well, there we are. Now I'm off to go implement the rest of my
program... (Such as, say, making it understand more syntax!)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
 |