|
![](/i/fill.gif) |
Just for giggles, what would monads look like in C++?
Well, let's pretend for a minute that C++ has a Function<X, Y> type,
which represents a function that takes an X as input, and produces a Y
as output. (Yes, I realise that isn't the real syntax C++ uses. Just
pretend, for a minute.)
Further, let's pretend that you can create an anonymous function just by
writing
int function(int x) {return 2*x;}
Again, this obviously doesn't actually work in C++, but it makes the
example code much easier to write.
(Actually, C++ is one of only a few languages where you probably /can/
implement monads "for real". But this made-up syntax makes it easier to
quickly explain what's happening.)
Listing #1:
Type1 x = foo();
Type2 y = bar(x);
Type3 z = baz(y);
return z;
This code needs little explanation. You can write the same thing in
Haskell as
Listing #2:
let
x = foo
y = bar x
z = baz y
in z
Now consider this seemingly identical code:
Listing #3:
do
x <- foo
y <- bar x
z <- baz y
return z
It seems like listing #2 and listing #3 should both be identical, right?
Well, no. If I translate listing #3 into C++, you'll see it's
drastically different:
Listing #4:
Box<Type1> box_x = foo();
return Bind(box_x,
Box<Type3> function(Type1 x)
{
Box<Type2> box_y = bar(x);
return Bind(box_y,
Box<Type3> function(Type2 y)
{
Box<Type3> box_z = baz(y);
return Bind(box_z,
Box<Type3> function(Type3 z) {return Return(z);}
);
}
);
}
);
Clearly, this /looks/ totally different. If you can actually /follow/ it
all, you'll see that every statement is now followed by a call to this
Bind() function, taking the rest of the code block as an argument.
Important things to notice:
* The return type of foo() has changed from Type1 to Box<Type1>. This
means that the implementation of foo() has to be changed.
* The first argument to Bind() is a Box<Type1>, but the input to its
second argument is a Type1.
The Box<> class "forms a monad" if and only if it provides the following
functions:
template <typename X>
Box<X> Return(X);
template <typename X, typename Y>
Box<Y> Bind(Box<X>, function<X, Box<Y>>);
(In the interests of good OO design, these should probably be /class
methods/ rather than stand-alone functions. After all, their
implementation is completely different for each class that behaves as a
monad. But I will keep them as normal functions for now...)
Notice how what the Return() function does is NOTHING LIKE what a return
statement usually means. (!) Arguably this is a rather bad naming
choice. [You'll also notice how, in this example, Return() is completely
redundant. Taking it out wouldn't change a thing. I only put it there
for completeness.]
It's not hard to figure out how to implement this:
template <typename X>
class Box
{
private:
X Datum;
public:
Box(X in) {Datum = in;}
X GetDatum() {return Datum;}
};
template <typename X>
Box<X> Return(X in) {return Box<X>(in);}
template <typename X, typename Y>
Box<Y> Bind(Box<X> in, Function<X, Box<Y>> next)
{return next(in.GetDatum());}
(I'm guessing my syntax is wonky - even admitting that we're using
pretend language extensions. It probably also contains at least a dozen
incorrect or at least dubious characteristics. But the gist of it should
be clear.)
This is a monad, but it doesn't "do" anything particularly special. With
Box implemented this way, Listing #4 now does exactly the same thing as
Listing #1, but 8x more complicated. Why bother?
This implementation makes Box the "identity monad". Like the identity
function, the identity monad does... nothing. (And, like the identity
function, this sounds like a very useless thing, but actually isn't.)
So let's try another implementation:
template <typename X>
class Box
{
private:
std::vector<X> Data;
public:
Box(std::vector<X> in) {Data = in;}
std::vector<X> GetData() {return Data;}
};
template <typename X>
Box<X> Return(X in)
{
std::vector<X> out;
out.push_back(in);
return out;
}
template <typename X, typename Y>
Box<Y> Bind(Box<X> box_in, Function<X, Box<Y>> next)
{
std::vector<X> in = box_in.GetData();
std::vector<Y> out, temp;
std::vector<X>::iterator iti;
std::vector<Y>::iterator ito;
for (iti = in.begin(); iti < in.end(); iti++)
{
Box<Y> box_out = next(*it);
temp = box_out.GetData();
for (ito = temp.begin(); ito < temp.end(); ito++)
out.push_back(ito*);
}
return Box<Y>(out);
}
Now Box<X> contains a std::vector<X> instead of just one X value. The
Return() function now generates a 1-element vector. And the Bind()
function now executes next() MULTIPLE TIMES, once for each element in
the input. It then collects all the outputs from each call, and gathers
them into a single giant vector.
(Incidentally, I couldn't figure out a direct way to just append the
entire contents of one vector to another. Obviously there /is/ a way,
and I'm just not looking hard enough...)
Notice how the code for foo(), bar() and baz() HAS NOT CHANGED, and the
code that calls them HAS NOT CHANGED. And yet, now each function can
potentially generate multiple results!
(Of course, for foo() to produce more than one output given a single
input, then either foo() itself or some function it calls would have to
be modified to actually make USE of this feature. But, for example, some
function that foo() calls might change, while foo() itself remains
unaffected, and foo() could still use this new feature of having
multiple results.)
This is what Haskellers call the "list monad", basically. (With the
added twist that Haskell lists are lazy, so it's possible for functions
to produce "infinity results".)
It shouldn't be too hard to see that you could alter [the first] Box
implementation such that it contains a success marker, and that Bind()
only runs the next step in the chain if the previous one succeeded. You
could also modify it to contain an error description in the case of
failure. Then, if bar() decides to return an error-in-a-Box rather than
a value-in-a-Box, baz() gets completely skipped. But baz() ITSELF
doesn't need to do any explicit error checking. Bind() does that for you.
In this way, baz() returning an error-in-a-Box aborts the entire Box
computation. A bit like... THROWING AN EXCEPTION. Except it's a library,
not a language feature, so you can change the design of it at will.
(E.g., make it so you have to throw three exceptions before the code
aborts.) This is what Haskellers call the "failure monad", "error monad"
or "exception monad".
You can also design Box to store various other auxiliary information as
it goes along. You could have a table of data, for example, and provide
commands to replace that table with a new one. This is how Haskell
simulates destructive updates without ACTUALLY doing destructive updates.
(This is the difference between the State monad and the ST monad. The
former allows you to simulate destructive updates, in the way described.
The latter ACTUALLY DOES destructive updates, in the manner of C++ or
Java or whatever.)
It's not hard to see how a Box could contain a string buffer and a
character index, and you could write functions that take a Box and
return a new Box after reading data from the string buffer. This is how
you write parser monads.
I could continue at length, but by now you probably get the picture.
User-defined code invisible inserted between the visible statements you
actually wrote leads to all sorts of interesting possibilities.
Incidentally, since C++ REALLY DOES have a function type (or at least,
function POINTER type) which can be passed as an argument, it's one of
the few languages where it ought to be possible to implement a monad
"for real". I might try it if I get the time.
The problem, of course, is that it's going to be so hellishly verbose
that no sane person would ever, EVER write their C++ code like this...
Saying that, I hear C++ lets you override operator(). So perhaps the
simplest way to describe a monad is as "override operator;". ;-)
Post a reply to this message
|
![](/i/fill.gif) |