Haskell* has the interesting property that you can often tell quite a
bit about what a function *does* just by looking at its type signature.
This has interesting applications for testing Haskell code.
[*] Oh God, there goes half my audience...
Consider, for example, the following C# method:
public Customer FindCustomer(List<Customer> customers)
Now, looking at this, what it *probably* does is hunt through the
supplied list argument, and pick out one of the objects somehow. But
there's no guarantee. In particular:
* It could return null rather than an actual Customer object.
* It could throw an exception and not return anything.
* It could get stuck in an infinite loop and never return.
* The object this method belongs to might have a Customer or two hidden
inside it.
* It could get a Customer object out of some global variable somewhere.
(Static variables are basically global variables. Static methods could
also be involved.)
* It could load a Customer from disk.
* It could connect to a remote database and fetch a Customer from there.
* It could use the reflection API to *dynamically generate* an object
which conforms to the interface of Customer. (In case you think I'm
kidding, the Moq library does EXACTLY THIS...)
On top of that, it could be performing arbitrary I/O, communicating with
other objects in the system, or for that matter altering either the list
passed in or the objects in it.
Why do we care? Well, it means that we cannot, JUST by looking at the
type signature, figure out how to test this method. If this method is
pulling data out of a production database somewhere (or worse, putting
data INTO one!), then we can't just glance at the type signature and
then decide to run the method a few times to see what it does...
More concretely, if the method tries to load data off disk, we need to
know about that, because presumably the test framework needs to generate
or fake that file somehow. In short, to test this thing, you need to
know what it's doing.
Now consider the following Haskell function:
find_customer :: [Customer] -> Customer
This seems like the same thing as the above, but with different syntax.
In fact, there are some critical differences:
* The function can still throw an exception, and it can still loop
forever and never return. It CANNOT return null, however. If this
function ever returns, it will be with a real, genuine Customer value.
* This is a FUNCTION, not a METHOD, so there is no object to worry about.
* The function could fetch a customer from a (public or private) global
variable somewhere, or run other functions (possibly in other modules).
However, it is 100% GUARANTEED that every single time you run this
function with a particular list as input, you will ALWAYS get EXACTLY
the same output.
* This function CANNOT perform any I/O whatsoever. So no disk access, no
database access, no sending emails to people, nothing.
* This function CANNOT modify any data, anywhere in the system. (Which
is why we're guaranteed the same result every time, of course.)
The fact that this can't do I/O means we can safely run it in a test
framework without knowing what it does. The worst thing that can happen
is that it gives us the wrong answer. We also do NOT have to configure
any "other objects" or such; this function can ONLY access the arguments
we explicitly gave it, and global constants [which obviously cannot
change - they're constants!]
In short, Haskell guarantees for us that this function has no external
dependencies that need to be "set up" or "configured" before we can use
this function.
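To make that concrete, here's a minimal sketch. (The Customer type and
the "pick the biggest spender" rule are my own inventions; the real
function could choose differently. The point is what the type rules out.)

```haskell
-- Invented for illustration: a Customer type and one possible selection rule.
data Customer = Customer { name :: String, spend :: Int }
  deriving (Eq, Show)

-- Partial on the empty list (it may throw), exactly as the signature allows,
-- but it can never return null and can never touch the outside world.
find_customer :: [Customer] -> Customer
find_customer = foldr1 pickRicher
  where pickRicher a b = if spend a >= spend b then a else b

main :: IO ()
main = do
  let cs = [Customer "Ann" 10, Customer "Bob" 30, Customer "Cid" 20]
  -- Referential transparency: two calls with the same list must agree.
  print (find_customer cs == find_customer cs)
  print (name (find_customer cs))
```

No fixtures, no fakes, no setup; the argument list is the entire universe
this function can see.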
Now let's try something a little more exotic:
public T FindTarget<T>(List<T> stuff)
Instead of a list of Customer objects, we now take a generic type
parameter. This method supposedly works for ANY TYPE! This includes
non-object types. (C# has "struct" types, which have value rather than
reference semantics, and which generally break people's minds...)
Now, you might think, the method can only be picking an element out of
the list at a pre-determined location or something. But you would be
thinking wrong!
* We can still return null, throw an exception, etc., like before.
* We can still ask other objects what to do, ask somebody over the
network, etc.
* Every class inherits from Object! So we could, say, look at the hash
code of the objects and pick the lowest one, or find the one with the
longest ToString() representation.
* And then there's the reflection API, which basically lets us perform
arbitrary introspection of each object, even though at compile-time we
have no idea what the hell the types will be. This basically gives us
limitless possibilities - including, again, the possibility of MODIFYING
the objects instead of just returning one of them.
* Again, we can still use reflection to CONSTRUCT a new object out of
thin air and return that, completely ignoring the input list if we feel
like it.
So, allowing the method to accept ANY element type limits us ever so
slightly, but not by much.
Now to Haskell:
find_target :: [x] -> x
This could throw an exception or get stuck in a loop. But if it does
return, it is *guaranteed* to return something from the input list. (In
particular, if the input list is empty, it is *impossible* for this
function to return an x - because it doesn't have any!)
At this point, we _know_ for a damned _fact_ that the data that comes
back _must_ be in the input list, and _cannot_ have been altered in any
way. That's a pretty damned strong set of guarantees. The function
doesn't know what x will be, so it cannot CREATE an x, and it cannot
LOOK AT any x you may have given it. All it can really do is move them
around.
With the function so severely limited in what it can do, testing it is a
dream. Basically the function doesn't know what x will be, but it DOES
know that the container will be a list. So it can only really do listy
things. So we just need to test it with every possible size and shape of
list input, and we've tested everything it can ever do!
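A quick sketch of that "it can only do listy things" claim. These two
implementations (my own picks; `head`-like and `last`-like) are close to
the whole design space for this signature:

```haskell
-- Parametricity in action: with the element type abstract, an implementation
-- can only select or rearrange the elements it was given.
find_target :: [x] -> x
find_target = head                        -- one legal choice

alt_target :: [x] -> x
alt_target xs = xs !! (length xs - 1)     -- another: effectively `last`

main :: IO ()
main = do
  print (find_target [1, 2, 3 :: Int])    -- works for Int...
  print (find_target "abc")               -- ...and for Char, unchanged
  print (alt_target [1, 2, 3 :: Int])
```

Neither function can inspect, invent, or modify an x; both can only
shuffle positions in the list structure.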
As you might expect, Haskell already has several testing libraries which
make use of this strong structure. To be sure, if you have a function
Int -> Int, there's A LOT of things such a function might do. (Although
it STILL can't do any I/O, and it STILL can't modify any program data.)
But the more polymorphic a function, the more limited its options.
In my day job, I test C# programs. We do that by writing unit tests,
where you "fake" all the objects that a particular object talks to, and
check that given certain inputs, the object does certain things. Trouble
is, you never know when an object might suddenly decide to actually
create a *real* disk file, or run a *real* OS command, or whatever. (At
least, not unless you exhaustively inspect the source code.) And that
makes testing... interesting.
On top of that, some of these components need a bazillion other
components before they can work properly. Sometimes we end up having to
test half a dozen real objects surrounded by two dozen fake ones. At
which point, you're not really unit testing any more; now you're
integration testing. (Which is also important, but it's a different animal.)
In Haskell land, property-based testing is very popular. For example, if
you have a function that reverses a list, you might assert that the
reversed list is always the same length as the original, that every
element of the original list is also an element of the new list, or that
the last element of the original list is the first element of the
reversed list.
For example:
reverse :: [x] -> [x]
prop_SameLength :: [x] -> Bool
prop_SameLength xs = length xs == length (reverse xs)
prop_MemberBoth :: Eq x => x -> [x] -> Property
prop_MemberBoth x xs = x `elem` xs ==> x `elem` reverse xs
Here prop_SameLength is basically a logical statement that says
For all xs, length xs equals length (reverse xs).
It's implemented as a function that takes xs as an argument, and returns
a Bool indicating whether the statement is true or false for that
particular input. You then use a library such as QuickCheck to test this
property.
What QuickCheck does is to randomly generate 100 lists, call
prop_SameLength with each of those lists as input, and check that the
result is always true. If it's ever false, it prints out an error
message, which includes the input that makes the result false. (A common
example is functions that break given an empty list, or maybe a list
with exactly one element or something.)
The genius of QuickCheck is that you can hand it *any* function, with
any number of arguments, and so long as the function returns a Bool, and
QuickCheck "knows" how to randomly generate each of the argument types,
it will just TEST the thing, without any further coding from you.
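In practice it looks like this (assuming the QuickCheck package is
installed; note that QuickCheck needs a concrete type to generate data,
so the polymorphic property gets pinned to Int lists at the test site):

```haskell
import Test.QuickCheck

-- The polymorphic property, instantiated at [Int] so QuickCheck
-- knows what kind of random data to generate.
prop_SameLength :: [Int] -> Bool
prop_SameLength xs = length xs == length (reverse xs)

main :: IO ()
main = quickCheck prop_SameLength
```

One line of driver code, and QuickCheck does all the generation,
running, and shrinking of counterexamples for you.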
The library is quite sophisticated. For example, prop_MemberBoth takes
two arguments - an item, and a list. The code says
For all x and for all xs, IF x is an element of xs THEN x is an
element of (reverse xs) also.
How QuickCheck implements this is that it will randomly generate x and
xs, test whether x is an element of xs, and if not THROW THE DATA AWAY
and randomly generate another test case. Only when the precondition is
true is the actual test executed.
This is actually a fairly bad example; it would be better to just say
x `elem` xs == x `elem` (reverse xs). That way we could test that there
are no ADDITIONAL elements as well. But either way, this is a rather
poor test; if you randomly choose x and xs, then 99% of the time x won't
just happen to be a member of xs.
This is the point at which QuickCheck allows you to manually specify
data generators. For example, QuickCheck can randomly generate integers,
but for specific tests you might want integers in specific ranges or
with specific random distributions. QuickCheck also lets you "tag" test
cases, so you can see how many of the 100 tests tested a particular sort
of input. (100 is the default number of tests; you can of course turn
this up if you wish.)
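Here's what a hand-rolled generator plus tagging looks like (again
assuming the QuickCheck package; the 0-to-9 range and the "empty list"
tag are just illustrative choices):

```haskell
import Test.QuickCheck

-- A custom generator (ints from 0 to 9, via `choose` and `listOf`) plus a
-- `classify` tag reporting how many generated cases were empty lists.
prop_SameLengthRanged :: Property
prop_SameLengthRanged =
  forAll (listOf (choose (0, 9 :: Int))) $ \xs ->
    classify (null xs) "empty list" $
      length xs == length (reverse xs)

main :: IO ()
main = quickCheck prop_SameLengthRanged
```

The test report then tells you what percentage of the 100 runs hit the
"empty list" case, so you can see whether your generator is actually
exercising the inputs you care about.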
In a similar vein is SmallCheck. It does exactly the same thing as
QuickCheck [I *think* it even supports the same property definitions],
but rather than random test data, it tries to exhaustively generate all
test inputs up to a certain "size". The theory here being that if your
list-processing function has a bug, you probably don't need a
5,000-element list to exhibit the bug. There's probably a 5-element list
that'll do it. So start with small lists and work up to larger ones.
(QuickCheck has a nasty habit of randomly choosing the empty list as
input multiple times per test run, which doesn't tell you anything -
unless it's one of several arguments...)
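Driving SmallCheck looks almost identical (assuming the smallcheck
package; the only new knob is the depth bound instead of a test count):

```haskell
import Test.SmallCheck

-- Same property as before; SmallCheck enumerates ALL lists of Ints
-- up to the given depth rather than sampling randomly.
prop_SameLength :: [Int] -> Bool
prop_SameLength xs = length xs == length (reverse xs)

main :: IO ()
main = smallCheck 4 prop_SameLength
</imports_placeholder>
```

If a counterexample exists within the depth bound, SmallCheck is
guaranteed to find it, and it will be a minimal one.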
Over the last few days, I've been reading about an especially trippy
property testing framework for Haskell. As some of you might recall,
data in Haskell exists in a kind of quantum superposition of states; the
data doesn't "exist" until you "look" at it. Well it turns out that can
be quite useful!
What the library does is this: Let's say you have a 2-argument function.
The library executes this function, feeding it two "holes". As soon as
you "look at" a hole, it instantly throws an exception. However, the
holes are numbered, and each one throws a *different* exception. In this
way, the test framework can run the function, catch the exception, and
tell WHICH ARGUMENT the function tried to look at!
The framework then generates some "real" data for that particular hole,
and runs the function again. I say "real"; if the data in question is,
say, a list, then you create a list node whose data-pointer and
next-pointer both point to yet more holes. In this way, the test
framework iteratively calls the function under test with more and more
holes filled in, until the function is able to execute successfully.
In short, what this does is generate test data only for stuff that the
function actually "looks at". For example, the reverse function looks at
the list structure. It doesn't CARE what's actually IN the list - and
hence it's pointless running and rerunning the test with different list
elements. Only the SIZE of the list matters to the reverse function - so
the test framework just generates various sizes of lists, all full of
holes. (But the "sum" function, on the other hand, does care what's in
the list, so the test framework generates different list contents for it...)
So, in a weird kinda way, lazy evaluation and exception handling provide
a strange kind of "flow-control introspection" at run-time. Neat!
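You can sketch the core trick in a few lines of plain Haskell. (This is
my own toy reconstruction of the idea, not the library's actual code:
each hole is a lazily-thrown, numbered exception, and `evaluate` + `try`
reveal which argument got forced.)

```haskell
import Control.Exception

-- Each "hole" is a thunk that throws a numbered exception when forced.
newtype Hole = Hole Int deriving Show
instance Exception Hole

hole :: Int -> a
hole n = throw (Hole n)

-- Run a two-argument function on holes and report which argument
-- (if any) it tried to look at.
probe :: Show r => (Int -> Int -> r) -> IO ()
probe f = do
  r <- try (evaluate (f (hole 1) (hole 2)))
  case r of
    Left (Hole n) -> putStrLn ("demanded argument " ++ show n)
    Right v       -> putStrLn ("finished with " ++ show v)

main :: IO ()
main = do
  probe const               -- looks at argument 1
  probe (\_ y -> y)         -- looks at argument 2
  probe (\_ _ -> 0 :: Int)  -- looks at neither; returns normally
```

Each probe run tells the framework exactly which piece of data to refine
next, which is precisely the iterative filling-in described above.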
Property testing with QuickCheck is one thing. But there's another tool
called QuickSpec. It takes a set of functions, and attempts to
*generate* a set of properties that the functions satisfy. (!!)
What this thing is actually doing is taking your list of functions and
constructing every possible expression involving these functions. It
then tries running these expressions with multiple different inputs, and
partitions them into sets of expressions that produce the same answer
given the same inputs. It then tries to construct a minimal set of rules
to describe all these equivalences.
For example, given a module like
empty :: Set x
insert :: Set x -> x -> Set x
delete :: Set x -> x -> Set x
size :: Set x -> Integer
it might come up with something like
size empty == 0
set `insert` x `insert` y == set `insert` y `insert` x
set `insert` x `delete` x == set
The first line says that the empty set has a size of zero. The next line
says that inserting X and then inserting Y yields the same result as
inserting Y and then inserting X. (So, order of insertion doesn't
matter.) Inserting X and then deleting X leaves the set unchanged.
Notable by its absence is the rule
size (set `insert` x) == (size set) + 1
Indeed, plugging this into QuickCheck, we find that IT'S NOT ACTUALLY
TRUE! You can falsify this by inserting a value that's already IN the
set (which does nothing). We could write
size (set `insert` x) >= size set
but QuickSpec can't derive inequalities, only equalities. We could also say
x `not_in` set ==> size (set `insert` x) == (size set) + 1
but, again, QuickSpec isn't clever enough to find preconditions. It
finds only unconditional properties. (It wouldn't surprise me if finding
conditional properties is NP-complete...)
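You can see the falsification concretely with the standard Data.Set
from the containers library (whose `insert` takes the element first,
unlike the hypothetical API above):

```haskell
import qualified Data.Set as Set

main :: IO ()
main = do
  let set = Set.fromList [1, 2, 3 :: Int]
  -- Insert an element that is already present: the size does not grow,
  -- which falsifies  size (set `insert` x) == size set + 1.
  print (Set.size (Set.insert 2 set))
  print (Set.size (Set.insert 2 set) == Set.size set + 1)
  -- The weaker inequality does hold:
  print (Set.size (Set.insert 2 set) >= Set.size set)
```

One duplicate element is all it takes, which is exactly the kind of
obscure edge case that keeps an "obvious" rule out of QuickSpec's output.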
The idea seems to be that if your code has a bug, QuickSpec will find
properties which are obviously absurd, or will fail to find properties
that you would have expected. I posit that it's probably quite hard to
recognise this. Quite a lot of "expected" properties actually have an
obscure edge-case where they don't hold, and so QuickSpec won't report them.
On top of that, QuickSpec is still doing random testing. It may report a
rule which isn't actually true, but merely appears true for the test
data that was generated.
One final limitation - which isn't really mentioned much in the paper -
is that QuickSpec is relying on your implementation of the "==" operator
to decide when two results are "the same". In particular, if your set
implements a buggy equals operator, all of your rule results are likely
to be completely bogus!
Another use for QuickSpec is to generate a bunch of properties, and then
test those with QuickCheck (or SmallCheck or...) while you refactor the
code, to make sure its observable behaviour has not changed.
The thing I'm really psyched about right now, though, is a tool called
Irulan. I'm still reading the PhD thesis, but it appears that what
Irulan does is take a list of functions, generate all possible
expressions involving them (as QuickSpec does), and then try to find a
way of making the code throw an exception. The idea is that if you're
publishing a library, you want to know that the public interface doesn't
crash no matter how clients use it.
So you run Irulan, and it gives you a list of all expressions which
result in a crash. (More exactly, an exception being thrown.) Well, now,
some of these expressions are *supposed* to throw exceptions - so that's
OK. But when a human being looks at the list, they can tell what's
expected error reporting, and what's an unintentional bug in the code.
Irulan uses the nifty lazy test data generation technique I mentioned
earlier. It also does lots of hoopy stuff with inspecting the results
generated, to ensure that lazy evaluation isn't delaying the output of
exceptions that have occurred inside the function. It's also integrated
with test coverage detection, and you can make it target certain
expressions and so forth. It seems really interesting. (Then again, I
haven't finished reading the thesis yet...)