POV-Ray : Newsgroups : povray.off-topic : Type-directed testing
From: Orchid Win7 v1
Subject: Type-directed testing
Date: 6 Nov 2013 17:15:45
Message: <527abf91@news.povray.org>
Haskell* has the interesting property that you can often tell quite a 
bit about what a function *does* just by looking at its type signature. 
This has interesting applications for testing Haskell code.

[*] Oh God, there goes half my audience...



Consider, for example, the following C# method:

   public Customer FindCustomer(List<Customer> customers)

Now, looking at this, what it *probably* does is hunt through the 
supplied list argument, and pick out one of the objects somehow. But 
there's no guarantee. In particular:

* It could return null rather than an actual Customer object.
* It could throw an exception and not return anything.
* It could get stuck in an infinite loop and never return.
* The object this method belongs to might have a Customer or two hidden 
inside it.
* It could get a Customer object out of some global variable somewhere. 
(Static variables are basically global variables. Static methods could 
also be involved.)
* It could load a Customer from disk.
* It could connect to a remote database and fetch a Customer from there.
* It could use the reflection API to *dynamically generate* an object 
which conforms to the interface of Customer. (In case you think I'm 
kidding, the Moq library does EXACTLY THIS...)

On top of that, it could be performing arbitrary I/O, communicating with 
other objects in the system, or for that matter altering either the list 
passed in or the objects in it.

Why do we care? Well, it means that we cannot, JUST by looking at the 
type signature, figure out how to test this method. If this method is 
pulling data out of a production database somewhere (or worse, putting 
data INTO one!), then we can't just glance at the type signature and 
then decide to run the method a few times to see what it does...

More concretely, if the method tries to load data off disk, we need to 
know about that, because presumably the test framework needs to generate 
or fake that file somehow. In short, to test this thing, you need to 
know what it's doing.

Now consider the following Haskell function:

   find_customer :: [Customer] -> Customer

This seems like the same thing as the above, but with different syntax. 
In fact, there are some critical differences:

* The function can still throw an exception, and it can still loop 
forever and never return. It CANNOT return null, however. If this 
function ever returns, it will be with a real, genuine Customer value.

* This is a FUNCTION, not a METHOD, so there is no object to worry about.

* The function could fetch a customer from a (public or private) global 
variable somewhere, or run other functions (possibly in other modules). 
However, it is 100% GUARANTEED that every single time you run this 
function with a particular list as input, you will ALWAYS get EXACTLY 
the same output.

* This function CANNOT perform any I/O whatsoever. So no disk access, no 
database access, no sending emails to people, nothing.

* This function CANNOT modify any data, anywhere in the system. (Which 
is why we're guaranteed the same result every time, of course.)

The fact that this can't do I/O means we can safely run it in a test 
framework without knowing what it does. The worst thing that can happen 
is that it gives us the wrong answer. We also do NOT have to configure 
any "other objects" or such; this function can ONLY access the arguments 
we explicitly gave it, and global constants [which obviously cannot 
change - they're constants!].

In short, Haskell guarantees for us that this function has no external 
dependencies that need to be "set up" or "configured" before we can use 
this function.
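To make this concrete, here's a sketch of what a pure find_customer might look like. (The Customer type and the "lowest ID wins" rule are both invented for this example; the point is only that the same list in always means the same Customer out.)

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- A hypothetical Customer type, invented for this example.
data Customer = Customer { custId :: Int, custName :: String }
  deriving (Show, Eq)

-- One possible pure implementation: pick the customer with the
-- lowest ID. Same input list in, same Customer out, every time.
-- (It can still throw an exception - on an empty list, say.)
find_customer :: [Customer] -> Customer
find_customer = minimumBy (comparing custId)
```

No files, no database, no hidden object state: everything this function can possibly depend on is right there in the argument list.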

Now let's try something a little more exotic:

   public T FindTarget<T>(List<T> stuff)

Instead of a list of Customer objects, we now take a generic type 
parameter. This method supposedly works for ANY TYPE! This includes 
non-object types. (C# has a "struct" which has non-reference semantics, 
and generally breaks people's minds...)

Now, you might think, the method can only be picking an element out of 
the list at a pre-determined location or something. But you would be 
thinking wrong!

* We can still return null, throw an exception, etc., like before.
* We can still ask other objects what to do, ask somebody over the 
network, etc.
* Every class inherits from Object! So we could, say, look at the hash 
code of the objects and pick the lowest one, or find the one with the 
longest ToString() representation.
* And then there's the reflection API, which basically lets us perform 
arbitrary introspection of each object, even though at compile-time we 
have no idea what the hell the types will be. This basically gives us 
limitless possibilities - including, again, the possibility of MODIFYING 
the objects instead of just returning one of them.
* Again, we can still use reflection to CONSTRUCT a new object out of 
thin air and return that, completely ignoring the input list if we feel 
like it.

So, allowing the method to accept ANY element type limits us ever so 
slightly, but not by much.

Now to Haskell:

   find_target :: [x] -> x

This could throw an exception or get stuck in a loop. But if it does 
return, it is *guaranteed* to return something from the input list. (In 
particular, if the input list is empty, it is *impossible* for this 
function to return an x - because it doesn't have any!)

At this point, we _know_ for a damned _fact_ that the data that comes 
back _must_ be in the input list, and _cannot_ have been altered in any 
way. That's a pretty damned strong set of guarantees. The function 
doesn't know what x will be, so it cannot CREATE an x, and it cannot 
LOOK AT any x you may have given it. All it can really do is move them 
around.
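As a sketch of just how constrained that type is, here are essentially the only kinds of implementation available (short of looping forever or crashing on purpose):

```haskell
-- Knowing nothing about x, a function of type [x] -> x can only
-- select an existing element. These cover the typical options:
firstOne :: [x] -> x
firstOne (x:_) = x          -- like head; crashes on []

lastOne :: [x] -> x
lastOne [x]    = x          -- like last; crashes on []
lastOne (_:xs) = lastOne xs

secondOne :: [x] -> x
secondOne (_:x:_) = x       -- partial: needs at least two elements
```

Every one of them inspects only the list's *structure*, never the elements themselves - there's nothing else the type allows.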

With the function so severely limited in what it can do, testing it is a 
dream. Basically the function doesn't know what x will be, but it DOES 
know that the container will be a list. So it can only really do listy 
things. So we just need to test it with every possible size and shape of 
list input, and we've tested everything it can ever do!



As you might expect, Haskell already has several testing libraries which 
make use of this strong structure. To be sure, if you have a function 
Int -> Int, there's A LOT of things such a function might do. (Although 
it STILL can't do any I/O, and it STILL can't modify any program data.) 
But the more polymorphic a function, the more limited its options.

In my day job, I test C# programs. We do that by writing unit tests, 
where you "fake" all the objects that a particular object talks to, and 
check that given certain inputs, the object does certain things. Trouble 
is, you never know when an object might suddenly decide to actually 
create a *real* disk file, or run a *real* OS command, or whatever. (At 
least, not unless you exhaustively inspect the source code.) And that 
makes testing... interesting.

On top of that, some of these components need a bazillion other 
components before they can work properly. Sometimes we end up having to 
test half a dozen real objects surrounded by two-dozen fake ones. At 
which point, you're not really unit testing any more; now you're 
integration testing. (Which is also important, but it's a different animal.)

In Haskell land, property-based testing is very popular. For example, if 
you have a function that reverses a list, you might assert that the 
reversed list is always the same length as the original, that every 
element of the original list is also an element of the new list, or that 
the last element of the original list is the first element of the 
reversed list.

For example:

   reverse :: [x] -> [x]

   prop_SameLength :: [x] -> Bool
   prop_SameLength xs = length xs == length (reverse xs)

   prop_MemberBoth :: Eq x => x -> [x] -> Property
   prop_MemberBoth x xs = x `elem` xs ==> x `elem` (reverse xs)

Here prop_SameLength is basically a logical statement that says

   For all xs, length xs equals length (reverse xs).

It's implemented as a function that takes xs as an argument, and returns 
a Bool indicating whether the statement is true or false for that 
particular input. You then use a library such as QuickCheck to test this 
property.
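The actual generation machinery lives inside the QuickCheck library; as a rough sketch of the idea using only the standard library, we can fake the "generate 100 random inputs and check them all" loop with a toy pseudo-random generator. (Everything below is invented for illustration - it is NOT QuickCheck's real API, which also shrinks failing cases and generates far smarter data.)

```haskell
-- Toy linear-congruential generator, standing in for QuickCheck's
-- real random source.
lcg :: Int -> Int
lcg s = (1103515245 * s + 12345) `mod` 2147483648

-- Generate n pseudo-random Int lists (lengths 0-9, elements 0-99).
randomLists :: Int -> Int -> [[Int]]
randomLists seed n = take n (go (lcg seed))
  where
    go s = let (xs, s') = draw (s `mod` 10) (lcg s)
           in  xs : go s'
    draw 0 s = ([], s)
    draw k s = let (xs, s') = draw (k - 1) (lcg s)
               in  (s `mod` 100 : xs, s')

-- The property under test.
prop_SameLength :: [Int] -> Bool
prop_SameLength xs = length xs == length (reverse xs)

-- Check the property against 100 generated inputs.
checkProp :: ([Int] -> Bool) -> Bool
checkProp p = all p (randomLists 2013 100)
```

QuickCheck does all of this for you - the only part you write is the property itself.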

What QuickCheck does is to randomly generate 100 lists, call 
prop_SameLength with each of those lists as input, and check that the 
result is always true. If it's ever false, it prints out an error 
message, which includes the input that makes the result false. (A common 
example is functions that break given an empty list, or maybe a list 
with exactly one element or something.)

The genius of QuickCheck is that you can hand it *any* function, with 
any number of arguments, and so long as the function returns a Bool (or 
another result QuickCheck knows how to test), and QuickCheck "knows" how 
to randomly generate each of the argument types, it will just TEST the 
thing, without any further coding from you.

The library is quite sophisticated. For example, prop_MemberBoth takes 
two arguments - an item, and a list. The code says

   For all x and for all xs, IF x is an element of xs THEN x is an 
element of (reverse xs) also.

How QuickCheck implements this is that it will randomly generate x and 
xs, test whether x is an element of xs, and if not THROW THE DATA AWAY 
and randomly generate another test case. Only when the precondition is 
true is the actual test executed.
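A hand-rolled sketch of that discard behaviour (not QuickCheck's actual implementation - just the classification logic):

```haskell
-- Each candidate input is classified: inputs failing the
-- precondition are Discarded rather than counted as passes
-- or failures.
data Outcome = Passed | Failed | Discarded
  deriving (Show, Eq)

classifyAll :: (a -> Bool)   -- precondition
            -> (a -> Bool)   -- property
            -> [a] -> [Outcome]
classifyAll pre prop = map classify
  where
    classify x
      | not (pre x) = Discarded
      | prop x      = Passed
      | otherwise   = Failed
```

Running it on the membership property, a case like (5, [1,2]) is thrown away because 5 isn't in the list, so it never counts as a "pass".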

This is actually a fairly bad example; it would be better to just say "x 
`elem` xs == x `elem` (reverse xs)". That way we could test that there 
are no ADDITIONAL elements as well. But either way, this is a rather 
poor test; if you randomly choose x and xs, then 99% of the time x won't 
just happen to be a member of xs.

This is the point at which QuickCheck allows you to manually specify 
data generators. For example, QuickCheck can randomly generate integers, 
but for specific tests you might want integers in specific ranges or 
with specific random distributions. QuickCheck also lets you "tag" test 
cases, so you can see how many of the 100 tests tested a particular sort 
of input. (100 is the default test size; you can of course turn this up 
if you wish.)

In a similar vein is SmallCheck. It does exactly the same thing as 
QuickCheck [I *think* it even supports the same property definitions], 
but rather than random test data, it tries to exhaustively generate all 
test inputs up to a certain "size". The theory here being that if your 
list-processing function has a bug, you probably don't need a 
5,000-element list to exhibit the bug. There's probably a 5-element list 
that'll do it. So start with small lists and work up to larger ones. 
(QuickCheck has a nasty habit of randomly choosing the empty list as 
input multiple times per test run, which doesn't tell you anything - 
unless it's one of several arguments...)
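The exhaustive-enumeration idea is easy to sketch for lists of Bools. (This is the flavour of what SmallCheck does, not its actual API.)

```haskell
-- Every list of Bools of length 0 up to n, smallest first.
allListsUpTo :: Int -> [[Bool]]
allListsUpTo n = concatMap listsOfLen [0 .. n]
  where
    listsOfLen 0 = [[]]
    listsOfLen k = [ b : bs | b <- [False, True], bs <- listsOfLen (k - 1) ]

-- A property passes at "depth" n if it holds for EVERY such list.
checkSmall :: ([Bool] -> Bool) -> Int -> Bool
checkSmall p depth = all p (allListsUpTo depth)
```

At depth 3 that's only 1 + 2 + 4 + 8 = 15 lists, but if there's a bug lurking in your list function, one of those 15 probably triggers it - and you get the *smallest* counterexample for free, since small inputs are tried first.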



Over the last few days, I've been reading about an especially trippy 
property testing framework for Haskell. As some of you might recall, 
data in Haskell exists in a kind of quantum superposition of states; the 
data doesn't "exist" until you "look" at it. Well it turns out that can 
be quite useful!

What the library does is this: Let's say you have a 2-argument function. 
The library executes this function, feeding it two "holes". As soon as 
you "look at" a hole, it instantly throws an exception. However, the 
holes are numbered, and each one throws a *different* exception. In this 
way, the test framework can run the function, catch the exception, and 
tell WHICH ARGUMENT the function tried to look at!

The framework then generates some "real" data for that particular hole, 
and runs the function again. I say "real"; if the data in question is, 
say, a list, then you create a list node whose data-pointer and 
next-pointer both point to yet more holes. In this way, the test 
framework iteratively calls the function under test with more and more 
holes filled in, until the function is able to execute successfully.

In short, what this does is generate test data only for stuff that the 
function actually "looks at". For example, the reverse function looks at 
the list structure. It doesn't CARE what's actually IN the list - and 
hence it's pointless running and rerunning the test with different list 
elements. Only the SIZE of the list matters to the reverse function - so 
the test framework just generates various sizes of lists, all full of 
holes. (But the "sum" function, on the other hand, does care what's in 
the list, so the test framework generates different list contents for it...)

So, in a weird kinda way, lazy evaluation and exception handling provide 
a strange kind of "flow-control introspection" at run-time. Neat!
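Here's a minimal sketch of the trick using only Control.Exception (the real machinery is far more elaborate): each hole throws a numbered exception when forced, so running the function and catching the exception reveals which position was demanded.

```haskell
import Control.Exception

-- A "hole": a value that throws a numbered exception the moment
-- anything looks at it.
newtype Hole = Hole Int deriving Show
instance Exception Hole

hole :: Int -> a
hole n = throw (Hole n)

-- Apply a function to a 3-element list of holes and report which
-- element (if any) it tried to look at.
demanded :: ([Int] -> Int) -> IO (Maybe Int)
demanded f =
  (evaluate (f [hole 0, hole 1, hole 2]) >> return Nothing)
    `catch` \(Hole n) -> return (Just n)
```

Now `demanded head` reports Just 0 and `demanded (!! 1)` reports Just 1, while `demanded length` reports Nothing - length walks the list's spine without ever touching the elements, which is exactly the information the test framework wants.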



Property testing with QuickCheck is one thing. But there's another tool 
called QuickSpec. It takes a set of functions, and attempts to 
*generate* a set of properties that the functions satisfy. (!!)

What this thing is actually doing is taking your list of functions and 
constructing every possible expression involving these functions. It 
then tries running these expressions with multiple different inputs, and 
partitions them into sets of expressions that produce the same answer 
given the same inputs. It then tries to construct a minimal set of rules 
to describe all these equivalences.

For example, given a module like

   empty :: Set x
   insert :: Set x -> x -> Set x
   delete :: Set x -> x -> Set x
   size :: Set x -> Integer

it might come up with something like

   size empty == 0
   set `insert` x `insert` y == set `insert` y `insert` x
   set `insert` x `delete` x == set

The first line says that the empty set has a size of zero. The next line 
says that inserting X and then inserting Y yields the same result as 
inserting Y and then inserting X. (So, order of insertion doesn't 
matter.) Inserting X and then deleting X leaves the set unchanged.
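To see those rules in action, here's a toy Set implementation (a sorted list without duplicates, invented purely for illustration) on which each generated equation can be checked by hand:

```haskell
import qualified Data.List as L

-- A toy set: a sorted list without duplicates, so that (==) on
-- the representation is genuine set equality.
newtype Set x = Set [x] deriving (Show, Eq)

empty :: Set x
empty = Set []

insert :: Ord x => Set x -> x -> Set x
insert (Set xs) x
  | x `elem` xs = Set xs            -- already present: no change
  | otherwise   = Set (L.insert x xs)

delete :: Eq x => Set x -> x -> Set x
delete (Set xs) x = Set (L.delete x xs)

size :: Set x -> Integer
size (Set xs) = fromIntegral (length xs)
```

On this implementation, size (empty `insert` 1 `insert` 1) is 1, not 2 - exactly the edge case that stops QuickSpec from reporting the "size goes up by one" rule.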

Notable by its absence is the rule

   size (set `insert` x) == (size set) + 1

Indeed, plugging this into QuickCheck, we find that IT'S NOT ACTUALLY 
TRUE! You can falsify this by inserting a value that's already IN the 
set (which does nothing). We could write

   size (set `insert` x) >= size set

but QuickSpec can't derive inequalities, only equalities. We could also say

   x `not_in` set ==> size (set `insert` x) == (size set) + 1

but, again, QuickSpec isn't clever enough to find preconditions. It 
finds only unconditional properties. (It wouldn't surprise me if finding 
conditional properties is NP-complete...)

The idea seems to be that if your code has a bug, QuickSpec will find 
properties which are obviously absurd, or will fail to find properties 
that you would have expected. I posit that it's probably quite hard to 
recognise this. Quite a lot of "expected" properties actually have an 
obscure edge-case where they don't hold, and so QuickSpec won't report them.

On top of that, QuickSpec is still doing random testing. It may report a 
rule which isn't actually true, but merely appears true for the test 
data that was generated.

One final limitation - which isn't really mentioned much in the paper - 
is that QuickSpec is relying on your implementation of the "==" operator 
to decide when two results are "the same". In particular, if your set 
implements a buggy equals operator, all of your rule results are likely 
to be completely bogus!

Another use for QuickSpec is to generate a bunch of properties, and then 
test those with QuickCheck (or SmallCheck or...) while you refactor the 
code, to make sure its observable behaviour has not changed.



The thing I'm really psyched about right now, though, is a tool called 
Irulan. I'm still reading the PhD thesis, but it appears that what 
Irulan does is take a list of functions, generate all possible 
expressions involving them (as QuickSpec does), and then try to find a 
way of making the code throw an exception. The idea is that if you're 
publishing a library, you want to know that the public interface doesn't 
crash no matter how clients use it.

So you run Irulan, and it gives you a list of all expressions which 
result in a crash. (More exactly, an exception being thrown.) Well, now, 
some of these expressions are *supposed* to throw exceptions - so that's 
OK. But when a human being looks at the list, they can tell what's 
expected error reporting, and what's an unintentional bug in the code.

Irulan uses the nifty lazy test data generation technique I mentioned 
earlier. It also does lots of hoopy stuff with inspecting the results 
generated, to ensure that lazy evaluation isn't delaying the output of 
exceptions that have occurred inside the function. It's also integrated 
with test coverage detection, and you can make it target certain 
expressions and so forth. It seems really interesting. (Then again, I 
haven't finished reading the thesis yet...)



From: Doctor John
Subject: Re: Type-directed testing
Date: 6 Nov 2013 17:25:08
Message: <527ac1c4$1@news.povray.org>
On 06/11/13 22:15, Orchid Win7 v1 wrote:
> Haskell* has the interesting property that you can often tell quite a
> bit about what a function *does* just by looking at its type signature.
> This has interesting applications for testing Haskell code.
> 
> [*] Oh God, there goes half my audience...
> 

Umm, 90%?

John
-- 
Protect the Earth
It was not given to you by your parents
You hold it in trust for your children



From: Orchid Win7 v1
Subject: Re: Type-directed testing
Date: 7 Nov 2013 03:14:40
Message: <527b4bf0$1@news.povray.org>
>> [*] Oh God, there goes half my audience...
>
> Umm, 90%?

What's 90% of 2 people?



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.