Is there a resource that tells how to write good unit tests? Not "given the
tests, how do you put them in a framework", but "how do you know what tests
to write, given the framework"?
My problem is, in part, that I'm not sure I can think of any non-trivial
unit tests that I'd run with any sort of frequency. Let me illustrate with
some of the sorts of routines I've recently written tests for, or have been
unable to write tests for.
1) Given a string, canonicalize it: take out punctuation and stop words, and
lower-case it, to increase the likelihood that two song titles or artists
from different sources will match. Of course I wrote tests for this. And I
got it wrong the first time without knowing it, because I forgot to account
for multiple consecutive stop words. But it's a leaf routine, so the
likelihood that I'll change it once I think it works is almost nil, and the
likelihood that changing something else could break this routine is almost
nil (since I'm using a safe language), so I can't imagine why I'd run any
tests I wrote more than once.
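To make the example concrete, here's a minimal sketch of that kind of
canonicalizer; the stop-word list and the exact rules are hypothetical, but
it shows the consecutive-stop-word case that bit me:

```python
import re

STOP_WORDS = {"the", "a", "an", "of"}  # hypothetical stop-word list

def canonicalize(title):
    """Lower-case a title, strip punctuation, and drop stop words."""
    words = re.sub(r"[^\w\s]", "", title.lower()).split()
    # Filtering the whole word list at once handles multiple consecutive
    # stop words, which a pairwise scan can easily get wrong.
    return " ".join(w for w in words if w not in STOP_WORDS)

print(canonicalize("The Alan Parsons Project"))  # -> "alan parsons project"
```

A test like `canonicalize("A The Of Yesterday!") == "yesterday"` catches the
consecutive-stop-word bug, but once it passes, it passes forever.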
2) I have a table with 6 million songs from iTunes. I have a table with
800,000 ringtones. Both get updated approximately weekly. The metadata for
each is of questionable accuracy. (For example, iTunes has about 4 different
ways of saying "artist", and it's clearly a manual process adding it,
because you can see the same mistakes in different clumps of data. The
ringtone database is in various character sets, but the people building the
database didn't bother to normalize the characters to UTF-8 nor record what
character encoding was used. Just as examples.) I also have a continuous
stream of song/artist information coming in from a third company (MG) as
well as from various individual low-speed streams (like individual radio
stations, web screen scrapes, etc). I want to be able to rapidly match the
data from these low-speed streams against the data in the iTunes and
ringtone tables with the best accuracy possible. How do I write a test to
ensure I'm serving the right answers for given songs, considering the data
changes faster than I can make an exhaustive manual test, and that making up
my own data really doesn't tell me anything of interest? Mind, too, that not
only does one company call it "the alan parsons project" and the other call
it "alan parsons", but we also want to find the rendition of Beatle's
"Yesterday" covered by the Philadelphia Philharmonic if and only if the
actual Beatles song isn't out there, for example.
So this is basically the question of "how do you write unit tests for data
sets in the millions, where the non-obvious errors might be 1% or 0.1% of the
data?"
3) We have a feed of data coming in from MG. Reading the stream is a
destructive operation. The data is vital to our ongoing service, in the
sense that if I consume data while the system is live, people will get the
wrong answer. The only way to read the stream is through an executable
binary blob provided to us by MG. How do I test responding to outages? (I.e.,
this is basically "how do you mock something whose specs are unspecified and
which you can't futz with directly?")
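The usual advice is to put your own thin interface in front of the opaque
reader and mock that interface rather than the blob itself; a sketch, with
all the class and method names hypothetical:

```python
class StreamReader:
    """Thin wrapper interface around the vendor's opaque reader."""
    def read_record(self):
        raise NotImplementedError

class VendorStreamReader(StreamReader):
    """Production implementation: would invoke MG's binary blob."""
    def read_record(self):
        ...  # call into the vendor executable here

class FailingStreamReader(StreamReader):
    """Test double that simulates an outage without touching the
    real, destructive feed."""
    def read_record(self):
        raise ConnectionError("simulated MG outage")

def consume(reader, on_outage):
    """Read one record, falling back to on_outage() if the feed is down."""
    try:
        return reader.read_record()
    except ConnectionError:
        return on_outage()

result = consume(FailingStreamReader(), on_outage=lambda: "fallback")
print(result)  # -> "fallback"
```

This only tests your outage-handling code, not the blob's actual failure
modes, which is exactly the unspecified part.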
4) We output the results to HTML pages. The pages are rather complex, and
include or exclude lots of stuff depending on what kind of result we found,
etc. You could unit-test this by generating known data sets (like "it was a
song from station X, with name and artist Y, with one ringtone perfect
match, another ringtone with the same artist but a different song, and two
songs on iTunes with the same name but different artists, one of which sells
better than the other so we want to list it first") and then looking at the
HTML that comes out. But every change to the templates means different HTML
coming out. I suppose having the data sets and templates would let you
conveniently review what you think is all the possible HTML output, but
that's also not the sort of thing you need to run if you haven't changed the
templates or tests, and whether the answer is "right" is not something you
can check algorithmically.
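One way people sidestep the byte-for-byte fragility is to assert on the
structure of the output rather than its text, e.g. by parsing the HTML and
checking which result blocks appear and in what order. A sketch, assuming a
hypothetical `data-result` attribute in the templates:

```python
from html.parser import HTMLParser

class ResultCollector(HTMLParser):
    """Collects elements tagged with a (hypothetical) data-result attribute."""
    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-result" in attrs:
            self.results.append(attrs["data-result"])

rendered = ('<ul><li data-result="itunes-hit"></li>'
            '<li data-result="ringtone-match"></li></ul>')
collector = ResultCollector()
collector.feed(rendered)
# Assert on which results appear and in what order, not on the raw HTML,
# so cosmetic template changes don't break the test.
assert collector.results == ["itunes-hit", "ringtone-match"]
```

That still leaves "does the page look right" unchecked, which, as you say,
isn't algorithmically checkable anyway.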
It just seems to me that in most everything I do, either I have something
impossible to mock, something unreasonable to check extensively by hand (and
which will yield wrong results in a small percentage of cases), or there's
output from the routines that gets interpreted by something else, so there's
a huge number of equivalent results that are all "correct". (How do you
unit-test automatically that your UI javascript does the right thing in all
the browsers you care about?)
The sophisticated data structures I use all come from someone else. I don't
need to test the on-disk B-Tree implementation the SQL database engine uses,
nor do I need unit tests to check that qsort() is actually returning sorted
lists.
My code tends to work with big piles of messy data, including cases where
the result you get isn't obviously correct. The individual methods
themselves are straightforward 90% of the time (as in, call these three
searches and concatenate the resulting lists), and in the 10% of cases where
they're not, I change them only once in a blue moon, and only in ways that
would break the unit tests anyway, such that writing unit tests would be
counter-productive.
--
Darren New, San Diego CA, USA (PST)
Why is there a chainsaw in DOOM?
There aren't any trees on Mars.