We often debate how to make a program run faster. But today I find
myself wondering how to make one slower.
We have a crappy little database application which we bought from
somebody or other. (It's a real commercial product.) As far as I can
tell, it's an address book. It manages company names, addresses,
contacts, etc. Looks like it was designed in the Windows 3 era.
Anyway, possibly the most "special" feature of this lump of junk is that
it keeps this database of contacts synchronised *via email*. Once a month
you press a button, and the software gathers up all the new info you've
added since the last sync, puts it into a binary file and emails it to
the DB admin. He then applies the updates to the central DB. Once
everybody's updates have been applied to the central DB, the DB admin
emails another binary file to everybody, containing everybody's changes.
Each user then applies this update to their local copy of the DB, and
now everything is in sync.
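(For the curious: as far as I can reconstruct it, the client side of each sync amounts to something like the sketch below. Every name, and the file format, is a guess on my part; the vendor documents none of this.)

    import pickle, smtplib
    from email.message import EmailMessage

    def build_sync_packet(local_db, last_sync):
        # Gather every record touched since the last sync (guessed logic)
        # and serialise it into an opaque binary blob.
        changes = [rec for rec in local_db if rec["modified"] > last_sync]
        return pickle.dumps(changes)

    def email_packet(packet, admin_addr):
        # Mail the blob to the DB admin, who merges it into the central DB
        # and later mails everyone a combined update to apply locally.
        msg = EmailMessage()
        msg["From"] = "someone@example.com"   # placeholder address
        msg["To"] = admin_addr
        msg["Subject"] = "Monthly sync packet"
        msg.set_content("Sync packet attached.")
        msg.add_attachment(packet, maintype="application",
                           subtype="octet-stream", filename="sync.bin")
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)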
Except, obviously, this never actually works. It almost always goes
wrong *somewhere* or other. (My especial favourite is when the software
crashes every single time you try to sync it. The only known way to fix
this is to reinstall Windows and then reinstall the application. This
fixes the problem with an 80% success rate.)
The process of initially setting up the software is long and tortuous.
First, install the program and apply the three updates to it. Next, copy
a bunch of company-specific files into special folders and configure the
software to use them. (You must do this with the end user logged in,
since the settings are per-user.) You then must obtain a "blank
database", prepaired by the DB admin. (You can't make it yourself.)
Finally, you have to apply the "initial sync packet". Normally a sync
packet contains only recent changes, but the *initial* packet contains
_everything_ in the central DB. This amounts to about 6 MB of data,
which contains roughly 6,000 entries.
Now here's the incongruous part: It takes roughly FOUR HOURS to process
this initial sync packet.
But... the packet is 6 MB. And when all the processing is done, the
final DB files are about 6 MB. So here's the challenge: Can anybody here
think of an algorithm slow enough that it takes 4 hours to copy a
piffling 6 MB of data? Bear in mind that the entire process is 100%
local. There is no network activity involved. It's all just moving data
between files on the local HD. 4 hours. To move 6 MB.
During these 4 hours, the HD gets absolutely nailed to the wall. So I'm
wondering if it's doing a disk-to-disk insert-sort or something stupid
like that. I honestly can't think of any other algorithm that would be
slow enough to take a ludicrous 4 hours. Think about it: 6,000 records
in 4 hours, that's 2.4 seconds *per record*. HDs may be slower than RAM,
but even a HD can move *a lot* of data around in 2.4 seconds.
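Here's the back-of-the-envelope version, plus the only kind of algorithm I can imagine being this slow: one that rewrites the whole (growing) database file once per inserted record. Pure speculation on my part, and the drive speed is a guess too.

    packet_mb   = 6        # size of the initial sync packet
    records     = 6_000    # entries it contains
    total_hours = 4        # observed processing time

    secs_per_record = total_hours * 3600 / records
    print(f"{secs_per_record:.1f} s per record")          # 2.4 s per record

    # Even a tired old drive doing ~10 MB/s sequentially could shovel
    # ~24 MB in that time -- four times the entire packet, per record.
    print(f"~{10 * secs_per_record:.0f} MB movable per record at 10 MB/s")

    # The n^2 suspect: rewrite the whole file for every single insert.
    # Total bytes written ~= (n^2 / 2) * record_size.
    record_kb = packet_mb * 1024 / records                # ~1 KB per record
    total_gb  = records**2 / 2 * record_kb / (1024 * 1024)
    print(f"~{total_gb:.0f} GB written if every insert rewrites the file")

Even that stupidity only comes to roughly 18 GB of writes, so the drive would have to be grinding along at a bit over 1 MB/s of seek-bound I/O to fill 4 hours. Given how it thrashes, maybe it is.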
Ideas on a postcard.