We often debate how to make a program run faster. But today I find
myself wondering how to make one slower.
We have a crappy little database application which we bought from
somebody or other. (It's a real commercial product.) As far as I can
tell, it's an address book. It manages company names, addresses,
contacts, etc. Looks like it was designed in the Windows 3 era.
Anyway, possibly the most "special" feature of this lump of junk is that
it keeps this database of contacts synchronised *via email*. Once a month
you press a button, and the software gathers up all the new info you've
added since the last sync, puts it into a binary file and emails it to
the DB admin. He then applies the updates to the central DB. Once
everybody's updates have been applied to the central DB, the DB admin
emails another binary file to everybody, containing everybody's changes.
Each user then applies this update to their local copy of the DB, and
now everything is in sync.
Except, obviously, this never actually works. It almost always goes
wrong *somewhere* or other. (My especial favourite is when the software
crashes every single time you try to sync it. The only known way to fix
this is to reinstall Windows and then reinstall the application. This
fixes the problem with an 80% success rate.)
The process of initially setting up the software is long and tortuous.
First, install the program and apply the three updates to it. Next, copy
a bunch of company-specific files into special folders and configure the
software to use them. (You must do this with the end user logged in,
since the settings are per-user.) You then must obtain a "blank
database", prepared by the DB admin. (You can't make it yourself.)
Finally, you have to apply the "initial sync packet". Normally a sync
packet contains only recent changes, but the *initial* packet contains
_everything_ in the central DB. This amounts to about 6 MB of data,
which contains roughly 6,000 entries.
Now here's the incongruous part: It takes roughly FOUR HOURS to process
this initial sync packet.
But... the packet is 6 MB. And when all the processing is done, the
final DB files are about 6 MB. So here's the challenge: Can anybody here
think of an algorithm slow enough that it takes 4 hours to copy a
piffling 6 MB of data? Bear in mind that the entire process is 100%
local. There is no network activity involved. It's all just moving data
between files on the local HD. 4 hours. To move 6 MB.
During these 4 hours, the HD gets absolutely nailed to the wall. So I'm
wondering if it's doing a disk-to-disk insertion sort or something stupid
like that. I honestly can't think of any other algorithm that would be
slow enough to take a ludicrous 4 hours. Think about it: 6,000 records
in 4 hours, that's 2.4 seconds *per record*. HDs may be slower than RAM,
but even a HD can move *a lot* of data around in 2.4 seconds.
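For what it's worth, here is a rough sketch of how the insertion-sort guess could be sanity-checked. It counts the record moves a naive insertion sort makes on worst-case (reverse-ordered) input, confirms the count against the n(n-1)/2 formula on a small run, and then works out how much time each move could take inside a 4-hour budget for 6,000 records. Nothing is known about what the real application does; the function name, the reverse-ordered input and the "one move = one disk operation" model are all assumptions made for illustration.

```python
def insertion_sort_moves(keys):
    """Insertion-sort a copy of keys and count how many records get shifted."""
    data = list(keys)
    moves = 0
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        # Shift every larger record one slot to the right.
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            moves += 1
            j -= 1
        data[j + 1] = current
    return data, moves


def worst_case_moves(n):
    """Closed form for reverse-ordered input: n(n-1)/2 shifts."""
    return n * (n - 1) // 2


# Check the formula on a small run (kept small so it finishes quickly).
small_n = 600
_, moves_small = insertion_sort_moves(range(small_n, 0, -1))

# Scale up to the figures from the post (assumed: 6,000 records, 4 hours).
N = 6000
budget_seconds = 4 * 60 * 60
worst_moves = worst_case_moves(N)
seconds_per_record = budget_seconds / N
ms_per_move = budget_seconds / worst_moves * 1000

print(f"moves for {small_n} records: {moves_small}")
print(f"worst-case moves for {N} records: {worst_moves}")
print(f"seconds per record: {seconds_per_record:.1f}")
print(f"milliseconds available per move: {ms_per_move:.2f}")
```

Under these assumptions the worst case is about 18 million moves, which leaves well under a millisecond per move. If every move really cost a disk seek, the whole job would take far longer than 4 hours, so a fully worst-case on-disk insertion sort is, if anything, too slow to be the explanation.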
Ideas on a postcard.
Sync started 10:19 AM, sync finished 2:02 PM. Total duration: just
short of 4 hours.
6,704 records synced.
Total sync packet size: 6,711,027 bytes.
Total DB size after sync: 60,500,233 bytes.
So I was wrong. Apparently the dataset *does* get larger during the sync
process. But hey, even if the sync packet is compressed or there are
really that many indexes to build... 4 hours??
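As a quick check on those measurements, the sketch below turns them into rates. The figures are the ones quoted above; the variable names are just illustrative, and it assumes the start and finish times fall on the same day.

```python
from datetime import datetime

# Times and sizes as reported in the post.
start = datetime.strptime("10:19", "%H:%M")
finish = datetime.strptime("14:02", "%H:%M")
elapsed_seconds = (finish - start).total_seconds()

records = 6704
packet_bytes = 6_711_027
db_bytes = 60_500_233

seconds_per_record = elapsed_seconds / records
packet_bytes_per_second = packet_bytes / elapsed_seconds
growth_factor = db_bytes / packet_bytes

print(f"elapsed: {elapsed_seconds / 3600:.2f} hours")
print(f"time per record: {seconds_per_record:.2f} s")
print(f"packet throughput: {packet_bytes_per_second:.0f} bytes/s")
print(f"DB size / packet size: {growth_factor:.1f}x")
```

That works out to roughly 2 seconds per record, about 500 bytes of packet data processed per second, and a database around nine times the size of the packet.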
On 19/04/2010 14:14, Invisible wrote:
> During these 4 hours, the HD gets absolutely nailed to the wall.
Now, that's a good way to kill a fresh new SSD... 10000 average
writes... oops, done on the directory structure! Thanks for leveraging
the whole disk...
It might be building a cross reference/multi dimensional access: great
for quick read access later... terrible in write mode, especially if the
algorithm used is silly (like O(x^N) or O(exp(N))...)
Basic tests with a small number of entries displayed no issue.... it
just does not scale to a production database the size of your company's!
(indexing by name, firstname, addresses, any silly idea... using a
bubble sort on files)
It might also be a very sophisticated paging system, with some entries
per page: when a page gets filled, you move all the other pages and
redistribute the entries (using some patricia tree with extended key
length..) of the current page between the old and a new one.... one
entry at a time, with initial packet unsorted... or worse: in
pathological order.
Le_Forgeron wrote:
> It might be building a cross reference/multi dimensional access: great
> for quick read access later... terrible in write mode, especially if the
> algorithm used is silly (like O(x^N) or O(exp(N))...)
The only way an index algorithm could be exponential-time is if it's
trying to "sort" the list of entries by exhaustively trying every
possible permutation until it finds the correct one. Even the worst
of the usual sorting algorithms are only O(N^2) time.
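To put rough numbers on that comparison, here is a small sketch for N = 6,000 (the approximate entry count mentioned earlier in the thread). It compares a quadratic step count with the size of 2^N and N!, measured in decimal digits, since the latter two are far too large to print usefully. The variable names are illustrative only.

```python
import math

N = 6000

# Quadratic algorithm: a step count you can actually write down.
quadratic_steps = N * N

# 2^N and N! are reported by their number of decimal digits.
exponential_digits = math.floor(N * math.log10(2)) + 1
# lgamma(N + 1) is ln(N!), so dividing by ln(10) gives log10(N!).
factorial_digits = math.floor(math.lgamma(N + 1) / math.log(10)) + 1

print(f"N^2 = {quadratic_steps:,} steps")
print(f"2^N has about {exponential_digits:,} decimal digits")
print(f"N! has about {factorial_digits:,} decimal digits")
```

A quadratic algorithm needs tens of millions of steps here, which is slow but finishes. Anything exponential or factorial in N would not finish in 4 hours, or ever, so whatever the software is doing, it is presumably polynomial.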
> Basic tests with a small number of entries displayed no issue.... it
> just does not scale to production the size of your company!
6,000 entries is hardly "large". Indeed, if you have far fewer than
6,000 entries to manage, you barely need special-purpose software to
manage it.
Then again, given that this software appears to just store names and
addresses, you'd think somebody could knock up something in MS Access in
about 20 seconds flat which would do the same job. (Which begs the
question... WHY HAVEN'T THEY?!?!)
> (indexing by name, firstname, addresses, any silly idea... using a
> bubble sort on files)
>
> It might also be a very sophisticated paging system, with some entries
> per page: when a page gets filled, you move all the other pages and
> redistribute the entries (using some patricia tree with extended key
> length..) of the current page between the old and a new one.... one
> entry at a time, with initial packet unsorted... or worse: in
> pathological order.
It's entirely possible that sorted order might be the worst possible
case. ;-)
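A minimal sketch of how sorted input can be the pathological case, assuming (purely for illustration) that the index is a plain unbalanced binary search tree. Nobody in this thread knows what structure the software really uses; `bst_height` is a made-up helper, and the example uses 1,000 keys rather than 6,000 just to keep it quick.

```python
import random


def bst_height(keys):
    """Insert keys into an unbalanced binary search tree; return its height in nodes."""
    root = None  # each node is a list: [key, left_child, right_child]
    height = 0
    for key in keys:
        node = [key, None, None]
        if root is None:
            root = node
            height = 1
            continue
        current, depth = root, 1
        while True:
            side = 1 if key < current[0] else 2
            if current[side] is None:
                current[side] = node
                height = max(height, depth + 1)
                break
            current = current[side]
            depth += 1
    return height


keys = list(range(1000))

# Sorted input: every new key goes to the right of the previous one,
# so the "tree" degenerates into a linked list.
sorted_height = bst_height(keys)

# Shuffled input (fixed seed so the run is repeatable).
shuffled = keys[:]
random.Random(42).shuffle(shuffled)
shuffled_height = bst_height(shuffled)

print(f"height with sorted input:   {sorted_height}")
print(f"height with shuffled input: {shuffled_height}")
```

With sorted input the tree is as tall as the number of keys, so each insert walks the whole chain and the total work grows quadratically. With shuffled input the tree stays shallow and inserts are cheap.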
Invisible wrote:
> question... WHY HAVEN'T THEY?!?!)
They have. It's called Exchange. It's the killer office app that keeps
everyone from switching over to desktop Linux even in shops where there are
millions of Linux servers.
Why aren't *you* using this?
--
Darren New, San Diego CA, USA (PST)
Linux: Now bringing the quality and usability of
open source desktop apps to your personal electronics.
>> question... WHY HAVEN'T THEY?!?!)
>
> They have. It's called Exchange. It's the killer office app that keeps
> everyone from switching over to desktop Linux even in shops where there
> are millions of Linux servers.
>
> Why aren't *you* using this?
We are.
> They have. It's called Exchange. It's the killer office app that keeps
> everyone from switching over to desktop Linux even in shops where there
> are millions of Linux servers.
>
> Why aren't *you* using this?
For some reason my company thinks it is better to use a web-based address
book with all sorts of restrictions (seriously, you cannot copy&paste or
print-screen while it is showing!) rather than the purpose-built
Exchange-based address book.
scott wrote:
> purpose-built Exchange-based address book.
... that also has a web interface. :-)
I've seen it done better, but usually not. Qualcomm, for example, has the
"ph" program that shows you the person's photo, how to get to their office
from yours, etc.
--
Darren New, San Diego CA, USA (PST)
Linux: Now bringing the quality and usability of
open source desktop apps to your personal electronics.
Invisible wrote:
>>> question... WHY HAVEN'T THEY?!?!)
>>
>> They have. It's called Exchange. It's the killer office app that keeps
>> everyone from switching over to desktop Linux even in shops where
>> there are millions of Linux servers.
>>
>> Why aren't *you* using this?
>
> We are.
All of which still begs the question... If we have Exchange, why am I
still getting asked to use this other piece of crap?
Invisible wrote:
> All of which still begs the question...
No it doesn't. :-)
--
Darren New, San Diego CA, USA (PST)
Linux: Now bringing the quality and usability of
open source desktop apps to your personal electronics.