Imagine the unimaginable: Try to picture the worst possible software
codebase. The kind of thing that only exists in your darkest
nightmares, the ones where you wake up screaming. The sort of code that
sends people stark raving mad.
You know the kind of thing I'm talking about. Code written by people who
have no God-damned idea what the hell they're doing. The kind of people
who just type stuff and then hit it until it works (for sufficiently low
values of "works").
There are no comments. There is only a deeply-nested tangle of folders
with randomly-generated names, containing randomly-named files that
range in size from bytes to megabytes. Each file contains a selection of
randomly-named constants, global variables and functions.
There is no separation of concerns. No modularity. The horrors include
the function with 26 lines for converting lower-case letters to
upper-case, plus another 26 lines to redundantly map the upper-case
letters to themselves. The 25-line "negate double" function. The
function that generates "unique" IDs by trivially manipulating existing
ones, and retrying if a clash occurs.
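To make the upper-case horror concrete, here's a rough Python sketch (all names are made up for illustration): a table-driven converter with the redundant upper-to-upper half, next to the one-line built-in. The redundant entries change nothing, and the table quietly ignores everything outside plain ASCII.

```python
import string

# The "horror" version: 26 lower->upper entries (built with a comprehension
# here instead of 26 hand-written lines)...
UPPER_TABLE = {lo: up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)}
# ...plus 26 redundant entries mapping each upper-case letter to itself.
UPPER_TABLE.update({up: up for up in string.ascii_uppercase})


def upper_by_table(text: str) -> str:
    """Upper-case via the lookup table; anything not in it passes through unchanged."""
    return "".join(UPPER_TABLE.get(ch, ch) for ch in text)


def upper_builtin(text: str) -> str:
    """The sane version: one call, which also handles non-ASCII letters such as Greek."""
    return text.upper()
```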
Now imagine that instead of just /one/ broken upper-case function, there
are /sixteen copies/ of this thing, all doing upper-case conversion on a
different subset of the Latin and perhaps Greek alphabets. Twenty-five
different functions which all incorrectly percent-encode URLs in a
different incorrect manner. Four copies of that weird function that you
don't know what it does yet, all of them identical.
Best of all, during the development process, the code has changed so
many times that large parts of the codebase are actually DEAD CODE! Huge
chunks of it are NEVER ACTUALLY RUN. They used to be, but no longer are.
Good luck figuring out which bits those are.
The programmer didn't read the documentation for the tools and libraries
he's using. (He probably wouldn't understand it anyway!) As such, the
software actually /depends on/ all the undocumented, undefined or just
plain erroneous behaviour of the compiler, the runtime and the libraries.
This thing is a recursive nightmare. Makefiles that run scripts which
write makefiles that build other scripts that run compilers to compile
programs that write source code. CPP macros everywhere. Hell, some files
don't even /compile/ - and the [auto-generated] makefiles actually
/depend on/ this behaviour in order to function correctly.
The code uses pointer arithmetic to access stuff, and as such depends on
the exact ordering of stack frames, heap placement, malloc behaviour and
God only knows what else. Compiled code is modified at runtime. Some of
the machine opcodes are also data constants; if the compiler ever
decided to move the code around, it would stop working.
When you've finished screaming in abject terror, let me tell you the
worst part: This application wasn't just /written/. It was in active
development for many decades. But apparently Mr Incompetent didn't know
about version control either. So each time a customer purchases the
software, they get a copy of the source tree as it exists on the
development system on that particular day. And every time they have a
problem, Mr Incompetent drives out to the customer site and fixes /their
copy of/ the software.
In case you haven't come across this scenario, let me spell out the dire
consequences of this development pattern: EVERY SINGLE CUSTOMER has a
completely unique, one-of-a-kind variant of the software. Each one has
its own completely unique set of bugs. Each one has different functions
with different names in different files to do the same job in slightly
different ways.
At customer 5, file 1678981 runs payroll. But at customer 6, that file
doesn't exist, at customer 7 that file's contents are completely
different and do something unrelated, and customer 9 has that file but
it's dead code; on /that/ system, files 57781 and 8756494 contain a
completely different implementation of the same thing. (Which one gets
used depends on which screen you access the feature from. No, they
aren't implemented the same way, nor do they give the same results.
There's a workaround for that in file 778456 which tries to make the
results look similar.)
Are you trembling in catatonic hysteria yet? Then let me tell you the
worst part: The software is /really really popular/. Sure, it doesn't
actually /work/ very well, but it's a completely mission-critical part
of every customer's infrastructure, and /must/ be supported. And the
range of functionality it provides and the complexity of the workarounds
the customers have invented to deal with it make replacing or rewriting
it unthinkable.
OK, so that's pretty much the worst Daily WTF nightmare imaginable. Of
course, nobody /actually/ writes software like that, right?
Right??
Well, you know what? There *is* software out there that's like this.
Almost everything I've just written accurately describes THE HUMAN
GENOME. Or, indeed, ANY GENOME. "Bashing it until it works" is almost
/literally/ how evolution by natural selection works.
Now do you understand why even though we've got the entire human genome,
nobody has figured out how it works yet? :-P
Of course, for a genome the problem is far, far worse. A computer
program directs a computer to do some computation. A genome directs a
cell to actually BUILD MORE CELLS. Change the program, and you can
ACTUALLY CHANGE THE HARDWARE!
In case you think I'm talking theoretically, consider that in most
organisms, the codon "UAG" (uracil, adenine, guanine; "TAG" in the DNA
itself) usually means "stop", but in certain archaea and bacteria it
means "pyrrolysine" instead. Pyrrolysine is an amino acid similar to
lysine, but not found in most organisms. Only a few species create it,
and only in these species does the UAG codon have this special, altered
meaning.
http://en.wikipedia.org/wiki/Pyrrolysine
It's almost like writing a computer program that adds a new opcode to
the processor. (If you know what "microcode" is, this shouldn't sound so
far-fetched...)
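The "new opcode" analogy can be sketched in Python. This is a toy, not real bioinformatics: the codon table is a tiny hand-picked slice of the standard genetic code (RNA letters, "*" for stop), and it assumes a pyrrolysine-using organism simply reads every UAG as pyrrolysine, whose one-letter code is O. Same "program", different "instruction set", different result.

```python
# A tiny slice of the standard genetic code (RNA codon -> one-letter amino acid).
# "*" marks a stop codon. Only the codons used in the example are included.
STANDARD_CODE = {
    "AUG": "M",  # methionine (also the usual start codon)
    "AAA": "K",  # lysine
    "GCU": "A",  # alanine
    "UAG": "*",  # the "amber" stop codon in most organisms
    "UAA": "*",  # another stop codon
}

# Simplifying assumption: a pyrrolysine-using organism reads UAG as pyrrolysine (O).
PYL_CODE = {**STANDARD_CODE, "UAG": "O"}


def translate(mrna: str, code: dict[str, str]) -> str:
    """Read codons left to right using the given table, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

With the message `"AUGAAAUAGGCUUAA"`, the standard table stops at UAG after two amino acids, while the pyrrolysine table carries on to the final UAA.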
This isn't even the only instance. Humans use selenocysteine, an amino
acid containing selenium. There's no dedicated codon for it; rather, the
codon UGA, which normally means "stop", is read as selenocysteine only
when the RNA strand also carries a marker (a SECIS element) that makes
it tangle up in a specific way. The selenium itself is added by enzymes
to a serine that's already loaded onto a special transfer RNA. That's
like modifying your processor so that the effect of one opcode depends
on another opcode somewhere else in the executable!
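The selenium trick can be sketched the same way. In reality the recoded codon is UGA (normally "stop"), the "tangle" is a SECIS element in the mRNA, and selenocysteine's one-letter code is U. Working out whether a real mRNA contains a SECIS element means predicting how it folds, so this toy just takes a hypothetical `has_secis` flag from the caller instead.

```python
# Minimal subset of the standard genetic code; "*" marks a stop codon.
CODE = {"AUG": "M", "GGC": "G", "UGA": "*", "UAA": "*"}


def translate(mrna: str, has_secis: bool) -> str:
    """Translate an mRNA, reading UGA as selenocysteine (U) only when the
    caller says the message carries a SECIS element (hypothetical flag;
    real SECIS detection is an RNA-folding problem not modelled here)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and has_secis:
            protein.append("U")  # same codon, different meaning: selenocysteine
            continue
        amino_acid = CODE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

The same message, `"AUGUGAGGCUAA"`, gives a one-residue stub without the marker and a longer protein with it: the meaning of UGA depends on something elsewhere in the "executable".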
Consider also the thing I said about there being multiple copies of
every function, some of them no longer in use. Humans have (IIRC) 14
copies of the haemoglobin gene. 6 of these copies are broken. (Some of
them have a letter or two that's wrong, while others have huge chunks of
code completely missing or duplicated.) Of the remainder, several are
completely identical, some are slightly different but make the same
chemical (e.g., CUC and CUA both mean leucine), some make different
chemicals which still function the same way, and there's a couple that
make different chemicals that actually work differently.
[I don't remember the numbers off hand. It's /something like/ 14 copies,
but I don't have the book in front of me right now.]
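Here's a small Python sketch of "slightly different but makes the same chemical". The codon table is only a subset of the standard genetic code, and the two gene "copies" are invented: they differ in one letter (CUC vs CUA), yet translate to the same protein.

```python
from itertools import takewhile

# Leucine has six synonymous codons in the standard genetic code.
LEUCINE_CODONS = ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")

# A tiny subset of the standard code: leucine plus a few codons for the demo.
CODE = {codon: "L" for codon in LEUCINE_CODONS}
CODE.update({"AUG": "M", "AAA": "K", "UAA": "*"})  # "*" marks a stop codon


def translate(mrna: str) -> str:
    """Read codons left to right and stop at the first stop codon."""
    amino_acids = (CODE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))
    return "".join(takewhile(lambda aa: aa != "*", amino_acids))


# Two hypothetical gene "copies" that differ by a single letter...
copy_a = "AUGCUCAAAUAA"
copy_b = "AUGCUAAAAUAA"
# ...but build exactly the same protein.
print(copy_a == copy_b, translate(copy_a) == translate(copy_b))
```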
For example, the blood of a fetus contains a slightly different type of
haemoglobin, using one of the gene copies which is only activated in a
fetus. In adults, this gene is largely switched off. This version of
haemoglobin is almost exactly the same as the [several] normal
version[s] that adults have, except it binds to oxygen slightly more
strongly.
As a direct result of this, when the fetal blood supply comes close to
the maternal blood supply in the placenta, oxygen migrates from the
mother to the fetus, AND NOT THE OTHER WAY AROUND. If they both had the
same affinities, there wouldn't be much of a transfer. That would be fatal.
(So how the hell did it evolve in the first place, if not having it is
fatal? Well, for a smaller animal, or one with a lower metabolic rate,
it wouldn't be fatal. Just a disadvantage. It only becomes critical for
larger animals with higher metabolism.)
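The affinity argument can be put into numbers with the Hill equation, S = pO2^n / (P50^n + pO2^n), where S is the fraction of haemoglobin carrying oxygen, pO2 is the oxygen partial pressure, and P50 is the pressure at which the haemoglobin is half-saturated (a lower P50 means stronger binding). The P50 values, the Hill coefficient of about 2.7 and the placental oxygen pressure below are rough textbook-style assumptions, not measurements, so treat this as a sketch.

```python
def saturation(po2_mmhg: float, p50_mmhg: float, hill_n: float = 2.7) -> float:
    """Fractional O2 saturation from the Hill equation: S = pO2^n / (P50^n + pO2^n)."""
    return po2_mmhg ** hill_n / (p50_mmhg ** hill_n + po2_mmhg ** hill_n)


# Approximate P50 values (assumptions for illustration).
P50_ADULT = 26.8  # mmHg, adult haemoglobin (HbA)
P50_FETAL = 19.0  # mmHg, fetal haemoglobin (HbF): binds O2 more strongly

# An assumed oxygen partial pressure where the two blood supplies meet.
PO2_PLACENTA = 30.0  # mmHg

adult = saturation(PO2_PLACENTA, P50_ADULT)
fetal = saturation(PO2_PLACENTA, P50_FETAL)
print(f"adult Hb: {adult:.0%} saturated, fetal Hb: {fetal:.0%} saturated")
```

With these assumed numbers the fetal haemoglobin comes out roughly 20 percentage points more saturated than the adult kind at the same oxygen pressure, so oxygen released by the mother's blood has somewhere to go.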
As if that wasn't enough, haemoglobin isn't just used for oxygen
transport! It's also used as an antioxidant, and plays a part in
regulating iron metabolism in some cells. (Remember what I said about
separation of concerns? Natural selection doesn't do that.)
As another example, consider that the red-sensitive pigment of the eye
is very different from the blue one (well under half of the amino acid
sequence in common), but the green-sensitive pigment is almost identical
to the red one, with just a few modifications. Clearly the gene got
accidentally duplicated, and mutated over time. Apparently some animals
see 6 colours or more. (Note also that the peak wavelengths of "red"
(about 560 nm) and "green" (about 530 nm) are way closer to each other
than either is to "blue" (about 420 nm). Three guesses why...)
In a sense, what scientists are trying to do is even /harder/ than
figuring out what a really buggy program does. They're trying to figure
out which customer got which copy of what software from whom at what
point in time, and what modifications were done after that. In other
words, track Mr Incompetent's movements and figure out the lineage of all
the different copies of the software. Jesus, that's hard!
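The crudest possible version of "figure out which copy came from which" is to count the differences between copies: fewer differences usually suggests a more recent common ancestor. This Python sketch uses invented, already-aligned sequences and plain Hamming distance; real phylogenetics uses proper substitution models and tree-building methods, so take it as an illustration only.

```python
from itertools import combinations

# Hypothetical "customer copies" of the same file: equal-length strings
# standing in for aligned DNA sequences.
copies = {
    "customer_5": "ACGTACGTAC",
    "customer_6": "ACGTACGTTC",
    "customer_7": "ACGAACGATC",
    "customer_9": "TCGAACCATG",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Pairwise distances between every pair of copies.
distances = {(p, q): hamming(copies[p], copies[q]) for p, q in combinations(copies, 2)}

# The pair with the fewest differences is the best guess for "most recently split".
closest = min(distances, key=distances.get)
print(closest, distances[closest])
```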