POV-Ray: Newsgroups: povray.off-topic: Software engineering: Software engineering

From: Invisible
Date: 4 Aug 2011 08:22:51
Message: <4e3a8f1b@news.povray.org>
Imagine the unimaginable: Try to picture the worst possible software 
codebase. The kind of thing that only exists in your darkest 
nightmares, the ones where you wake up screaming. The sort of code that 
sends people stark raving mad.

You know the kind of thing I'm talking about. Code written by people who 
have no God-damned idea what the hell they're doing. The kind of people 
who just type stuff and then hit it until it works (for a sufficiently 
low value of "works").

There are no comments. There is only a deeply-nested tangle of folders 
with randomly-generated names, containing randomly-named files that 
range in size from bytes to megabytes. Each file contains a selection of 
randomly-named constants, global variables and functions.

There is no separation of concerns. No modularity. The horrors include 
the function with 26 lines for converting lower-case letters to 
upper-case, plus another 26 lines to redundantly map the upper-case 
letters to themselves. The 25-line "negate double" function. The 
function that generates "unique" IDs by trivially manipulating existing 
ones, and retrying if a clash occurs.
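
To make the first of those horrors concrete, here is a rough sketch of 
what such a function might boil down to, next to what the language 
already provides. The names HAND_WRITTEN, to_upper_the_hard_way and 
to_upper are made up for illustration, and I've used Python here 
purely as an assumption; the imaginary codebase could be in anything.

```python
# Hypothetical reconstruction: one hand-written entry per lower-case letter,
# plus the redundant upper-case-to-itself entries described above.
HAND_WRITTEN = {
    "a": "A", "b": "B", "c": "C", "d": "D", "e": "E", "f": "F", "g": "G",
    "h": "H", "i": "I", "j": "J", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "O", "p": "P", "q": "Q", "r": "R", "s": "S", "t": "T", "u": "U",
    "v": "V", "w": "W", "x": "X", "y": "Y", "z": "Z",
    "A": "A", "B": "B", "C": "C", "D": "D", "E": "E", "F": "F", "G": "G",
    "H": "H", "I": "I", "J": "J", "K": "K", "L": "L", "M": "M", "N": "N",
    "O": "O", "P": "P", "Q": "Q", "R": "R", "S": "S", "T": "T", "U": "U",
    "V": "V", "W": "W", "X": "X", "Y": "Y", "Z": "Z",
}


def to_upper_the_hard_way(ch):
    # Anything not in the table falls through unchanged.
    return HAND_WRITTEN.get(ch, ch)


def to_upper(ch):
    # What the standard library already offers.
    return ch.upper()


# For plain ASCII letters the two agree; the second is one line.
for ch in "azAZ":
    assert to_upper_the_hard_way(ch) == to_upper(ch)
```

Now picture sixteen slightly different copies of that table.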

Now imagine that instead of just /one/ broken upper-case function, there 
are /sixteen copies/ of this thing, all doing upper-case conversion on a 
different subset of the Latin and perhaps Greek alphabets. Twenty-five 
different functions which all percent-encode URLs incorrectly, each in 
its own way. Four copies of that weird function whose purpose you 
haven't figured out yet, all of them identical.

Best of all, during the development process, the code has changed so 
many times that large parts of the codebase are actually DEAD CODE! Huge 
chunks of it are NEVER ACTUALLY RUN. They used to be, but no longer are. 
Good luck figuring out which bits those are.

The programmer didn't read the documentation for the tools and libraries 
he's using. (He probably wouldn't understand it anyway!) As such, the 
software actually /depends on/ all the undocumented, undefined or just 
plain erroneous behaviour of the compiler, the runtime and the libraries.

This thing is a recursive nightmare. Makefiles that run scripts which 
write makefiles that build other scripts that run compilers to compile 
programs that write source code. CPP macros everywhere. Hell, some files 
don't even /compile/ - and the [auto-generated] makefiles actually 
/depend on/ this behaviour in order to function correctly.

The code uses pointer arithmetic to access stuff, and as such depends on 
the exact ordering of stack frames, heap placement, malloc behaviour and 
God only knows what else. Compiled code is modified at runtime. Some of 
the machine opcodes are also data constants; if the compiler ever 
decided to move the code around, it would stop working.



When you've finished screaming in abject terror, let me tell you the 
worst part: This application wasn't just /written/. It was in active 
development for many decades. But apparently Mr Incompetent didn't know 
about version control either. So each time a customer purchases the 
software, they get a copy of the source tree as it exists on the 
development system on that particular day. And every time they have a 
problem, Mr Incompetent drives out to the customer site and fixes /their 
copy of/ the software.

In case you haven't come across this scenario, let me spell out the dire 
consequences of this development pattern: EVERY SINGLE CUSTOMER has a 
completely unique, one-of-a-kind variant of the software. Each one has 
its own completely unique set of bugs. Each one has different functions 
with different names in different files to do the same job in slightly 
different ways.

At customer 5, file 1678981 runs payroll. But at customer 6, that file 
doesn't exist, at customer 7 that file's contents are completely 
different and do something unrelated, and customer 9 has that file but 
it's dead code; on /that/ system, files 57781 and 8756494 contain a 
completely different implementation of the same thing. (Which one gets 
used depends on which screen you access the feature from. No, they 
aren't implemented the same way, nor do they give the same results. 
There's a workaround for that in file 778456 which tries to make the 
results look similar.)



Are you trembling in catatonic hysteria yet? Then let me tell you the 
worst part: The software is /really really popular/. Sure, it doesn't 
actually /work/ very well, but it's a completely mission-critical part 
of every customer's infrastructure, and /must/ be supported. And the 
range of functionality it provides and the complexity of the workarounds 
the customers have invented to deal with it make replacing or rewriting 
it unthinkable.



OK, so that's pretty much the worst Daily WTF nightmare imaginable. Of 
course, nobody /actually/ writes software like that, right?

Right??

Well, you know what? There *is* software out there that's like this. 
Almost everything I've just written accurately describes THE HUMAN 
GENOME. Or, indeed, ANY GENOME. "Bashing it until it works" is almost 
/literally/ how evolution by natural selection works.

Now do you understand why even though we've got the entire human genome, 
nobody has figured out how it works yet? :-P

Of course, for a genome the problem is far, far worse. A computer 
program directs a computer to do some computation. A genome directs a 
cell to actually BUILD MORE CELLS. Change the program, and you can 
ACTUALLY CHANGE THE HARDWARE!

In case you think I'm talking theoretically, consider that in most 
organisms, the RNA codon "UAG" (uracil, adenine, guanine; "TAG" in the 
DNA itself) means "stop", but in certain archaea and bacteria it means 
"pyrrolysine" instead. Pyrrolysine is an amino acid similar to lysine, 
but not found in most organisms. Only a few species create it, and only 
in these species does the UAG codon have this special, altered meaning.

http://en.wikipedia.org/wiki/Pyrrolysine

It's almost like writing a computer program that adds a new opcode to 
the processor. (If you know what "microcode" is, this shouldn't sound so 
far-fetched...)
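
If it helps, the "new opcode" idea can be sketched as a lookup table 
with one entry overridden. This is a toy, not real bioinformatics: the 
table is a tiny slice of the standard genetic code, the message string 
is invented, and the names STANDARD, WITH_PYRROLYSINE and translate are 
my own. (Real pyrrolysine-using organisms are more subtle than a 
blanket reassignment, so treat this as a simplification.)

```python
# A deliberately tiny slice of the standard genetic code (RNA codons).
STANDARD = {
    "AUG": "Met",   # also the usual start codon
    "AAA": "Lys",
    "CUC": "Leu",
    "CUA": "Leu",
    "UAG": "STOP",
    "UAA": "STOP",
    "UGA": "STOP",
}

# The variant "instruction set": same table, with UAG reassigned.
WITH_PYRROLYSINE = {**STANDARD, "UAG": "Pyl"}


def translate(mrna, table):
    """Read codons three letters at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = table[mrna[i:i + 3]]
        if amino == "STOP":
            break
        protein.append(amino)
    return protein


message = "AUGAAAUAGCUCUAA"  # made-up sequence: AUG AAA UAG CUC UAA
print(translate(message, STANDARD))          # ['Met', 'Lys']
print(translate(message, WITH_PYRROLYSINE))  # ['Met', 'Lys', 'Pyl', 'Leu']
```

Same "executable", different "processor", different result.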

This isn't even the only instance. Humans use selenocysteine, an amino 
acid containing selenium. There's no dedicated codon for it; rather, 
there's a marker that makes the RNA strand tangle up in a specific way, 
which causes a UGA elsewhere in the same strand (normally "stop") to be 
read as selenocysteine instead. That's like modifying your processor so 
that the effect of one opcode depends on another opcode somewhere else 
in the executable!

Consider also the thing I said about there being multiple copies of 
every function, some of them no longer in use. Humans have (IIRC) 14 
copies of the haemoglobin gene. 6 of these copies are broken. (Some of 
them have a letter or two that's wrong, while others have huge chunks of 
code completely missing or duplicated.) Of the remainder, several are 
completely identical, some are slightly different but make the same 
chemical (e.g., CUC and CUA both mean leucine), some make different 
chemicals which still function the same way, and there's a couple that 
make different chemicals that actually work differently.

[I don't remember the numbers off hand. It's /something like/ 14 copies, 
but I don't have the book in front of me right now.]
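
The "identical / different letters but same chemical / different 
chemical" distinction is easy to sketch in code. The sequences below 
are invented three-codon toys, NOT real haemoglobin genes, and the 
names CODONS, translate and compare are just mine; only the codon 
meanings themselves are from the standard genetic code.

```python
# A few entries from the standard genetic code (RNA codons).
CODONS = {"AUG": "Met", "CUC": "Leu", "CUA": "Leu", "AAA": "Lys", "GAA": "Glu"}


def translate(rna):
    """Turn an RNA string into a list of amino acids, three letters at a time."""
    return [CODONS[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3)]


def compare(copy_a, copy_b):
    """Classify two gene copies the way the paragraph above does."""
    if copy_a == copy_b:
        return "identical"
    if translate(copy_a) == translate(copy_b):
        return "different letters, same product"
    return "different product"


reference = "AUGCUCAAA"                 # Met Leu Lys (made-up toy sequence)
print(compare(reference, "AUGCUCAAA"))  # identical
print(compare(reference, "AUGCUAAAA"))  # different letters, same product
print(compare(reference, "AUGCUCGAA"))  # different product
```

Whether a "different product" still /works/ the same way is the part 
you can't read off the sequence; that takes actual chemistry.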

For example, the blood of a fetus contains a slightly different type of 
haemoglobin, using one of the gene copies which is only activated in a 
fetus. In adults, this gene is never switched on. This version of 
haemoglobin is almost exactly the same as the [several] normal 
version[s] that adults have, except it binds to oxygen slightly more 
strongly.

As a direct result of this, when the fetal blood supply comes close to 
the maternal blood supply in the placenta, oxygen migrates from the 
mother to the fetus, AND NOT THE OTHER WAY AROUND. If they both had the 
same affinities, there wouldn't be much of a transfer. That would be fatal.

(So how the hell did it evolve in the first place, if not having it is 
fatal? Well, for a smaller animal, or one with a lower metabolic rate, 
it wouldn't be fatal. Just a disadvantage. It only becomes critical for 
larger animals with higher metabolism.)

As if that wasn't enough, haemoglobin isn't just used for oxygen 
transport! It's also used as an antioxidant, and for temporary iron 
storage within cells. (Remember what I said about separation of 
concerns? Natural selection doesn't do that.)

As another example, consider that the red-sensitive pigment of the eye 
is completely different from the blue one, but the green-sensitive 
pigment is almost identical to the red one, with just a few 
modifications. Clearly the gene got accidentally duplicated, and mutated 
over time. Apparently some animals see 6 colours or more. (Note also 
that the wavelengths of "red" and "green" are way closer to each other 
than for "blue". Three guesses why...)

In a sense, what scientists are trying to do is even /harder/ than 
figuring out what a really buggy program does. They're trying to figure 
out which customer got which copy of what software from whom at what 
point in time, and what modifications were done after that. In other 
words, track Mr Incompetent's movements and figure out the lineage of 
all the different copies of the software. Jesus, that's hard!