POV-Ray : Newsgroups : povray.off-topic : Funniest bug ever
  Funniest bug ever (Message 1 to 10 of 49)  
From: Orchid Win7 v1
Subject: Funniest bug ever
Date: 23 Feb 2013 06:34:07
Message: <5128a92f$1@news.povray.org>
OK, so as you guys know, I now write computer software for a living. The 
installation DVD for this thing used to be hand-crafted, but thanks to 
[mostly] my efforts it's now auto-generated by a script. Anyway, 
obviously if you completely replace the installation system, you need to 
go test it. So we're testing it.

So yesterday the tester is trying out the software on the various 
platforms we support it on. And he starts complaining that on one 
specific model of laptop, it's not working properly. He claims the 
installer gets to 60% of the first step, and then skips the rest of that 
step and skips the next step completely and just says "installation 
successful".

So I burn a copy of the same DVD and try installing it on another laptop 
of the same model. It works perfectly. But the tester is insistent; he 
hands me the laptop with the DVD still in the drive. I run the 
installer, and it appears to work fine. But then, around about 65%, it 
suddenly "completes", just like the guy said. That's odd.

The install disk is a live Linux environment that runs a Bash script to 
do the installation. It does some crazy multi-way piping and redirecting 
to get the progress display to work. (Most Linux commands helpfully 
provide absolutely no feedback whatsoever, which isn't very good for an 
operation that takes 10 minutes to complete...) With all this piping 
going on, it wouldn't surprise me if some command somewhere is emitting 
an error message and it's simply been "lost" somewhere. A bit alarming 
that the script still claims that "installation was successful" though...

At this point I'm wondering if maybe the disk has a scratch on it which 
prevents it reading past 65% of the image file. So I boot the disk and 
do an md5sum of both image files. They're both fine. Hmm. I manually run 
the key command that actually does the installation. Initially I'm 
getting 150 MB/sec - which is odd, given that the internal SSD device 
maxes out at about 40 MB/sec. After a few minutes, the speed drops to 
about 10 MB/sec - a more usual number. And then the command simply 
/stops/, with no indication as to why. It claims to have completed, but 
the amount of data copied is clearly too low.

So I take the DVD out and put in the one I've been testing with. I 
notice the DVD I took out has a slightly older version of the image 
file. But the installation script is identical, so that shouldn't matter 
at this stage.

I run the installer again, and again it fails the same way. I run it 
manually, and again it fails the same way. Now I'm wondering if I've 
somehow burned out that sector on the SSD or something. (But surely 
wear-levelling would... hmm, anyway.) So I look inside the harddrive bay 
to see how old the drive is...



At this point, things get WEIRD! I look in the drive bay, and see... 
daylight. It turns out there's NOTHING IN THERE. There *is* no harddrive!

This is not /that/ unusual; we do occasionally swap drives around or 
take drives out of machines. So somehow our test guy ended up with a 
laptop with no drive. That's not especially surprising.

But... so... if there's no drive, WHAT THE HELL IS THE INSTALLER 
INSTALLING TO?!? O_O

Why does it sit there for 5 minutes apparently working perfectly when 
there's NO DRIVE PRESENT?!

My colleague suggested that maybe someone had left an SD card in the 
internal card reader. But no. Man, that would have been fun though!

So I open up the Bash script and start reading. This is one of the few 
parts that I didn't write. (It was written early on, when I had hardly 
learned how to do Bash, so my boss wrote it.) The programmers among 
you will appreciate this: The script reports an error if the number of 
devices connected is GREATER THAN ONE. But the script neglects to check 
for the possibility that the number of devices is LESS THAN ONE. (!) 
Because, hey, who the hell has a laptop with no harddrive in it?
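
For illustration, a minimal sketch of a count check that only accepts exactly one device. The real script isn't shown here, so the function name `check_device_count` and the way the count gets passed in are assumptions:

```shell
# Hypothetical helper: succeed only when exactly one target device was found.
# A count of 0, a count above 1, or a non-numeric value are all rejected.
check_device_count() {
    local count=$1
    # [ ... -eq ... ] fails (rather than succeeds) on non-numeric input,
    # so garbage in the variable is also treated as an error.
    if [ "$count" -eq 1 ] 2>/dev/null; then
        return 0
    fi
    echo "error: expected exactly 1 target device, found '$count'" >&2
    return 1
}
```

Testing for "equal to one" rather than "greater than one" covers the zero case without needing a second branch.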

OK, well that's fine I guess, but next the script checks that the 
[assumed to be] one device has sufficient space. Surely a non-existent 
device has zero space on it, right? RIGHT?? How in the HELL is this test 
passing?

Ah, but wait. You're thinking about this as if you're using a REAL 
programming language. Bash is just text-munging. It doesn't compare the 
contents of a variable to a number. It compares two bits of text to each 
other. And one of those pieces of text is now "error: the device 
/dev/sda does not exist", which is more characters than "80000000000", 
and hence is reported as being "greater than" the required disk size. 
FACEPALM!
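
A sketch of how this kind of check can go wrong. The exact test in the script isn't quoted, so the variable names and the use of `[[ ... > ... ]]` are assumptions. Strictly speaking, that operator compares strings character by character rather than by length, but the outcome is the same here because "e" sorts after "8":

```shell
# Hypothetical values: "size" holds an error message instead of a number.
required=80000000000
size="error: the device /dev/sda does not exist"

# Buggy: inside [[ ]], > is a string comparison, not a numeric one.
if [[ "$size" > "$required" ]]; then
    echo "buggy check: enough space"
fi

# Safer: accept only a non-empty string of digits, then compare numerically.
has_enough_space() {
    local size=$1 required=$2
    case $size in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as failure
    esac
    [ "$size" -ge "$required" ]
}

if ! has_enough_space "$size" "$required"; then
    echo "safe check: cannot confirm enough space"
fi
```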

So there's no error produced by the script. But how the HELL does it 
image a device that DOESN'T SODDING EXIST?!

Ah, but wait. This is Linux, remember? Consider the following command:

   cat Image1.raw.gz | gzip -d | dd of=/dev/sda

Slowly it dawned on me what was happening here. If there *is* a block 
device connected, then UDev will generate a special file named /dev/sda 
which represents this device, and the above command will overwrite the 
contents of that device. HOWEVER... if there is *no* block device 
connected, then this file will not exist. In that case, rather than 
produce some kind of error, the above command will *create* a regular 
file named /dev/sda and try to decompress 20GB of data into it. (!)
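
One way to guard against this, sketched under the assumption that the target path is known up front (the function name `require_block_device` is made up for the example), is to test that the target really is a block device before writing anything:

```shell
# Hypothetical guard: refuse to continue unless the target exists
# and is a block device ([ -b ] is false for regular files and for
# paths that do not exist at all).
require_block_device() {
    local target=$1
    if [ ! -b "$target" ]; then
        echo "error: $target is not a block device" >&2
        return 1
    fi
}

# Usage with the names from the command above:
# require_block_device /dev/sda \
#     && cat Image1.raw.gz | gzip -d | dd of=/dev/sda
```

With GNU dd, adding `conv=nocreat` to the dd command should also stop it from creating the output file when it doesn't exist.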

Once upon a time, that might not have worked. A DVD is a read-only 
device, after all. But in this modern age, you have the DVD, which is 
read-only, and you then overlay a read/write filesystem backed by RAM.

In summary, the installer is decompressing a 20GB disk image into RAM, 
hence the 150MB/sec transfer speed. (Probably limited by DVD drive speed 
and the rate at which the CPU can decompress the data.) After about 5GB, 
free RAM has become so fragmented that the transfer rate drops to 
10MB/sec as the OS desperately searches for empty pages. And eventually, 
once ALL AVAILABLE RAM HAS BEEN EXHAUSTED, the process is summarily 
terminated.

If this was written in a real programming language, some sort of 
exception would have been thrown, which would have alerted me to the 
problem (and prevented the "installation successful" message being 
shown). But this is Bash. By default, it completely ignores all errors, 
problems and malfunctions, and continues executing the next command as 
if everything worked perfectly. So when DD gets terminated, Bash simply 
executes the next line of the script - which says "installation successful"!
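
A minimal sketch of making the success message conditional on the pipeline actually succeeding. The wrapper name `install_image` and its arguments are assumptions, and the real progress-display piping is left out:

```shell
# With pipefail, a pipeline fails if ANY stage fails, not just the last one.
set -o pipefail

# Hypothetical wrapper: only claim success if the whole pipeline succeeded.
install_image() {
    local image=$1 target=$2
    if gzip -dc "$image" | dd of="$target"; then
        echo "installation successful"
    else
        # In the else branch, $? still holds the failed pipeline's status.
        echo "installation FAILED (exit status $?)" >&2
        return 1
    fi
}
```

A dd that gets killed exits with a non-zero status, so the else branch runs instead of the success message.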

I told my boss, and he spent literally 15 minutes laughing 
uncontrollably. There's always a lot of banter in our office, but it's 
unusual for somebody to actually find something so funny that they're 
actually unable to speak any more. And for 15 minutes?

Damn, this is probably the most amusing bug I've ever seen.

Fortunately, the fix is very simple. You just need to add a check for 
the possibility of there being zero devices. But I love the way that 
there are three separate stages where the problem *should* have been 
caught, but wasn't. And all because we wrote the thing in Bash...

Also, somebody give that tester a medal. There's no way in hell we would 
have thought to actually *test* for such an obscure condition. (Not that 
the tester did so intentionally... It was simply a lucky accident. But 
you *know* some customer somewhere is going to do this one day, and 
saying something succeeded when it didn't is a pretty serious bug!)



From: clipka
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 08:01:33
Message: <5128bdad$1@news.povray.org>
Am 23.02.2013 12:34, schrieb Orchid Win7 v1:

> Also, somebody give that tester a medal. There's no way in hell we would
> have thought to actually *test* for such an obscure condition. (Not that
> the tester did so intentionally... It was simply a lucky accident. But
> you *know* some customer somewhere is going to do this one day, and
> saying something succeeded when it didn't is a pretty serious bug!)

Heh - I once had a work colleague like that: Be it karma, a genetic 
disposition, or - as he used to put it - having "shit on his fingers" - 
he had a supernatural talent for having things go wrong on him. He 
actually took quite some pride in this mysterious gift of his, because 
yes, of course, he did work as a tester. I swear this guy was /born/ to 
test the holy crap out of things.



From: Stephen
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 08:40:46
Message: <5128c6de@news.povray.org>
On 23/02/2013 1:01 PM, clipka wrote:
> Heh - I once had a work colleague like that: Be it karma, a genetic
> disposition, or - as he used to put it - having "shit on his fingers" -
> he had a supernatural talent for having things go wrong on him. He
> actually took quite some pride in this mysterious gift of his, because
> yes, of course, he did work as a tester. I swear this guy was /born/ to
> test the holy crap out of things.

I can do that. If anything can be broken, I can break it. ;-)

-- 
Regards
     Stephen



From: clipka
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 08:47:59
Message: <5128c88f$1@news.povray.org>
Am 23.02.2013 14:40, schrieb Stephen:
> On 23/02/2013 1:01 PM, clipka wrote:
>> Heh - I once had a work colleague like that: Be it karma, a genetic
>> disposition, or - as he used to put it - having "shit on his fingers" -
>> he had a supernatural talent for having things go wrong on him. He
>> actually took quite some pride in this mysterious gift of his, because
>> yes, of course, he did work as a tester. I swear this guy was /born/ to
>> test the holy crap out of things.
>
> I can do that. If anything can be broken, I can break it. ;-)

Well, he didn't have to break things. Things broke by themselves at his 
merest presence ;-)



From: Kenneth
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 13:00:00
Message: <web.512903399d37e2efc2d977c20@news.povray.org>
Orchid Win7 v1 <voi### [at] devnull> wrote:

>
> At this point, things get WEIRD! I look in the drive bay, and see...
> daylight. It turns out there's NOTHING IN THERE. There *is* no harddrive!
>

That is too funny! The best laugh of the day.



From: Orchid Win7 v1
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 14:07:09
Message: <5129135d$1@news.povray.org>
>> At this point, things get WEIRD! I look in the drive bay, and see...
>> daylight. It turns out there's NOTHING IN THERE. There *is* no harddrive!
>
> That is too funny! The best laugh of the day.

Like I said, it had us all rolling around on the floor...



From: nemesis
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 15:00:01
Message: <web.51291efa9d37e2efbcbf52970@news.povray.org>
fun indeed

I mean, to put a windoze guy in charge of Linux stuff :)



From: Francois Labreque
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 17:24:23
Message: <51294197$1@news.povray.org>

> The install disk is a live Linux environment that runs a Bash script to
> do the installation. It does some crazy multi-way piping and redirecting
> to get the progress display to work. (Most Linux commands helpfully
> provide absolutely no feedback whatsoever, which isn't very good for an
> operation that takes 10 minutes to complete...) With all this piping
> going on, it wouldn't surprise me if some command somewhere is emitting
> an error message and it's simply been "lost" somewhere. A bit alarming
> that the script still claims that "installation was successful" though...

1. Redirect stderr to a different file.  And check for that file being 
more than 0 bytes before claiming the install completed successfully.

2. Expect the unexpected.  I am currently having issues with a "serious" 
IT company over an install script that does no error checking and where 
there are multiple error-prone steps between zeroing out the old config 
and recreating a new one, which can leave the machine pretty much 
brain-dead if Something-Bad(tm) happens in the middle of the upgrade.
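
Point 1 could be sketched roughly like this. The wrapper name `run_checked` and the log path are made up for the example, and it also checks the exit status, since some tools (dd, for one) write normal statistics to stderr:

```shell
# Hypothetical wrapper: run a command with stderr captured to a log file,
# and fail if the command failed OR wrote anything at all to stderr.
run_checked() {
    local log=$1
    shift
    local status=0
    "$@" 2>"$log" || status=$?
    # [ -s file ] is true when the file exists and is larger than 0 bytes.
    if [ "$status" -ne 0 ] || [ -s "$log" ]; then
        echo "step failed: $* (status $status, see $log)" >&2
        return 1
    fi
}

# Usage (hypothetical step and log path):
# run_checked /tmp/install-step1.err some_install_step && echo "step OK"
```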

-- 
/*Francois Labreque*/#local a=x+y;#local b=x+a;#local c=a+b;#macro P(F//
/*    flabreque    */L)polygon{5,F,F+z,L+z,L,F pigment{rgb 9}}#end union
/*        @        */{P(0,a)P(a,b)P(b,c)P(2*a,2*b)P(2*b,b+c)P(b+c,<2,3>)
/*   gmail.com     */}camera{orthographic location<6,1.25,-6>look_at a }



From: Warp
Subject: Re: Funniest bug ever
Date: 23 Feb 2013 19:06:34
Message: <5129598a@news.povray.org>
Francois Labreque <fla### [at] videotronca> wrote:
> 1. Redirect stderr to a different file.  And check for that file being 
> more than 0 bytes before claiming the install completed successfully.

All programs should return an error code if an error happens. If a program
ends in error but returns a success code, that program is broken (and can
subsequently break other programs, such as 'make', which relies on programs
returning non-success on error.)

-- 
                                                          - Warp



From: Orchid Win7 v1
Subject: Re: Funniest bug ever
Date: 24 Feb 2013 06:52:27
Message: <5129fefb@news.povray.org>
On 24/02/2013 12:06 AM, Warp wrote:
> All programs should return an error code if an error happens. If a program
> ends in error but returns a success code, that program is broken

I agree. However, unfortunately it seems that by default Bash ignores 
all such errors and happily proceeds, unless you manually suffix every 
single command with an explicit return-code check.
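
For comparison, Bash does have an opt-in alternative to per-command checks: `set -e` (errexit), usually combined with `set -o pipefail`. A small demonstration, run in child shells so that the failure doesn't kill the calling shell:

```shell
# Default behaviour: the failure of `false` is ignored.
bash -c 'false; echo "still running"'

# With errexit and pipefail: the first failing command ends the script.
bash -c 'set -eo pipefail; false; echo "not reached"' \
    || echo "script aborted with status $?"
```

This prints "still running" and then "script aborted with status 1". It isn't a complete answer, though: errexit is deliberately not triggered by commands in an `if` condition or on the left of `&&` / `||`, so explicit checks are still needed in places.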




Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.