OK, what is the purpose of an error message?
You *might* be tempted to say that it's meant to tell you that an error
has occurred. But actually, it's more than that. It's supposed to tell
you *what* error occurred! (Usually the fact that _something_ went wrong
is damn obvious.)
Take the following error message, for example:
PERC 4/SC Controller 0, Array Disk 0:1 Sense Key = 3, Sense Code = 11,
Sense Qualifier = 0. If this disk is part of a non-redundant virtual
disk, the data for this block cannot be recovered. The disk will require
replacement and data restore. If this disk is part of a redundant
virtual disk, the data in this block will be reallocated.
...WTF?
So... is my data gone or not? If it is, why is it gone? If it isn't,
what's the problem? (And if there isn't a problem, why are you giving me
an error message?)
Seriously - YOU'RE THE ARRAY CONTROLLER! You KNOW whether this is part
of a redundant virtual disk or not. Why can't you just show me only the
relevant parts of the error message??
Hmm. "The data in this block will be reallocated." What does that *mean*
anyway? Can you recover the data or not?
As you can see, this is a pretty useless error message. It basically
says "Event X has happened. If condition Y is true then this represents
an error, otherwise it doesn't." In other words, this error might not
even *be* an error. So it's failed purpose #1 - telling you an error has
occurred. It also utterly fails purpose #2 - WHY is any of this
happening in the first place? Presumably a block on the disk was
unreadable - but it doesn't SAY so anywhere.
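To make the complaint concrete, here is a minimal Python sketch of a
message builder that says what happened and then prints only the branch
that applies. The function name, parameters and wording are all
hypothetical; this is an illustration, not the actual PERC or Dell code,
and it assumes the software really does know whether the disk is
redundant.

```python
def describe_block_error(controller: str, disk: str, redundant: bool) -> str:
    """Return a message stating the event and only the applicable outcome.

    Hypothetical illustration; names and wording are assumptions.
    """
    # Purpose #2: say what actually happened.
    what = f"{controller}, Array Disk {disk}: a block on this disk could not be read."
    # Purpose #1: say whether this is a real problem for the user's data.
    if redundant:
        outcome = ("This disk is part of a redundant virtual disk, so the data "
                   "in this block will be reallocated.")
    else:
        outcome = ("This disk is part of a non-redundant virtual disk, so the data "
                   "for this block cannot be recovered. The disk will require "
                   "replacement and data restore.")
    return f"{what} {outcome}"


print(describe_block_error("PERC 4/SC Controller 0", "0:1", redundant=False))
```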
Sadly, this kind of thing seems extremely common. Developers don't seem
to think it's important for people to know what an error condition
actually means or what the implications are.
Oh, did I mention? It's a Dell server.
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
And lo on Fri, 04 Jan 2008 10:55:06 -0000, Invisible <voi### [at] devnull> did
spake, saying:
> OK, what is the purpose of an error message?
>
> You *might* be tempted to say that it's meant to tell you that an error
> has occurred. But actually, it's more than that. It's supposed to tell
> you *what* error occurred! (Usually the fact that _something_ went wrong
> is damn obvious.)
No no no, the priorities are as follows:
1) Tell the user an error has occurred
2) Tell the user what the system/programme is going to do regarding said
error
3) Tell the user what the error actually is
4) Suggest a course of action that the user can take to solve it.
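As a rough sketch, the four priorities above could be carried as four
fields of one record and printed in that order. The class name, field
names and the example text are made up for illustration; nothing here is
taken from a real product.

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    summary: str        # 1) an error has occurred
    system_action: str  # 2) what the system/programme will do about it
    cause: str          # 3) what the error actually is
    suggestion: str     # 4) what the user can do to solve it

    def render(self) -> str:
        # Emit the fields in priority order, one per line.
        return "\n".join([
            f"Error: {self.summary}",
            f"What happens now: {self.system_action}",
            f"Cause: {self.cause}",
            f"What you can do: {self.suggestion}",
        ])


# Hypothetical example content, loosely based on the disk message in this thread.
report = ErrorReport(
    summary="A disk in the array reported a read failure.",
    system_action="The affected block will be reallocated.",
    cause="A block on Array Disk 0:1 could not be read.",
    suggestion="Have a replacement disk ready and check your backups.",
)
print(report.render())
```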
Trouble is, everyone gets to around the second item, then things get
complicated and they turn to something else. Hence error messages telling
you that msgsrv32 has caused a fault in kernel32 at point 0000 FFFF... to
which every general user will respond: "But I'm not running any programmes
called msgsrv32 or kernel. Are they viruses? Should I search for and
delete them?"
--
Phil Cook
--
I once tried to be apathetic, but I just couldn't be bothered
http://flipc.blogspot.com
Invisible wrote:
> OK, what is the purpose of an error message?
>
> You *might* be tempted to say that it's meant to tell you that an error
> has occurred. But actually, it's more than that. It's supposed to tell
> you *what* error occurred! (Usually the fact that _something_ went wrong
> is damn obvious.)
My favourite is still the most classic Windows error message, which
says "Unexpected error".
> Take the following error message, for example:
>
> PERC 4/SC Controller 0, Array Disk 0:1 Sense Key = 3, Sense Code = 11,
> Sense Qualifier = 0. If this disk is part of a non-redundant virtual
> disk, the data for this block cannot be recovered. The disk will require
> replacement and data restore. If this disk is part of a redundant
> virtual disk, the data in this block will be reallocated.
>
-clip-
>
> As you can see, this is a pretty useless error message. It basically
Actually, it's a perfect error message. It says that disk 0:1 is broken
and needs to be replaced. It also tells you, as miscellaneous
information, that if you weren't running RAID, it's bye-bye time.
--
Eero "Aero" Ahonen
http://www.zbxt.net
aer### [at] removethiszbxtnetinvalid
Invisible wrote:
> PERC 4/SC Controller 0, Array Disk 0:1
This disk had a problem.
> Sense Key = 3, Sense Code = 11, Sense Qualifier = 0.
This is what you need to look up in the standard specs, or the
documentation for the disk, to see precisely the problem.
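For illustration, a small Python lookup along those lines. It assumes the
three numbers in the message are hexadecimal SCSI sense data (sense key,
additional sense code, additional sense code qualifier); the message
itself doesn't say which base it uses. The tables are only a small
excerpt of the standard codes, and the function name is made up.

```python
# Excerpt of standard SCSI sense keys (not the full table).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
}

# Excerpt of (ASC, ASCQ) pairs; a real table has hundreds of entries.
ADDITIONAL_SENSE = {
    (0x0C, 0x00): "WRITE ERROR",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def decode_sense(key: str, code: str, qualifier: str) -> str:
    """Translate the three fields of the message, read as hex strings."""
    k, asc, ascq = (int(v, 16) for v in (key, code, qualifier))
    key_text = SENSE_KEYS.get(k, "unknown sense key")
    detail = ADDITIONAL_SENSE.get((asc, ascq), "not in this excerpt")
    return f"{key_text}: {detail} ({k:X}h/{asc:02X}h/{ascq:02X}h)"


# Sense Key = 3, Sense Code = 11, Sense Qualifier = 0
print(decode_sense("3", "11", "0"))
```

Under that assumption the message is reporting a block that could not be
read from the medium.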
> If this disk is part of a non-redundant virtual
> disk, the data for this block cannot be recovered.
If this is your only copy, you're hosed.
> The disk will require replacement and data restore.
If so, go fix it.
> If this disk is part of a redundant
> virtual disk, the data in this block will be reallocated.
If I already have a second copy, I'll make a second copy somewhere else
on the failing disk. But you should probably get a second disk ready to
replace the failing one.
Aren't you the system admin? Don't you know how to tell if you have a
redundant disk? :-)
--
Darren New / San Diego, CA, USA (PST)
It's not feature creep if you put it
at the end and adjust the release date.
Darren New wrote:
> Invisible wrote:
>> PERC 4/SC Controller 0, Array Disk 0:1
>
> This disk had a problem.
>
>> Sense Key = 3, Sense Code = 11, Sense Qualifier = 0.
>
> This is what you need to look up in the standard specs, or the
> documentation for the disk, to see precisely the problem.
>
>> If this disk is part of a non-redundant virtual
>> disk, the data for this block cannot be recovered.
>
> If this is your only copy, you're hosed.
>
>> The disk will require replacement and data restore.
>
> If so, go fix it.
>
>> If this disk is part of a redundant virtual disk, the data in this
>> block will be reallocated.
>
> If I already have a second copy, I'll make a second copy somewhere else
> on the failing disk. But you should probably get a second disk ready to
> replace the failing one.
>
> Aren't you system admin? Don't you know how to tell if you have a
> redundant disk? :-)
More importantly, DOESN'T THE DISK CONTROLLER KNOW?? And if so, why is
it giving me two options? Why not just tell me the one that's applicable?
As it happens, I got this error while rebuilding a RAID-1 set. So...
does this mean the data was recovered or not?
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Orchid XP v7 wrote:
>
> As it happens, I got this error while rebuilding a RAID-1 set. So...
> does this mean the data was recovered or not?
>
In a 2-disk array where one disk has just been replaced, the disk that
fails can be either the one holding all the data or the new one holding
none. Whether the data survived depends on which disk failed.
--
Eero "Aero" Ahonen
http://www.zbxt.net
aer### [at] removethiszbxtnetinvalid
Orchid XP v7 wrote:
> More importantly, DOESN'T THE DISK CONTROLLER KNOW??
Not necessarily. You've never heard of software raid?
> As it happens, I got this error while rebuilding a RAID-1 set. So...
> does this mean the data was recovered or not?
I think it means you're screwed, since you didn't have a mirrored disk
(yet).
--
Darren New / San Diego, CA, USA (PST)
It's not feature creep if you put it
at the end and adjust the release date.
> As it happens, I got this error while rebuilding a RAID-1 set. So...
> does this mean the data was recovered or not?
I would guess it means it succeeded, but that one of the
platters has a physical scratch. It didn't know whether data had
been lost (it wasn't, since it was mirrored on the other HD),
and portions of the platter are now marked as scratched
and won't be used. I'm not sure how long a HD will last after
getting a scratch... maybe a long time. It's worth an
error message to me.
Darren New wrote:
>
> I think it means you're screwed, since you didn't have a mirrored disk
> (yet).
>
We can't know that (Andrew could). Let's say we have disk 0 and disk 1.
Disk 1 dies and gets replaced by disk 2, so we'll have disks 0 and 2.
Now we get the error message. If the message is for disk 0, we're
screwed. If the message is for disk 2, we're not screwed, but we're
running on one disk again (and taking one hell of a backup right away,
again, to stay at the latest point of data).
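The same reasoning as a tiny Python sketch. The function and its
parameters are hypothetical; it only encodes the two cases described
above for a 2-disk RAID-1 rebuild and is not how any real controller
reports things.

```python
def rebuild_outcome(error_disk: int, source_disk: int, target_disk: int) -> str:
    """Classify a read error seen during a 2-disk RAID-1 rebuild.

    source_disk is the surviving disk holding all the data,
    target_disk is the fresh replacement being written to.
    """
    if error_disk == source_disk:
        # The only good copy is the one that failed.
        return "screwed: the block on the only good copy is unreadable"
    if error_disk == target_disk:
        # The data is still safe on the source, but there is no mirror yet.
        return "not screwed: data intact on the source, but running on one disk again"
    raise ValueError("error_disk is not part of this array")


# Disk 1 died and was replaced by disk 2; disk 0 is the survivor.
print(rebuild_outcome(error_disk=0, source_disk=0, target_disk=2))
print(rebuild_outcome(error_disk=2, source_disk=0, target_disk=2))
```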
--
Eero "Aero" Ahonen
http://www.zbxt.net
aer### [at] removethiszbxtnetinvalid
Darren New wrote:
> Orchid XP v7 wrote:
>> More importantly, DOESN'T THE DISK CONTROLLER KNOW??
>
> Not necessarily. You've never heard of software raid?
Yes. But this isn't. This is a hardware RAID solution, and the message
is from the software that talks to the RAID controller.
>> As it happens, I got this error while rebuilding a RAID-1 set. So...
>> does this mean the data was recovered or not?
>
> I think it means you're screwed, since you didn't have a mirrored disk
> (yet).
OK. Cool. Now... if there's a problem with the disk, why wasn't it
reported sooner? (Like, while there were still 2 working disks?)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*