On 1/13/2012 2:17 PM, Warp wrote:
> Tom Austin <voi### [at] voidnet> wrote:
>> Workstations could not access the file shares.
>
>> After some looking - memory module went dead - of course it had to be
> the 'big' one - so from 2.5 GB down to 1.5 GB...
>
> One would think that if the server is mission-critical, it would have
> redundant hardware. In other words, if for example a memory module dies,
> the only consequence is that the amount of available RAM decreases and
> a big-ass notification is logged somewhere, but otherwise the service
> continues as usual.
>
Yes, I agree, but we are a very small business. We are only 7 people
and have doubled in size in the past year.
The Windows server is an old low end Dell server machine.
I'm glad it was the memory module and not something on the MB.
We are working on migrating off of it as time allows.
> Of course this requires specialized server hardware, as well as
> software support. (I don't even know if Windows supports this. I'm
> assuming NT and its spawns ought to, but I have never heard either way.)
>
> And also I'm assuming this is not cheap, so management do not want.
>
We do have some methods for getting back up relatively quickly - tho they
are not instantaneous. Management is now willing to spend the money
where needed - tho I don't think we will go to fail-safe quite yet.
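Even without redundant hardware, the "big-ass notification" half of that
is cheap to approximate. A minimal sketch of the idea (assuming a Linux
box purely for illustration - ours is Windows - with a made-up expected
size and a print() standing in for real alerting):

# Sketch: yell loudly if the machine reports less RAM than it should.
# EXPECTED_KB and the alert mechanism are assumptions, not a real setup.
EXPECTED_KB = 2 * 1024 * 1024  # hypothetical: what the box is supposed to have

def total_ram_kb(path="/proc/meminfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])  # reported in kB
    raise RuntimeError("MemTotal not found")

if __name__ == "__main__":
    total = total_ram_kb()
    if total < EXPECTED_KB * 0.9:  # slack for kernel/firmware reservations
        print(f"ALERT: only {total} kB visible, expected ~{EXPECTED_KB} kB")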
On 16/01/2012 12:36 PM, Tom Austin wrote:
> We do have some methods for getting back up relatively quickly - tho they
> are not instantaneous. Management is now willing to spend the money
> where needed - tho I don't think we will go to fail-safe quite yet.
Yeah, it's funny... I've noticed this strange correlation between
expensive down-time and management willingness to invest in
fault-tolerant equipment. ;-)
> On 16/01/2012 12:36 PM, Tom Austin wrote:
>
>> We do have some methods for getting back up relatively quickly - tho they
>> are not instantaneous. Management is now willing to spend the money
>> where needed - tho I don't think we will go to fail-safe quite yet.
>
> Yeah, it's funny... I've noticed this strange correlation between
> expensive down-time and management willingness to invest in
> fault-tolerant equipment. ;-)
One of my all-time favorite post-mortem meetings went something like this:
Background info: there was a major catastrophe the night before. A
repeat of previous incidents. As a result, our customer made the news.
10:45am: We meet with our management to quickly explain to the big
bosses what happened, what we did to resolve it, and how we wrote this
nice proposal six months ago to permanently deal with the issue. Big
Boss says "we'll try to steer clear of We-Told-You-So, but we'll also
remind them that this could have been avoided. Let me do the talking
and jump in if I mess up on my technical mumbo-jumbo."
11:00am: We walk into the customer's boardroom. Before we're fully
seated, the CIO opens up by saying "We know you told us so last time,
but... is there something else we can do other than upgrade these
Whatchamacallits?"
...
It took them two more major incidents - including one where they ended
up on the cover of Time Magazine - before those old whatchamacallits
were removed.
--
/*Francois Labreque*/#local a=x+y;#local b=x+a;#local c=a+b;#macro P(F//
/* flabreque */L)polygon{5,F,F+z,L+z,L,F pigment{rgb 9}}#end union
/* @ */{P(0,a)P(a,b)P(b,c)P(2*a,2*b)P(2*b,b+c)P(b+c,<2,3>)
/* gmail.com */}camera{orthographic location<6,1.25,-6>look_at a }
On 16/01/2012 01:51 PM, Francois Labreque wrote:
> "We know you told us so last time,
> but... is there something else we can do other than upgrade these
> Whatchamacallits?"
>
> ...
>
> It took them two more major incidents - including one where they ended
> up on the cover of Time Magazine - before those old whatchamacallits
> were removed.
One has to wonder why people are so resistant to fixing the problem. The
solution is right there, and yet they'd rather take the long way around. Why?
On 1/16/2012 8:11 AM, Invisible wrote:
> On 16/01/2012 12:36 PM, Tom Austin wrote:
>
>> We do have some methods for getting back up relatively quickly - tho they
>> are not instantaneous. Management is now willing to spend the money
>> where needed - tho I don't think we will go to fail-safe quite yet.
>
> Yeah, it's funny... I've noticed this strange correlation between
> expensive down-time and management willingness to invest in
> fault-tolerant equipment. ;-)
The more people we add and the more data we process, the more downtime
will cost. Eventually we may get to higher uptime requirements - but for
now 2 hours of downtime 2x a year is not too bad.
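(For scale: 4 hours out of the year's 8760 is 4/8760 ≈ 0.05% downtime,
i.e. roughly 99.95% availability - not bad at all for a 7-person shop.)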
>> Yeah, it's funny... I've noticed this strange correlation between
>> expensive down-time and management willingness to invest in
>> fault-tolerant equipment. ;-)
>
> The more people we add and the more data we process, the more downtime
> will cost. Eventually we may get to higher uptime requirements - but for
> now 2 hours of downtime 2x a year is not too bad.
As with everything, it depends on just how expensive down-time actually
is. If the answer is "not very", you don't need to worry about fixing it
too much.
Amusing anecdote: One of our servers had a hardware RAID system. The
idea is that if one of the drives dies, the RAID controller will keep
the server operational. You know what died? THE RAID CONTROLLER! >_<
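The moral being: monitor the thing that is supposed to save you. For
Linux software RAID (a stand-in here - a hardware controller like that
one needs its vendor's tool instead) the check is trivial; a sketch:

# Sketch: report md arrays whose status mask shows a dead member.
# Healthy mirrors read [UU]; a failed member shows up as [U_].
import re, sys

def degraded_arrays(mdstat="/proc/mdstat"):
    text = open(mdstat).read()
    return [m.group(0) for m in re.finditer(r"\[[U_]+\]", text)
            if "_" in m.group(0)]

if __name__ == "__main__":
    bad = degraded_arrays()
    if bad:
        print("RAID DEGRADED:", bad, file=sys.stderr)  # wire to real alerting
        sys.exit(1)
    print("all arrays healthy")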
On 1/16/2012 11:07 AM, Invisible wrote:
>>> Yeah, it's funny... I've noticed this strange correlation between
>>> expensive down-time and management willingness to invest in
>>> fault-tolerant equipment. ;-)
>>
>> The more people we add and the more data we process, the more downtime
>> will cost. Eventually we may get to higher uptime requirements - but for
>> now 2 hours of downtime 2x a year is not too bad.
>
> As with everything, it depends on just how expensive down-time actually
> is. If the answer is "not very", you don't need to worry about fixing it
> too much.
>
> Amusing anecdote: One of our servers had a hardware RAID system. The
> idea is that if one of the drives dies, the RAID controller will keep
> the server operational. You know what died? THE RAID CONTROLLER! >_<
I like what Google has set up - files are stored everywhere on cheap
hardware. If something fails, then it just gets swapped out. But the
cheap hardware fails often and requires more manpower to manage and
replace it.
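The core trick is easy enough to sketch - write every file to several
places, read from whichever still answers. A toy version (directories
standing in for machines here; all names made up):

# Toy replication: directories play the role of machines. A real system
# would also re-replicate when a node dies; this only shows a read
# surviving a lost copy.
import os

NODES = ["node_a", "node_b", "node_c"]  # hypothetical stand-ins for servers

def put(name, data):
    for node in NODES:  # write every copy up front
        os.makedirs(node, exist_ok=True)
        with open(os.path.join(node, name), "wb") as f:
            f.write(data)

def get(name):
    for node in NODES:  # first surviving replica wins
        try:
            with open(os.path.join(node, name), "rb") as f:
                return f.read()
        except OSError:  # node "down" or copy lost - try the next one
            continue
    raise FileNotFoundError(name)

put("report.txt", b"quarterly numbers")
os.remove(os.path.join("node_a", "report.txt"))  # simulate a failed box
assert get("report.txt") == b"quarterly numbers"  # still readable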
On 16/01/2012 7:12 PM, Tom Austin wrote:
> But the cheap hardware fails often and requires more manpower to manage
> and replace it.
Jobs! :-D
--
Regards
Stephen
>> Amusing anecdote: One of our servers had a hardware RAID system. The
>> idea is that if one of the drives dies, the RAID controller will keep
>> the server operational. You know what died? THE RAID CONTROLLER! >_<
>
> I like what Google has set up - files are stored everywhere on cheap
> hardware. If something fails, then it just gets swapped out. But the
> cheap hardware fails often and requires more manpower to manage and
> replace it.
Yeah, but most people can't do that. Most people don't have the space,
ventilation or power requirements to host hundreds of boxes, nor the
money to pay a team of twenty people to keep it all running.
On top of that, most applications are /not/ designed for distributed
implementation. If you're Google, you can just /write/ the software you
need. Most businesses buy it off the shelf.
On 1/17/2012 1:06, Invisible wrote:
> Yeah, but most people can't do that. Most people don't have the space,
> ventilation or power requirements to host hundreds of boxes, nor the money
> to pay a team of twenty people to keep it all running.
Google has hundreds of computers in their data centers. Unfortunately for
you, Google counts "a shipping container full of thousands of motherboards
and disk drives" as "a computer". ;-) It's really quite awesome. Upgrading
a server is known as "forklifting" it.
> On top of that, most applications are /not/ designed for distributed
> implementation. If you're Google, you can just /write/ the software you
> need. Most businesses buy it off the shelf.
And you know, I really, really miss SQL. :-)
--
Darren New, San Diego CA, USA (PST)
People tell me I am the counter-example.