On 1/13/2012 2:17 PM, Warp wrote:
> Tom Austin <voi### [at] voidnet> wrote:
>> Workstations could not access the file shares.
>
>> After some looking - memory module went dead - of course it had to be
> the 'big' one - so from 2.5 GB down to 1.5 GB...
>
> One would think that if the server is mission-critical, it would have
> redundant hardware. In other words, if for example a memory module dies,
> the only consequence is that the amount of available RAM decreases and
> a big-ass notification is logged somewhere, but otherwise the service
> continues as usual.
>
Yes, I agree, but we are a very small business. We are only 7 people
and have doubled in size in the past year.
The Windows server is an old low end Dell server machine.
I'm glad it was the memory module and not something on the MB.
We are working on migrating off of it as time allows.
> Of course this requires specialized server hardware, as well as
> software support. (I don't even know if Windows supports this. I'm
> assuming NT and its spawns ought to, but I have never heard either way.)
>
> And also I'm assuming this is not cheap, so management do not want.
>
We do have some methods for getting back up relatively quickly - tho they
are not instantaneous. Management is now willing to spend the money
where needed - tho I don't think we will go to fail-safe quite yet.
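Even without redundant hardware, the "big-ass notification" half of that
is cheap to approximate. A minimal sketch of the idea (assuming a Linux
box purely for illustration - ours is Windows - with a made-up expected
size and a print() standing in for real alerting):

# Sketch: yell loudly if the machine reports less RAM than it should.
# EXPECTED_KB and the alert mechanism are assumptions, not a real setup.
EXPECTED_KB = 2 * 1024 * 1024  # hypothetical: what the box is supposed to have

def total_ram_kb(path="/proc/meminfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])  # reported in kB
    raise RuntimeError("MemTotal not found")

if __name__ == "__main__":
    total = total_ram_kb()
    if total < EXPECTED_KB * 0.9:  # slack for kernel/firmware reservations
        print(f"ALERT: only {total} kB visible, expected ~{EXPECTED_KB} kB")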
On 16/01/2012 12:36 PM, Tom Austin wrote:
> We do have some methods for getting back up relatively quickly - tho they
> are not instantaneous. Management is now willing to spend the money
> where needed - tho I don't think we will go to fail-safe quite yet.
Yeah, it's funny... I've noticed this strange correlation between
expensive down-time and management willingness to invest in
fault-tolerant equipment. ;-)
> On 16/01/2012 12:36 PM, Tom Austin wrote:
>
>> We do have some methods for getting back up relatively quickly - tho they
>> are not instantaneous. Management is now willing to spend the money
>> where needed - tho I don't think we will go to fail-safe quite yet.
>
> Yeah, it's funny... I've noticed this strange correlation between
> expensive down-time and management willingness to invest in
> fault-tolerant equipment. ;-)
One of my all-time favorite post-mortem meetings went something like this:
Background info: there was a major catastrophe the night before. A
repeat of previous incidents. As a result, our customer made the news.
10:45am: We meet with our management to quickly explain to the big
bosses what happened, what we did to resolve it, and how we wrote this
nice proposal six months ago to permanently deal with the issue. Big
Boss says "we'll try to steer clear of We-Told-You-So, but we'll also
remind them that this could have been avoided. Let me do the talking
and jump in if I mess up on my technical mumbo-jumbo."
11:00am: We walk into the customer's boardroom. Before we're fully
seated, the CIO opens up by saying "We know you told us so last time,
but... is there something else we can do other than upgrade these
Whatchamacallits?"
...
It took them two more major incidents - including one where they ended
up on the cover of Time Magazine - before those old whatchamacallits
were removed.
--
/*Francois Labreque*/#local a=x+y;#local b=x+a;#local c=a+b;#macro P(F//
/* flabreque */L)polygon{5,F,F+z,L+z,L,F pigment{rgb 9}}#end union
/* @ */{P(0,a)P(a,b)P(b,c)P(2*a,2*b)P(2*b,b+c)P(b+c,<2,3>)
/* gmail.com */}camera{orthographic location<6,1.25,-6>look_at a }
On 16/01/2012 01:51 PM, Francois Labreque wrote:
> "We know you told us so last time,
> but... is there something else we can do other than upgrade these
> Whatchamacallits?"
>
> ...
>
> It took them two more major incidents - including one where they ended
> up on the cover of Time Magazine - before those old whatchamacallits
> were removed.
One has to wonder why people are so resistant to fixing the problem. The
solution is right there, and yet they'd rather take the long way around. Why?
On 1/16/2012 8:11 AM, Invisible wrote:
> On 16/01/2012 12:36 PM, Tom Austin wrote:
>
>> We do have some methods for getting back up relatively quickly - tho they
>> are not instantaneous. Management is now willing to spend the money
>> where needed - tho I don't think we will go to fail-safe quite yet.
>
> Yeah, it's funny... I've noticed this strange correlation between
> expensive down-time and management willingness to invest in
> fault-tolerant equipment. ;-)
The more people we add and the more data we process, the more downtime
will cost. Eventually we may get to higher uptime requirements - but for
now 2 hours of downtime 2x a year is not too bad.
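(For scale: 4 hours out of the year's 8760 is 4/8760 ≈ 0.05% downtime,
i.e. roughly 99.95% availability - not bad at all for a 7-person shop.)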
>> Yeah, it's funny... I've noticed this strange correlation between
>> expensive down-time and management willingness to invest in
>> fault-tolerant equipment. ;-)
>
> The more people we add and the more data we process, the more downtime
> will cost. Eventually we may get to higher uptime requirements - but for
> now 2 hours of downtime 2x a year is not too bad.
As with everything, it depends on just how expensive down-time actually
is. If the answer is "not very", you don't need to worry about fixing it
too much.
Amusing anecdote: One of our servers had a hardware RAID system. The
idea is that if one of the drives dies, the RAID controller will keep
the server operational. You know what died? THE RAID CONTROLLER! >_<
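The moral being: monitor the thing that is supposed to save you. For
Linux software RAID (a stand-in here - a hardware controller like that
one needs its vendor's tool instead) the check is trivial; a sketch:

# Sketch: report md arrays whose status mask shows a dead member.
# Healthy mirrors read [UU]; a failed member shows up as [U_].
import re, sys

def degraded_arrays(mdstat="/proc/mdstat"):
    text = open(mdstat).read()
    return [m.group(0) for m in re.finditer(r"\[[U_]+\]", text)
            if "_" in m.group(0)]

if __name__ == "__main__":
    bad = degraded_arrays()
    if bad:
        print("RAID DEGRADED:", bad, file=sys.stderr)  # wire to real alerting
        sys.exit(1)
    print("all arrays healthy")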
On 1/16/2012 11:07 AM, Invisible wrote:
>>> Yeah, it's funny... I've noticed this strange correlation between
>>> expensive down-time and management willingness to invest in
>>> fault-tolerant equipment. ;-)
>>
>> The more people we add and the more data we process, the more downtime
>> will cost. Eventually we may get to higher uptime requirements - but for
>> now 2 hours of downtime 2x a year is not too bad.
>
> As with everything, it depends on just how expensive down-time actually
> is. If the answer is "not very", you don't need to worry about fixing it
> too much.
>
> Amusing anecdote: One of our servers had a hardware RAID system. The
> idea is that if one of the drives dies, the RAID controller will keep
> the server operational. You know what died? THE RAID CONTROLLER! >_<
I like what Google has set up - files are stored everywhere on cheap
hardware. If something fails, then it just gets swapped out. But the
cheap hardware fails often and requires more manpower to manage and
replace it.
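The core trick is easy enough to sketch - write every file to several
places, read from whichever still answers. A toy version (directories
standing in for machines here; all names made up):

# Toy replication: directories play the role of machines. A real system
# would also re-replicate when a node dies; this only shows a read
# surviving a lost copy.
import os

NODES = ["node_a", "node_b", "node_c"]  # hypothetical stand-ins for servers

def put(name, data):
    for node in NODES:  # write every copy up front
        os.makedirs(node, exist_ok=True)
        with open(os.path.join(node, name), "wb") as f:
            f.write(data)

def get(name):
    for node in NODES:  # first surviving replica wins
        try:
            with open(os.path.join(node, name), "rb") as f:
                return f.read()
        except OSError:  # node "down" or copy lost - try the next one
            continue
    raise FileNotFoundError(name)

put("report.txt", b"quarterly numbers")
os.remove(os.path.join("node_a", "report.txt"))  # simulate a failed box
assert get("report.txt") == b"quarterly numbers"  # still readable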
On 16/01/2012 7:12 PM, Tom Austin wrote:
> But the cheap hardware fails often and requires more manpower to manage
> and replace it.
Jobs! :-D
--
Regards
Stephen
>> Amusing anecdote: One of our servers had a hardware RAID system. The
>> idea is that if one of the drives dies, the RAID controller will keep
>> the server operational. You know what died? THE RAID CONTROLLER! >_<
>
> I like what Google has set up - files are stored everywhere on cheap
> hardware. If something fails, then it just gets swapped out. But the
> cheap hardware fails often and requires more manpower to manage and
> replace it.
Yeah, but most people can't do that. Most people don't have the space,
ventilation or power requirements to host hundreds of boxes, nor the
money to pay a team of twenty people to keep it all running.
On top of that, most applications are /not/ designed for distributed
implementation. If you're Google, you can just /write/ the software you
need. Most businesses buy it off the shelf.
On 1/17/2012 1:06, Invisible wrote:
> Yeah, but most people can't do that. Most people don't have the space,
> ventilation or power requirements to host hundreds of boxes, nor the money
> to pay a team of twenty people to keep it all running.
Google has hundreds of computers in their data centers. Unfortunately for
you, Google counts "a shipping container full of thousands of motherboards
and disk drives" as "a computer". ;-) It's really quite awesome. Upgrading
a server is known as "forklifting" it.
> On top of that, most applications are /not/ designed for distributed
> implementation. If you're Google, you can just /write/ the software you
> need. Most businesses buy it off the shelf.
And you know, I really, really miss SQL. :-)
--
Darren New, San Diego CA, USA (PST)
People tell me I am the counter-example.