POV-Ray: Newsgroups: povray.off-topic: Good paper on non-ACID databases

From: Darren New
Subject: Good paper on non-ACID databases
Date: 10 Mar 2009 19:57:24
Message: <49b6fe64$1@news.povray.org>

http://queue.acm.org/detail.cfm?id=1394128

Does a good job of describing how to go about breaking ACID for eventual 
consistency and higher availability.

-- 
   Darren New, San Diego CA, USA (PST)
   My fortune cookie said, "You will soon be
   unable to read this, even at arm's length."

From: Invisible
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 07:43:11
Message: <49b7a3cf$1@news.povray.org>

Darren New wrote:
> http://queue.acm.org/detail.cfm?id=1394128
> 
> Does a good job of describing how to go about breaking ACID for eventual 
> consistency and higher availability.

It seems to me that "BASE" fundamentally requires an ACID-compliant 
database. They're only talking about relaxing consistency requirements 
in a few key areas. A completely non-ACID database wouldn't function 
properly. (E.g., if the database engine's own metadata became 
inconsistent, the database engine wouldn't be able to read the database 
any more.)

All I can say is, I hope *I* never have to deal with a database so vast 
that I have to risk having it give me gibberish from time to time just 
so that it's "fast enough".

Also...

"For constraints to be applied, the tables must reside on a single 
database server, precluding horizontal scaling as transaction rates grow."

Um... why?

Still, now Amazon's message service (whatever the hell it's called) 
makes sense.

I do feel, though, that the mechanisms described for processing work now 
and restoring consistency later by passing messages, and detecting when 
consistency has returned and so forth really ought to be part of the 
database engine rather than application code...

From: Darren New
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 16:07:40
Message: <49b81a0c$1@news.povray.org>

Invisible wrote:
> Darren New wrote:
>> http://queue.acm.org/detail.cfm?id=1394128
>>
>> Does a good job of describing how to go about breaking ACID for 
>> eventual consistency and higher availability.
> 
> It seems to me that "BASE" fundamentally requires an ACID-compliant 
> database.

They describe it in terms of that, but no, it doesn't. Because it's the "C" 
they're leaving out, you see. They're saying "this is how you build an 
unboundedly large BASE database from smaller ACID databases that may or may 
not be online at any given time."

> They're only talking about relaxing consistency requirements 
> in a few key areas. A completely non-ACID database wouldn't function 
> properly. (E.g., if the database engine's own metadata became 
> inconsistent, the database engine wouldn't be able to read the database 
> any more.)

Certainly. But that's not what the "C" means in ACID. Lots of people claim 
that's what the "C" means, because they don't have the right kind of "C" and 
want to claim they do. "I don't corrupt the entire database when the server 
crashes" isn't something to brag about, any more than "I don't stall out 
driving down the freeway" is a selling point for an automobile.

> All I can say is, I hope *I* never have to deal with a database so vast 
> that I have to risk having it give me gibberish from time to time just 
> so that it's "fast enough".

It's pretty normal, actually. For example, in the credit card world, "real 
time" means "we do it every day, instead of waiting for the end of the 
billing cycle."  Do you think when the merchant gives you a refund, you can 
then go and spend that money at the next store over 5 minutes later?

> "For constraints to be applied, the tables must reside on a single 
> database server, precluding horizontal scaling as transaction rates grow."
> 
> Um... why?

A constraint is the "C" kind of consistency. It says you can't update one 
table without checking another table to see that it's OK. (Foreign Keys are one kind of 
constraint, but there are lots of others.) If the table with the primary key 
and the table with the foreign key are on different servers, and the table 
with the primary key crashes, you can't check when you insert a row into the 
other table that the constraint is still good.
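
A minimal sketch of that kind of check, using Python's built-in sqlite3 module with both tables in one database. The table names and the try_insert_order helper are made up for illustration. The point is that the insert into orders can only be validated because the engine can look at customers at that moment; if customers lived on another server that was down, there would be nothing to check against.

```python
import sqlite3

def try_insert_order(conn, customer_id):
    """Return True if the insert satisfied the foreign-key constraint."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The referenced customer row does not exist, so the engine refuses.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id) VALUES (1)")
conn.commit()

ok = try_insert_order(conn, 1)         # customer 1 exists
rejected = try_insert_order(conn, 99)  # customer 99 does not
```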

> Still, now Amazon's message service (whatever the hell it's called) 
> makes sense.

Uh, sorta. More like Microsoft's queue mechanism, which is a lot like 
Amazon's basic mechanism, except the queue messages can interact with the 
same transactions that your SQL server and file system are interacting with. 
You can do things like create a transaction, update some SQL rows, queue a 
couple messages, and rename a couple of files, then decide you want to roll 
it all back, and everyone involved will never see the SQL rows, the 
messages, or the file name changes.

Amazon's message service is for doing that sort of thing, but it takes a 
different approach to the reliability - you peek the message off the front, 
which locks it, and then it's a timeout that releases the message. It's 
actually pretty poor if you want to do something like queue POV-Ray renders 
for which you have no idea how long they'll take.
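
To make the timeout problem concrete, here is a toy in-memory queue in Python. It is not Amazon's API; the VisibilityQueue class, its method names, and the injectable clock are all assumptions for illustration. Receiving a message hides it for a fixed period, and if the worker has not deleted it by then, the message becomes visible again.

```python
import time
from collections import OrderedDict

class VisibilityQueue:
    """Toy queue: a received message is hidden until its timeout lapses."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.messages = OrderedDict()  # id -> (body, invisible_until)
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = (body, float("-inf"))
        self.next_id += 1

    def receive(self):
        """Return (id, body) of the first visible message and hide it."""
        now = self.clock()
        for msg_id, (body, invisible_until) in self.messages.items():
            if invisible_until <= now:
                self.messages[msg_id] = (body, now + self.timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Called by the worker once the job is really finished."""
        self.messages.pop(msg_id, None)

fake_now = [0.0]  # fake clock so the example does not have to sleep
q = VisibilityQueue(timeout=30, clock=lambda: fake_now[0])
q.send("render scene.pov")

first = q.receive()    # worker A takes the job
hidden = q.receive()   # nobody else can see it while it is locked
fake_now[0] = 31.0     # the render is still running, but the lock has lapsed
second = q.receive()   # worker B is handed the same job
```

With a render of unknown length, any fixed timeout is either too short (duplicate work, as above) or so long that a crashed worker holds the job for ages.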

> I do feel, though, that the mechanisms described for processing work now 
> and restoring consistency later by passing messages, and detecting when 
> consistency has returned and so forth really ought to be part of the 
> database engine rather than application code...

It is. This is how it does things when you have *multiple* database engines, 
and you want to keep going even when the hardware for some of it has burst 
into flames.

Lots of systems are built like this internally, too. For example, there are 
mail systems where each rule on processing the email[1] winds up pulling the 
email off the queue and putting the new one on the end. Same sort of thing.

[1] Like, "if it doesn't have a domain on the from address, put this domain 
on it before it leaves the local intranet."

-- 
   Darren New, San Diego CA, USA (PST)
   My fortune cookie said, "You will soon be
   unable to read this, even at arm's length."

From: Orchid XP v8
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 16:29:40
Message: <49b81f34@news.povray.org>

>> It seems to me that "BASE" fundamentally requires an ACID-compliant 
>> database.
> 
> They describe it in terms of that, but no, it doesn't.

Sure. Because if you use flat files instead, it's *completely possible* 
to allow multiple threads to alter the data without completely screwing 
it up.

>> They're only talking about relaxing consistency requirements in a few 
>> key areas. A completely non-ACID database wouldn't function properly. 
>> (E.g., if the database engine's own metadata became inconsistent, the 
>> database engine wouldn't be able to read the database any more.)
> 
> Certainly. But that's not what the "C" means in ACID.

Didn't say it was. It's more to do with A, really. (And D of course.)

> "I don't corrupt the entire database 
> when the server crashes" isn't something to brag about, any more than "I 
> don't stall out driving down the freeway" is a selling point for an 
> automobile.

I wouldn't mind, but this isn't exactly a trivial thing to guarantee.

>> All I can say is, I hope *I* never have to deal with a database so 
>> vast that I have to risk having it give me gibberish from time to time 
>> just so that it's "fast enough".
> 
> It's pretty normal, actually. For example, in the credit card world, 
> "real time" means "we do it every day, instead of waiting for the end of 
> the billing cycle."  Do you think when the merchant gives you a refund, 
> you can then go and spend that money at the next store over 5 minutes 
> later?

I would presume so, yes. (Not that I actually own a credit card - or 
ever spend money, for that matter...)

>> "For constraints to be applied, the tables must reside on a single 
>> database server, precluding horizontal scaling as transaction rates 
>> grow."
>>
>> Um... why?
> 
> A constraint is the "C" kind of consistency. It says you can't update 
> one table without checking another table to see that it's OK. (Foreign Keys are one 
> kind of constraint, but there are lots of others.) If the table with the 
> primary key and the table with the foreign key are on different servers, 
> and the table with the primary key crashes, you can't check when you 
> insert a row into the other table that the constraint is still good.

OK, so if one of the servers is down, you can't do certain things. I'm 
still not seeing why constraints can't be applied across several 
databases under normal operating conditions.

>> Still, now Amazon's message service (whatever the hell it's called) 
>> makes sense.
> 
> Uh, sorta. More like Microsoft's queue mechanism, which is a lot like 
> Amazon's basic mechanism, except the queue messages can interact with 
> the same transactions that your SQL server and file system are 
> interacting with. You can do things like create a transaction, update 
> some SQL rows, queue a couple messages, and rename a couple of files, 
> then decide you want to roll it all back, and everyone involved will 
> never see the SQL rows, the messages, or the file name changes.

Mmm, interesting. (Especially given that Microsoft is industry-renowned 
for their lack of innovation.)

>> I do feel, though, that the mechanisms described for processing work 
>> now and restoring consistency later by passing messages, and detecting 
>> when consistency has returned and so forth really ought to be part of 
>> the database engine rather than application code...
> 
> It is. This is how it does things when you have *multiple* database 
> engines, and you want to keep going even when the hardware for some of 
> it has burst into flames.

Well, the article in question is just explaining how to implement this 
stuff "by hand". But I guess it wouldn't be so hard to build a small 
framework to do it for you. (Read: to do it correctly.)

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

From: Darren New
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 16:58:27
Message: <49b825f3@news.povray.org>

Orchid XP v8 wrote:
>>> It seems to me that "BASE" fundamentally requires an ACID-compliant 
>>> database.
>>
>> They describe it in terms of that, but no, it doesn't.
> 
> Sure. Because if you use flat files instead, it's *completely possible* 
> to allow multiple threads to alter the data without completely screwing 
> it up.

Who said anything about flat files or multiple threads? Heck, who said 
anything about files at all? :-)

>>> They're only talking about relaxing consistency requirements in a few 
>>> key areas. A completely non-ACID database wouldn't function properly. 
>>> (E.g., if the database engine's own metadata became inconsistent, the 
>>> database engine wouldn't be able to read the database any more.)
>>
>> Certainly. But that's not what the "C" means in ACID.
> 
> Didn't say it was. It's more to do with A, really. (And D of course.)

OK. I have no idea what you're talking about here. Certainly if your 
database didn't have A, C, I, or D, then you'd just have a file system that 
can get corrupted. :-)  Like old DOS file systems, say.

>> "I don't corrupt the entire database when the server crashes" isn't 
>> something to brag about, any more than "I don't stall out driving down 
>> the freeway" is a selling point for an automobile.
> 
> I wouldn't mind, but this isn't exactly a trivial thing to guarantee.

It has been a well-solved problem for decades. Not trivial, but even your 
file system manages it nowadays, let alone something designed for crashes.

>> of the billing cycle."  Do you think when the merchant gives you a 
>> refund, you can then go and spend that money at the next store over 5 
>> minutes later?
> 
> I would presume so, yes. (Not that I actually own a credit card - or 
> ever spend money, for that matter...)

Well, you wouldn't think that, for example, the refund would actually have 
to be processed back at the bank before you get credit?

Put it another way: when you mail in the payment for your bill, do you 
expect it to be credited before you get back to your flat from the mailbox? 
Or would you expect it might take a day or two between the time you write 
the check in your checkbook until the time the payment shows up in your 
account at the electric company?

> OK, so if one of the servers is down, you can't do certain things. I'm 
> still not seeing why constraints can't be applied across several 
> databases under normal operating conditions.

Under normal conditions, certainly. But the idea is "this is how you make 
function beta keep working when function alpha is down." If Beta has 
constraints that depend on Alpha being functional, that doesn't work.

Sure, there's no problem if everything's *working*. This is how you make 
google search keep working even when the spiders have crashed. Or how you 
keep ads being served to people even when the system that lets you sign up 
new customers is rebooting.

> Mmm, interesting. (Especially given that Microsoft is industry-renowned 
> for their lack of innovation.)

They do some good stuff at the high end. And it's not really "innovative". 
It's just "100% right".  Whereas most folks are happy with 98% right or so.

> Well, the article in question is just explaining how to implement this 
> stuff "by hand". But I guess it wouldn't be so hard to build a small 
> framework to do it for you. (Read: to do it correctly.)

Right. If you want to do it small, you don't really need the advice. But 
yes, generally you encapsulate that sort of thing in "business logic" 
layers.  Or, if you want to do it generically, you can use something like 
Erlang, which is how they work all that sort of stuff.

-- 
   Darren New, San Diego CA, USA (PST)
   My fortune cookie said, "You will soon be
   unable to read this, even at arm's length."

From: Darren New
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 17:17:46
Message: <49b82a7a$1@news.povray.org>

Orchid XP v8 wrote:
> Sure. Because if you use flat files instead, it's *completely possible* 
> to allow multiple threads to alter the data without completely screwing 
> it up.

BTW, you *are* aware that all modern desktop-level databases store their 
data in flat files and use multiple threads?  You *are* aware that google's 
databases are all stored in "flat files"?

It's actually pretty easy - updates go to the end of the file.

I was reading this the other day. It's pretty good.
http://www.relisoft.com/book/tech/8trans.html

-- 
   Darren New, San Diego CA, USA (PST)
   My fortune cookie said, "You will soon be
   unable to read this, even at arm's length."

From: Orchid XP v8
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 17:35:41
Message: <49b82ead@news.povray.org>

>> Sure. Because if you use flat files instead, it's *completely 
>> possible* to allow multiple threads to alter the data without 
>> completely screwing it up.
> 
> BTW, you *are* aware that all modern desktop-level databases store their 
> data in flat files and use multiple threads?

Yes. But that's why you have a database - to manage the extreme 
complexity for you, so you don't have to do it by hand. You seem to be 
under the impression that this is somehow "not necessary".

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

From: Darren New
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 17:49:40
Message: <49b831f4$1@news.povray.org>

Orchid XP v8 wrote:
> Yes. But that's why you have a database - to manage the extreme 
> complexity for you, so you don't have to do it by hand. You seem to be 
> under the impression that this is somehow "not necessary".

I'm not sure why you think I said that. However, having said that, all you 
need is a decent file system. If you put each record in a different file 
with a unique name, you can do a fair amount of goodness with message queues 
and such without ever worrying about a database manager.

For example, the traditional mechanism for doing this is to create a file 
with a .tmp on the end of the name, and then rename it when it's ready to be 
processed. If it doesn't matter what order they're processed in, you take a 
sufficiently high-resolution clock, plus an IP address of the host, plus the 
PID of the process creating the file, and use that as the file name. You can 
have a bunch of machines updating the same file system without causing a 
problem.
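
A rough sketch of the .tmp-then-rename trick in Python. The publish, unique_name and ready_messages helpers and the .msg suffix are made up for illustration, and the hostname stands in for the IP address mentioned above. It relies on rename being atomic within one file system, so a reader never sees a half-written message.

```python
import os
import socket
import tempfile
import time

def unique_name():
    # High-resolution clock + host + PID, as described above.
    # The hostname is a stand-in for the machine's IP address.
    return f"{time.time_ns()}-{socket.gethostname()}-{os.getpid()}.msg"

def publish(queue_dir, payload):
    """Write payload under a .tmp name, then rename it into view."""
    final_path = os.path.join(queue_dir, unique_name())
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk first
    os.rename(tmp_path, final_path)  # atomic on the same file system
    return final_path

def ready_messages(queue_dir):
    """Readers ignore anything still carrying the .tmp suffix."""
    return sorted(n for n in os.listdir(queue_dir) if n.endswith(".msg"))

queue_dir = tempfile.mkdtemp()
# Simulate a writer that crashed halfway through: only a .tmp file exists.
with open(os.path.join(queue_dir, "half-written.msg.tmp"), "wb") as f:
    f.write(b"partial")

path = publish(queue_dir, b"hello")
visible = ready_messages(queue_dir)  # only the completed message shows up
```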

The way CouchDB handles it: when they commit work, they write the header 
twice, in widely spaced areas of the file, with a timestamp and checksum at 
the end of each. If one checksum is bad, they know the other is the right 
header. If both checksums are good, they know the one with the later 
timestamp is right.
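
The selection rule can be sketched like this in Python. This is not CouchDB's actual on-disk format; the header layout (a timestamp and a root offset followed by a CRC32) and the function names are assumptions made purely to show the "bad checksum loses, otherwise later timestamp wins" logic.

```python
import struct
import zlib

# Hypothetical header layout: 8-byte timestamp, 8-byte root offset.
HEADER = struct.Struct("<QQ")

def encode_header(timestamp, root_offset):
    body = HEADER.pack(timestamp, root_offset)
    return body + struct.pack("<I", zlib.crc32(body))  # checksum at the end

def decode_header(raw):
    """Return (timestamp, root_offset), or None if the checksum fails."""
    if len(raw) != HEADER.size + 4:
        return None
    body = raw[:HEADER.size]
    (crc,) = struct.unpack("<I", raw[HEADER.size:])
    if zlib.crc32(body) != crc:
        return None
    return HEADER.unpack(body)

def pick_header(copy_a, copy_b):
    """Drop copies with bad checksums, then take the latest timestamp."""
    valid = [h for h in (decode_header(copy_a), decode_header(copy_b))
             if h is not None]
    return max(valid, default=None)  # tuples compare by timestamp first

old = encode_header(100, 4096)
new = encode_header(200, 8192)
torn = new[:-1] + bytes([new[-1] ^ 0xFF])  # simulate a write cut off by a crash

both_good = pick_header(old, new)  # both valid: the later timestamp wins
one_torn = pick_header(old, torn)  # torn copy rejected: fall back to the old one
```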

Either of those works for writing messages in a queue, for example. Write the 
message as something like XML, where you can tell if it's complete, put a 
checksum at the end, and anything with a bad checksum when you start a 
recovery didn't get finished.

If you have separate log files that are append-only until they get too full, 
and you keep old copies of accounts, and those accounts are only updated 
relatively rarely (say, order fulfillment or credit card clearing), you can 
recover from failures by looking thru old logs when you come across a 
customer file that's corrupt, looking at the previous customer file, and 
replaying the transaction(s) since then.  That's basically what the database 
does when it crashes.
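
As a small Python sketch of that recovery step: start from the last good copy of the account data and re-apply every logged transaction that came after it. The record shapes (a "last_seq" marker in the snapshot, "seq"/"account"/"amount" fields in the log) are invented for the example.

```python
def replay(snapshot, log):
    """Rebuild current balances from an old snapshot plus the log."""
    balances = dict(snapshot["balances"])  # never mutate the old copy
    for entry in log:
        # Entries at or before last_seq are already reflected in the snapshot.
        if entry["seq"] > snapshot["last_seq"]:
            account = entry["account"]
            balances[account] = balances.get(account, 0) + entry["amount"]
    return balances

# Previous good customer file, taken after transaction 2.
snapshot = {"last_seq": 2, "balances": {"alice": 50}}

# Append-only log covering the whole period.
log = [
    {"seq": 1, "account": "alice", "amount": 30},
    {"seq": 2, "account": "alice", "amount": 20},
    {"seq": 3, "account": "alice", "amount": -15},
    {"seq": 4, "account": "bob", "amount": 40},
]

recovered = replay(snapshot, log)
```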

I'm just saying, it isn't that complex to accomplish "manually". You need an 
ACID database - it's just not that hard to build one if you can avoid making 
it general. The hard part of something like Oracle is not the ACID, but the 
ACID and high performance regardless of what you store on it. If you know 
what you're going to store and you code only the ACID properties you need 
(for example, hard-coded constraints instead of TRIGGER statements in SQL), 
it's pretty easy to do.

BTDTGTTS.

-- 
   Darren New, San Diego CA, USA (PST)
   My fortune cookie said, "You will soon be
   unable to read this, even at arm's length."

From: Orchid XP v8
Subject: Re: Good paper on non-ACID databases
Date: 11 Mar 2009 18:06:29
Message: <49b835e5$1@news.povray.org>

Darren New wrote:

> I'm just saying, it isn't that complex to accomplish "manually". You 
> need an ACID database - it's just not that hard to build one if you can 
> avoid making it general. The hard part of something like Oracle is not 
> the ACID, but the ACID and high performance regardless of what you store 
> on it. If you know what you're going to store and you code only the ACID 
> properties you need (for example, hard-coded constraints instead of 
> TRIGGER statements in SQL), it's pretty easy to do.

Personally, I take the view that something like Oracle is the product of 
many man-centuries of R&D, designed, implemented and tested by more 
experts than you can shake a stick at, and there is basically no way in 
hell that anything I alone could design will ever come close to the 
performance and reliability of a real DB product such as this. In fact, 
I'd think I would probably have trouble just designing something that 
produces *correct* results, never mind doing it fast.

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

From: Nicolas Alvarez
Subject: Re: Server crashes (was: Good paper on non-ACID databases)
Date: 12 Mar 2009 19:03:23
Message: <49b994ba@news.povray.org>

Darren New wrote:
> It has been a well-solved problem for decades. Not trivial, but even your
> file system manages it nowadays, let alone something designed for crashes.

Are IIS and Exchange designed for crashes?

We're having power outage problems at work. It has always broken at least
one part of the server. Once it took Exchange down, we spent an hour or two
repairing the DB. Two other times it broke IIS (which Exchange depends on),
fixed by restoring a "metabase" backup. Another time it corrupted the DHCP
database (!), solved by dropping the DHCP service and creating it back...

Last night it decided to corrupt the registry. Today I stayed at work like
two hours more than I usually do, and we didn't manage to make it *boot*.
Tomorrow will be hell. Designed for crashes?

We've had a UPS since about two power outages ago. Looks like the UPS told
Windows to shut down, then for some reason cut power in the middle of the
shutdown process, making it even worse than without the UPS. 

But I shall rant more about it on a separate thread...
