POV-Ray: Newsgroups: povray.off-topic: Database Questions: Re: Database Questions

POV-Ray : Newsgroups : povray.off-topic : Database Questions : Re: Database Questions		Server Time 3 Sep 2024 19:20:01 EDT (-0400)
From: Darren New
Date: 10 Mar 2011 20:46:10
Message: <4d797ee2$1@news.povray.org>
Tom Austin wrote:
>> If the primary key is the cash register number + timestamp to the

> In the case of the receipts, what relationships do you create to have a 
> table reference a particular receipt?

You just store the same columns.

Let's say you had a table that recorded customer service records about a 
particular receipt.

create table receipts (
    register INT NOT NULL,
    issued TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
       ON UPDATE CURRENT_TIMESTAMP, -- MySql sucks
    amount decimal(10,2) NOT NULL,
    clerk INT NOT NULL,
    ....
    primary key (register, issued),
    FOREIGN KEY (clerk) references clerks(ID),
    ...);

create table complaints (
    receipt_register INT NOT NULL,
    receipt_issued TIMESTAMP NOT NULL,
    csr_talking INT not null,
    complaint_answered TIMESTAMP NOT NULL,
    resolution blob,
    previous_call_csr INT,
    previous_answered timestamp,
    primary key (csr_talking, complaint_answered),
    foreign key (receipt_register, receipt_issed)
       references receipts(register, issued),
    foreign key (previous_call_csr, previous_answered)
       references complaints(csr_talking, complaint_answered)
    );

So, modulo the fact that MySql dicks with timestamps in tables depending on 
how many timestamps are in the table, and disregarding the fact that MySql 
"NOT NULL" doesn't prevent you from inserting records with NULLs, we can 
look at this structure:

A receipt is identified by the register number it was issued at and the time 
it was issued. There's also other details, like the clerk that was logged 
in, the amount of the sale, and so on.

When the phone rings and someone complains, the CSR creates a new complaint 
record. This record is identified by the CSR taking the call and when they 
took the call. The complaint is about a particular receipt, identified by 
the register the receipt came from and the time that receipt was issued. 
This may be a follow-up call, in which case the CSR has the previous calls 
about this on the screen and may point out that this is a follow-up call 
from a particular previous complaint, identified by *that* csr at *that* time.

Does that clarify?

>> Or not updating them at all, if you pass in the same data twice by 
>> mistake.
>>
>> Look at a CSV file full of receipts like above. Reading each row is
>> going to clobber the record already there. If you use an auto-increment
>> ID, you'll wind up with 200 reciepts if you read a 100-row CSV in twice.
>>
> 
> So, the primary key relationship keeps the data from being inserted by 
> default.

Or at least you know when something goes wrong, instead of happily inserting 
two copies of every credit card, or pole, or whatever.

>> IME, a large part of database fuckage is either operator error (like
>> loading stuff twice) or simple programming errors (like debiting
>> everyone's account instead of just the account of the person you should
>> have debited). The idea of making idempotent updates and keeping
>> write-once historical records came about because that makes these things
>> easy to fix.
>>
> 
> And the trick is to have the database strong enough to prevent such 
> errors from causing trouble in the first place.  You spend your time 
> setting it up right or making fixes for problems that crop up.

Precisely. That's exactly the "C" part of the database. But if you *do* 
screw up, being able to go back and fix it and send out the next set of 
bills with a credit for the screw-up is helpful compared to going "I don't 
know, boss, all the balances are wrong and there's no way to know who we 
billed twice for last week."

Your database sounds somewhat different, but to the extent you can actually 
figure out by looking at the database what happened to change it, the easier 
it's going to be to fix problems caused by going fast. Just something to 
keep in mind.

> I think it will - but the new push is to be very dynamic - so dynamic 
> that taking the time to set up a proper database is not possible.  From 
> one extreme to another.  I guess I rather take this extreme - it means 
> we are moving someplace.

Well, good luck with that. Some people can make that work, but I strongly 
suspect your boss is just a jerk. :-)

>>> I think I have run into this already
>>> select A from b where A.ID in ( select ID from C)
>>
>> No, that's cool. Well, not that exact query, but that sort of thing. The
>> trick is it's all in SQL. You aren't looping inside the PHP code or
>> Javascript code or whatever you're using to submit that code to the
>> database engine.
>>
> 
> So what's not cool about the query - should something like that be avoided?

No, the query is cool. It's fine. (I mean, other than being syntactically 
wrong. :-) What you probably wanted to write is

select A from B where B.ID in (select ID from C)

but a better way of writing that is

select A from B, C where B.ID = C.ID

The nice thing about SQL is that both those statements will probably do 
exactly the same thing inside the server.

The thing I'm trying to get you to avoid is putting a query inside a loop 
that doesn't involve user interaction. I.e., you shouldn't have a for loop 
or while loop in your code with a SQL select statement inside the body of 
the for loop unless there's other I/O to read non-database files or take 
clicks from the GUI or something.  You should never need to do something like

$first_result = query_mysql("SELECT name from CUSTOMERS")
foreach ($name in $first_result)
    $second_result = query_mysql(
        "SELECT account from ACCOUNTS where ACCOUNTS.NAME=" + $name)

Instead, that should be one query:
select name, account from customers, accounts
    where customer.name=accounts.name


> Does this keep from having this case?
> PoleID1    PoleID2
> 1       2
> 2       1

Well, technically not.

> Does keying prevent this or does it have to be taken care of some other 
> way?

To prevent this, you'd probably have to use a trigger. The "consistency" in 
a database is supposed to be done by having a trigger reject the insert if 
the poles in a different order are already there.

That's why you have to take time to think about it: If you just type it in 
as fast as you can think, you wind up with something like this and then 
you're screwed. :-)


> Ok, I was confused but now I think I see clearly.  Instead of 
> dereferencing to attachmentID like I did, one just uses the (poleID) and 
> (attachment height) as the key from the attachments table.

Yes, you can do it that way.

> I don't think you have proposed anything much different than what we 
> have - it just rids the tables if ID's that can cause more trouble than 
> they are worth.

If you can manage that, it makes things more clear.

For one thing, for example, if you identify cables by (for example) the 
poles they're run between in the GUI, you don't have to then go look up the 
ID from that information in order to use it elsewhere. If you find that 
you're never actually referencing the ID in queries when you're not also 
referencing the pole and height, chances are that ID can be replaced by the 
pole and height.

> I like the concept - make it robust enough that it can't get fouled up - 
> no matter what you try to do.

That's the goal, yes. That's the idea behind the "C" part of ACID. When you 
have a monsterous database that has to live for decades, it's worth spending 
weeks or months figuring that bit out. How much it's worth to *you* depends 
on how easy it is to clean up mistakes.  If each job essentially starts 
over, it might not be worth putting too much time in up front, if you can 
just (say) re-import the excel sheets for where the poles are on the third 
job if you find the second job messed stuff up too bad.

> Now that I think about it more, I don't think I have actual circular 
> going on.

It didn't sound to me like you did.

> The touble arises if the attachments/midspan tables create a connection 
> that is not defined int he connections table.

Exactly. There really isn't an attachment without a midspan or a midspan 
without an attachment, nor is there an attachment on just one pole.

> How would one go about this kind of 'building' and how would keys play a 
> role.

I would say "color" is an attribute that deserves its own table. Assuming 
that all colors are equal and don't need any calculations and such, I'd 
store the colors in a table with
    ID int, ColorName varchar(20)
or something like that.

This lets you change the color names (boss wants them all in caps) or later 
add internationalized names (now you have ID, ColorName, Language in the 
table), etc. Plus, it's more space efficient to store an ID number than a 
color name everywhere. It also keeps people from misspelling a color and you 
wind up having both Green and Gren poles.

When adding family support, I would then add a second table that holds the 
allowable families. If each color was in only one family, I would put the ID 
of the family in the colors table. Otherwise, I would have a third table 
with Color.ID and Family.ID.

> Would I simply specify the column as 'text' and just put the color into 
> it for now.  Then pull from a distinct query to get the list of 
> available colors.  Then in the future make a new table that uses this 
> column as a foreign key that gives the families.  but then the distinct 
> query is no longer needed as the new table has the distinct values.

> Or should I go ahead and create the additional table so that it can 
> provide the colors with a 1 stop place to add a new one?  Then as the 
> families are needed just add them to the table?
> 
> 
> I know this might not be 100% clear, but how would you go about this?

Generally, if you have the same thing repeated many times in a table's 
column from a small selection of values, but it's *not* a foreign key, 
you've probably done something wrong. You probably want a table with that 
set of values as its primary key.

You can certainly start out with the color in its own column, then move 
things around later when you find it's problematic. You might have some 
processing time, taking the system offline for half an hour while you 
rearrange the tables and such, but I'm guessing (given you were using Access 
before) that this wouldn't be a problem every month or two. :-)

You can even start out with the color as text in the column, later add 
another table that has the colors as the primary key and add the FK 
constraint, and after that start putting the different translations or 
families or whatever on that second column. I.e., where I refered to the 
"color.ID" above, you could just use the text of the color name as the ID, 
then have the translation or the family or whatever keyed off of that. Only 
after space or speed becomes a problem do you really need to worry about 
making it into an integer ID or some such.

> We won't get very big.  We survey maybe 5000+- poles in a given year.

You should have zero concern about performance. Even a pretty weak machine 
should be able to do something with everything in your entire database 
several times a minute, let alone individual small groups of stuff.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."
Post a reply to this message