POV-Ray : Newsgroups : povray.off-topic : Database Questions : Re: Database Questions Server Time
3 Sep 2024 17:14:49 EDT (-0400)
  Re: Database Questions  
From: Tom Austin
Date: 11 Mar 2011 14:09:50
Message: <4d7a737e$1@news.povray.org>
On 3/10/2011 8:46 PM, Darren New wrote:
> Tom Austin wrote:
>>> If the primary key is the cash register number + timestamp to the
>
>> In the case of the receipts, what relationships do you create to have
>> a table reference a particular receipt?
>
> You just store the same columns.
>

Sounds like a lot of 'extra' data storage, but if a DB is fast anyway, 
then the gain of .01s on a query doesn't make a difference.

> Let's say you had a table that recorded customer service records about a
> particular receipt.
>
> create table receipts (
> register INT NOT NULL,
> issued TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
> ON UPDATE CURRENT_TIMESTAMP, -- MySql sucks
> amount decimal(10,2) NOT NULL,
> clerk INT NOT NULL,
> ....
> primary key (register, issued),
> FOREIGN KEY (clerk) references clerks(ID),
> ...);
>
> create table complaints (
> receipt_register INT NOT NULL,
> receipt_issued TIMESTAMP NOT NULL,
> csr_talking INT not null,
> complaint_answered TIMESTAMP NOT NULL,
> resolution blob,
> previous_call_csr INT,
> previous_answered timestamp,
> primary key (csr_talking, complaint_answered),
> foreign key (receipt_register, receipt_issed)
> references receipts(register, issued),
> foreign key (previous_call_csr, previous_answered)
> references complaints(csr_talking, complaint_answered)
> );
>
> So, modulo the fact that MySql dicks with timestamps in tables depending
> on how many timestamps are in the table, and disregarding the fact that
> MySql "NOT NULL" doesn't prevent you from inserting records with NULLs,
> we can look at this structure:
>

MySQL changes timestamps....  just precision or something else?


> A receipt is identified by the register number it was issued at and the
> time it was issued. There's also other details, like the clerk that was
> logged in, the amount of the sale, and so on.
>
> When the phone rings and someone complains, the CSR creates a new
> complaint record. This record is identified by the CSR taking the call
> and when they took the call. The complaint is about a particular
> receipt, identified by the register the receipt came from and the time
> that receipt was issued. This may be a follow-up call, in which case the
> CSR has the previous calls about this on the screen and may point out
> that this is a follow-up call from a particular previous complaint,
> identified by *that* csr at *that* time.
>
> Does that clarify?
>

Yes, I think I am seeing it clearly.  Just store the data as it is.  be 
smart about what is picked for the keys and use IDs sparingly.

IDs are great, but they can allow duplicates and potentially break your 
DB unless your application is smart enough to prevent the error - but 
wait, the goal of a good db is that it cannot be broken no matter what 
accesses it.

IDs also require that you get the ID so that you can join on it - even 
if you have the other data.  If you have the other data anyway, why go 
fetch an ID...

>>> Or not updating them at all, if you pass in the same data twice by
>>> mistake.
>>>
>>> Look at a CSV file full of receipts like above. Reading each row is
>>> going to clobber the record already there. If you use an auto-increment
>>> ID, you'll wind up with 200 reciepts if you read a 100-row CSV in twice.
>>>
>>
>> So, the primary key relationship keeps the data from being inserted by
>> default.
>
> Or at least you know when something goes wrong, instead of happily
> inserting two copies of every credit card, or pole, or whatever.
>

I like the idea of building a database that prevents errors in the first 
place.  Let's see if I can get it put into place here.  If so, I think 
it will happen over time just because of how fast people like to work here.

>
> Precisely. That's exactly the "C" part of the database. But if you *do*
> screw up, being able to go back and fix it and send out the next set of
> bills with a credit for the screw-up is helpful compared to going "I
> don't know, boss, all the balances are wrong and there's no way to know
> who we billed twice for last week."
>

We currently have some of this.  But it a balance - is our data good 
enough to get the client's requirements done.....  Most of the time yes.


> Your database sounds somewhat different, but to the extent you can
> actually figure out by looking at the database what happened to change
> it, the easier it's going to be to fix problems caused by going fast.
> Just something to keep in mind.
>

Yes, that's some of the push for simplifying it - but from a usability 
standpoint.  I think the person sees how much effort I have put into 
coding and how long some things take and wants to reduce that.  The main 
problem is that everything was done at lightening speed so it will suck 
by definition.

>
> Well, good luck with that. Some people can make that work, but I
> strongly suspect your boss is just a jerk. :-)
>

No, the boss isn't a jerk in this case.  We just have a bunch of young 
people who don't know how to slow down.  Out of 6-7 people, only 2 of us 
are over the age of 26.  If we are to keep the younger crowd, we have to 
cater somewhat to their needs/expectations - as long as it moves us in 
the right direction.


>>>> I think I have run into this already
>>>> select A from b where A.ID in ( select ID from C)
>>>
>>> No, that's cool. Well, not that exact query, but that sort of thing. The
>>> trick is it's all in SQL. You aren't looping inside the PHP code or
>>> Javascript code or whatever you're using to submit that code to the
>>> database engine.
>>>
>>
>> So what's not cool about the query - should something like that be
>> avoided?
>
> No, the query is cool. It's fine. (I mean, other than being
> syntactically wrong. :-) What you probably wanted to write is
>
> select A from B where B.ID in (select ID from C)
>
> but a better way of writing that is
>
> select A from B, C where B.ID = C.ID
>

Ok, so I typed it off the top of my head and referenced the wrong table :-)

But at least you knew what it was getting at.


> The nice thing about SQL is that both those statements will probably do
> exactly the same thing inside the server.
>
> The thing I'm trying to get you to avoid is putting a query inside a
> loop that doesn't involve user interaction. I.e., you shouldn't have a
> for loop or while loop in your code with a SQL select statement inside
> the body of the for loop unless there's other I/O to read non-database
> files or take clicks from the GUI or something. You should never need to
> do something like
>
> $first_result = query_mysql("SELECT name from CUSTOMERS")
> foreach ($name in $first_result)
> $second_result = query_mysql(
> "SELECT account from ACCOUNTS where ACCOUNTS.NAME=" + $name)
>
> Instead, that should be one query:
> select name, account from customers, accounts
> where customer.name=accounts.name
>
>


Yes, 100% agreed - this should be avoided.  The only way I have any of 
this at this point is by putting objects with other objects to make the 
objects easier to work with.  I know - there is a better way of doing 
this....


>> Does this keep from having this case?
>> PoleID1 PoleID2
>> 1 2
>> 2 1
>
> Well, technically not.
>
>> Does keying prevent this or does it have to be taken care of some
>> other way?
>
> To prevent this, you'd probably have to use a trigger. The "consistency"
> in a database is supposed to be done by having a trigger reject the
> insert if the poles in a different order are already there.
>
> That's why you have to take time to think about it: If you just type it
> in as fast as you can think, you wind up with something like this and
> then you're screwed. :-)
>

Yes, triggers, I remember those back from my SQL Server days.  They can 
do a lot for you - more than just keeping your data straight.  I can't 
wait to get back to those and be able to use them as a tool :-)

>
> If you can manage that, it makes things more clear.
>
> For one thing, for example, if you identify cables by (for example) the
> poles they're run between in the GUI, you don't have to then go look up
> the ID from that information in order to use it elsewhere. If you find
> that you're never actually referencing the ID in queries when you're not
> also referencing the pole and height, chances are that ID can be
> replaced by the pole and height.
>

yes, I do have to 'fetch' the ID from time to time.  I like not having 
to do that.  It's just getting things straight enough so that not using 
an ID doesn't mess me up too bad.  I don't have as much experience with 
setting up keys and making sure they don't mes things up.  Bu that is 
something that I am going to be working on :-)

>> I like the concept - make it robust enough that it can't get fouled up
>> - no matter what you try to do.
>
> That's the goal, yes. That's the idea behind the "C" part of ACID. When
> you have a monsterous database that has to live for decades, it's worth
> spending weeks or months figuring that bit out. How much it's worth to
> *you* depends on how easy it is to clean up mistakes. If each job
> essentially starts over, it might not be worth putting too much time in
> up front, if you can just (say) re-import the excel sheets for where the
> poles are on the third job if you find the second job messed stuff up
> too bad.
>

I think for the time being we will be morphing with each job until 
things get more stable.  Then we may start putting the data more together.

>
> Exactly. There really isn't an attachment without a midspan or a midspan
> without an attachment, nor is there an attachment on just one pole.
>

Well, there is - transformers and cables that are perpendicular to our 
direction of travel :-)


>> How would one go about this kind of 'building' and how would keys play
>> a role.
>
> I would say "color" is an attribute that deserves its own table.
> Assuming that all colors are equal and don't need any calculations and
> such, I'd store the colors in a table with
> ID int, ColorName varchar(20)
> or something like that.
>
> This lets you change the color names (boss wants them all in caps) or
> later add internationalized names (now you have ID, ColorName, Language
> in the table), etc. Plus, it's more space efficient to store an ID
> number than a color name everywhere. It also keeps people from
> misspelling a color and you wind up having both Green and Gren poles.
>

So on lists like this, an ID is a good idea - and you wouldn't use the 
ColorName as the pk/fk.




> When adding family support, I would then add a second table that holds
> the allowable families. If each color was in only one family, I would
> put the ID of the family in the colors table. Otherwise, I would have a
> third table with Color.ID and Family.ID.
>

yes, a table to group the items according to how they can be grouped (1 
2 many, many 2 many...)


> Generally, if you have the same thing repeated many times in a table's
> column from a small selection of values, but it's *not* a foreign key,
> you've probably done something wrong. You probably want a table with
> that set of values as its primary key.
>

OK, that helps.

One can take the ideas of using non ID columns for pks too far.  So, in 
the case of colors, go ahead and set up a pk/fk relationship even if it 
is the *only* thing in the second table.

> You can certainly start out with the color in its own column, then move
> things around later when you find it's problematic. You might have some
> processing time, taking the system offline for half an hour while you
> rearrange the tables and such, but I'm guessing (given you were using
> Access before) that this wouldn't be a problem every month or two. :-)
>

So, if we see it probably needing to be 'broken out' in the future, we 
should go ahead and break it out now.

> You can even start out with the color as text in the column, later add
> another table that has the colors as the primary key and add the FK
> constraint, and after that start putting the different translations or
> families or whatever on that second column. I.e., where I refered to the
> "color.ID" above, you could just use the text of the color name as the
> ID, then have the translation or the family or whatever keyed off of
> that. Only after space or speed becomes a problem do you really need to
> worry about making it into an integer ID or some such.
>

With what little you know about what we are trying to do, what approach 
would you do in this case?  Remember, you know you may want to do 
something with color families, but you don't know yet.


>> We won't get very big. We survey maybe 5000+- poles in a given year.
>
> You should have zero concern about performance. Even a pretty weak
> machine should be able to do something with everything in your entire
> database several times a minute, let alone individual small groups of
> stuff.
>

I agree, it seems that the only way we would need a beast of a machine 
is if we somehow make it very inefficient..... but that's a trade off - 
development time or hardware cost.....


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.