POV-Ray: Newsgroups: povray.off-topic: Database Questions

POV-Ray : Newsgroups : povray.off-topic : Database Questions		Server Time 13 Jul 2025 06:34:11 EDT (-0400)

<<< Previous 10 Messages

Goto Initial 10 Messages

From: Stephen
Subject: Re: Database Questions
Date: 11 Mar 2011 05:16:59
Message: <4d79f69b@news.povray.org>

On 11/03/2011 9:26 AM, Invisible wrote:
> Which is what I said. GPS isn't very accurate.

When you consider that the satellites orbit about 12,550 miles above the 
earth and the clock in your Tom Tom is not very expensive. The accuracy 
is acceptable IMO.

-- 
Regards
     Stephen

Post a reply to this message

From: Invisible
Subject: Re: Database Questions
Date: 11 Mar 2011 05:25:35
Message: <4d79f89f@news.povray.org>

On 11/03/2011 10:16 AM, Stephen wrote:
> On 11/03/2011 9:26 AM, Invisible wrote:
>> Which is what I said. GPS isn't very accurate.
>
> When you consider that the satellites orbit about 12,550 miles above the
> earth

...then it's amazing that it can fix your location to the nearest 
landmass, never mind the nearest city. I agree. ;-)

Post a reply to this message

From: Stephen
Subject: Re: Database Questions
Date: 11 Mar 2011 05:42:26
Message: <4d79fc92$1@news.povray.org>

On 11/03/2011 10:25 AM, Invisible wrote:
> On 11/03/2011 10:16 AM, Stephen wrote:
>> On 11/03/2011 9:26 AM, Invisible wrote:
>>> Which is what I said. GPS isn't very accurate.
>>
>> When you consider that the satellites orbit about 12,550 miles above the
>> earth
>
> ....then it's amazing that it can fix your location to the nearest
> landmass, never mind the nearest city. I agree. ;-)

I have used my Tom Tom to navigate around Chester by foot. It saved 
memorising a page of the A to Z.

-- 
Regards
     Stephen

Post a reply to this message

From: scott
Subject: Re: Database Questions
Date: 11 Mar 2011 05:54:41
Message: <4d79ff71$1@news.povray.org>

>> BTW the ones built into the car are far superior
>
> And outside of everybody's price range.

Maybe 10 years ago, today navigation is usually an option even on most 
B-segment cars (small mini size cars).  I just checked the Renault 
website, even on the Clio you can get navigation built-in for £470.  I 
imagine in 10 years time pretty much every car will have it as standard 
(and obviously more and more 2nd hand cars will have navigation installed).

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 11 Mar 2011 11:57:53
Message: <4d7a5491$1@news.povray.org>

Invisible wrote:
> Which is what I said. GPS isn't very accurate.

It's not GPS that isn't accurate. It's the cheap GPS receiver in your car 
that isn't accurate.  GPS has a client and a server and they have to work 
together for the accuracy.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 11 Mar 2011 14:09:50
Message: <4d7a737e$1@news.povray.org>

On 3/10/2011 8:46 PM, Darren New wrote:
> Tom Austin wrote:
>>> If the primary key is the cash register number + timestamp to the
>
>> In the case of the receipts, what relationships do you create to have
>> a table reference a particular receipt?
>
> You just store the same columns.
>

Sounds like a lot of 'extra' data storage, but if a DB is fast anyway, 
then the gain of .01s on a query doesn't make a difference.

> Let's say you had a table that recorded customer service records about a
> particular receipt.
>
> create table receipts (
> register INT NOT NULL,
> issued TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
> ON UPDATE CURRENT_TIMESTAMP, -- MySql sucks
> amount decimal(10,2) NOT NULL,
> clerk INT NOT NULL,
> ....
> primary key (register, issued),
> FOREIGN KEY (clerk) references clerks(ID),
> ...);
>
> create table complaints (
> receipt_register INT NOT NULL,
> receipt_issued TIMESTAMP NOT NULL,
> csr_talking INT not null,
> complaint_answered TIMESTAMP NOT NULL,
> resolution blob,
> previous_call_csr INT,
> previous_answered timestamp,
> primary key (csr_talking, complaint_answered),
> foreign key (receipt_register, receipt_issed)
> references receipts(register, issued),
> foreign key (previous_call_csr, previous_answered)
> references complaints(csr_talking, complaint_answered)
> );
>
> So, modulo the fact that MySql dicks with timestamps in tables depending
> on how many timestamps are in the table, and disregarding the fact that
> MySql "NOT NULL" doesn't prevent you from inserting records with NULLs,
> we can look at this structure:
>

MySQL changes timestamps....  just precision or something else?


> A receipt is identified by the register number it was issued at and the
> time it was issued. There's also other details, like the clerk that was
> logged in, the amount of the sale, and so on.
>
> When the phone rings and someone complains, the CSR creates a new
> complaint record. This record is identified by the CSR taking the call
> and when they took the call. The complaint is about a particular
> receipt, identified by the register the receipt came from and the time
> that receipt was issued. This may be a follow-up call, in which case the
> CSR has the previous calls about this on the screen and may point out
> that this is a follow-up call from a particular previous complaint,
> identified by *that* csr at *that* time.
>
> Does that clarify?
>

Yes, I think I am seeing it clearly.  Just store the data as it is.  be 
smart about what is picked for the keys and use IDs sparingly.

IDs are great, but they can allow duplicates and potentially break your 
DB unless your application is smart enough to prevent the error - but 
wait, the goal of a good db is that it cannot be broken no matter what 
accesses it.

IDs also require that you get the ID so that you can join on it - even 
if you have the other data.  If you have the other data anyway, why go 
fetch an ID...

>>> Or not updating them at all, if you pass in the same data twice by
>>> mistake.
>>>
>>> Look at a CSV file full of receipts like above. Reading each row is
>>> going to clobber the record already there. If you use an auto-increment
>>> ID, you'll wind up with 200 reciepts if you read a 100-row CSV in twice.
>>>
>>
>> So, the primary key relationship keeps the data from being inserted by
>> default.
>
> Or at least you know when something goes wrong, instead of happily
> inserting two copies of every credit card, or pole, or whatever.
>

I like the idea of building a database that prevents errors in the first 
place.  Let's see if I can get it put into place here.  If so, I think 
it will happen over time just because of how fast people like to work here.

>
> Precisely. That's exactly the "C" part of the database. But if you *do*
> screw up, being able to go back and fix it and send out the next set of
> bills with a credit for the screw-up is helpful compared to going "I
> don't know, boss, all the balances are wrong and there's no way to know
> who we billed twice for last week."
>

We currently have some of this.  But it a balance - is our data good 
enough to get the client's requirements done.....  Most of the time yes.


> Your database sounds somewhat different, but to the extent you can
> actually figure out by looking at the database what happened to change
> it, the easier it's going to be to fix problems caused by going fast.
> Just something to keep in mind.
>

Yes, that's some of the push for simplifying it - but from a usability 
standpoint.  I think the person sees how much effort I have put into 
coding and how long some things take and wants to reduce that.  The main 
problem is that everything was done at lightening speed so it will suck 
by definition.

>
> Well, good luck with that. Some people can make that work, but I
> strongly suspect your boss is just a jerk. :-)
>

No, the boss isn't a jerk in this case.  We just have a bunch of young 
people who don't know how to slow down.  Out of 6-7 people, only 2 of us 
are over the age of 26.  If we are to keep the younger crowd, we have to 
cater somewhat to their needs/expectations - as long as it moves us in 
the right direction.


>>>> I think I have run into this already
>>>> select A from b where A.ID in ( select ID from C)
>>>
>>> No, that's cool. Well, not that exact query, but that sort of thing. The
>>> trick is it's all in SQL. You aren't looping inside the PHP code or
>>> Javascript code or whatever you're using to submit that code to the
>>> database engine.
>>>
>>
>> So what's not cool about the query - should something like that be
>> avoided?
>
> No, the query is cool. It's fine. (I mean, other than being
> syntactically wrong. :-) What you probably wanted to write is
>
> select A from B where B.ID in (select ID from C)
>
> but a better way of writing that is
>
> select A from B, C where B.ID = C.ID
>

Ok, so I typed it off the top of my head and referenced the wrong table :-)

But at least you knew what it was getting at.


> The nice thing about SQL is that both those statements will probably do
> exactly the same thing inside the server.
>
> The thing I'm trying to get you to avoid is putting a query inside a
> loop that doesn't involve user interaction. I.e., you shouldn't have a
> for loop or while loop in your code with a SQL select statement inside
> the body of the for loop unless there's other I/O to read non-database
> files or take clicks from the GUI or something. You should never need to
> do something like
>
> $first_result = query_mysql("SELECT name from CUSTOMERS")
> foreach ($name in $first_result)
> $second_result = query_mysql(
> "SELECT account from ACCOUNTS where ACCOUNTS.NAME=" + $name)
>
> Instead, that should be one query:
> select name, account from customers, accounts
> where customer.name=accounts.name
>
>


Yes, 100% agreed - this should be avoided.  The only way I have any of 
this at this point is by putting objects with other objects to make the 
objects easier to work with.  I know - there is a better way of doing 
this....


>> Does this keep from having this case?
>> PoleID1 PoleID2
>> 1 2
>> 2 1
>
> Well, technically not.
>
>> Does keying prevent this or does it have to be taken care of some
>> other way?
>
> To prevent this, you'd probably have to use a trigger. The "consistency"
> in a database is supposed to be done by having a trigger reject the
> insert if the poles in a different order are already there.
>
> That's why you have to take time to think about it: If you just type it
> in as fast as you can think, you wind up with something like this and
> then you're screwed. :-)
>

Yes, triggers, I remember those back from my SQL Server days.  They can 
do a lot for you - more than just keeping your data straight.  I can't 
wait to get back to those and be able to use them as a tool :-)

>
> If you can manage that, it makes things more clear.
>
> For one thing, for example, if you identify cables by (for example) the
> poles they're run between in the GUI, you don't have to then go look up
> the ID from that information in order to use it elsewhere. If you find
> that you're never actually referencing the ID in queries when you're not
> also referencing the pole and height, chances are that ID can be
> replaced by the pole and height.
>

yes, I do have to 'fetch' the ID from time to time.  I like not having 
to do that.  It's just getting things straight enough so that not using 
an ID doesn't mess me up too bad.  I don't have as much experience with 
setting up keys and making sure they don't mes things up.  Bu that is 
something that I am going to be working on :-)

>> I like the concept - make it robust enough that it can't get fouled up
>> - no matter what you try to do.
>
> That's the goal, yes. That's the idea behind the "C" part of ACID. When
> you have a monsterous database that has to live for decades, it's worth
> spending weeks or months figuring that bit out. How much it's worth to
> *you* depends on how easy it is to clean up mistakes. If each job
> essentially starts over, it might not be worth putting too much time in
> up front, if you can just (say) re-import the excel sheets for where the
> poles are on the third job if you find the second job messed stuff up
> too bad.
>

I think for the time being we will be morphing with each job until 
things get more stable.  Then we may start putting the data more together.

>
> Exactly. There really isn't an attachment without a midspan or a midspan
> without an attachment, nor is there an attachment on just one pole.
>

Well, there is - transformers and cables that are perpendicular to our 
direction of travel :-)


>> How would one go about this kind of 'building' and how would keys play
>> a role.
>
> I would say "color" is an attribute that deserves its own table.
> Assuming that all colors are equal and don't need any calculations and
> such, I'd store the colors in a table with
> ID int, ColorName varchar(20)
> or something like that.
>
> This lets you change the color names (boss wants them all in caps) or
> later add internationalized names (now you have ID, ColorName, Language
> in the table), etc. Plus, it's more space efficient to store an ID
> number than a color name everywhere. It also keeps people from
> misspelling a color and you wind up having both Green and Gren poles.
>

So on lists like this, an ID is a good idea - and you wouldn't use the 
ColorName as the pk/fk.




> When adding family support, I would then add a second table that holds
> the allowable families. If each color was in only one family, I would
> put the ID of the family in the colors table. Otherwise, I would have a
> third table with Color.ID and Family.ID.
>

yes, a table to group the items according to how they can be grouped (1 
2 many, many 2 many...)


> Generally, if you have the same thing repeated many times in a table's
> column from a small selection of values, but it's *not* a foreign key,
> you've probably done something wrong. You probably want a table with
> that set of values as its primary key.
>

OK, that helps.

One can take the ideas of using non ID columns for pks too far.  So, in 
the case of colors, go ahead and set up a pk/fk relationship even if it 
is the *only* thing in the second table.

> You can certainly start out with the color in its own column, then move
> things around later when you find it's problematic. You might have some
> processing time, taking the system offline for half an hour while you
> rearrange the tables and such, but I'm guessing (given you were using
> Access before) that this wouldn't be a problem every month or two. :-)
>

So, if we see it probably needing to be 'broken out' in the future, we 
should go ahead and break it out now.

> You can even start out with the color as text in the column, later add
> another table that has the colors as the primary key and add the FK
> constraint, and after that start putting the different translations or
> families or whatever on that second column. I.e., where I refered to the
> "color.ID" above, you could just use the text of the color name as the
> ID, then have the translation or the family or whatever keyed off of
> that. Only after space or speed becomes a problem do you really need to
> worry about making it into an integer ID or some such.
>

With what little you know about what we are trying to do, what approach 
would you do in this case?  Remember, you know you may want to do 
something with color families, but you don't know yet.


>> We won't get very big. We survey maybe 5000+- poles in a given year.
>
> You should have zero concern about performance. Even a pretty weak
> machine should be able to do something with everything in your entire
> database several times a minute, let alone individual small groups of
> stuff.
>

I agree, it seems that the only way we would need a beast of a machine 
is if we somehow make it very inefficient..... but that's a trade off - 
development time or hardware cost.....

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 11 Mar 2011 18:05:42
Message: <4d7aaac6$1@news.povray.org>

Tom Austin wrote:
> Sounds like a lot of 'extra' data storage, but if a DB is fast anyway, 
> then the gain of .01s on a query doesn't make a difference.

It can be. Like all things, you have to make the trade-off between 
readability, reliability, size, and speed.

> MySQL changes timestamps....  just precision or something else?

MySQL basically assumes the first timestamp is the time you last changed the 
record, and the second was the time you created the record, or something 
stupid like that. You need to read the timestamp documentation.

Basically, MySQL will change your data without telling you, depending on 
what order you declared the columns in. Yes, MySQL sucks.

> Yes, I think I am seeing it clearly.  Just store the data as it is.  be 
> smart about what is picked for the keys and use IDs sparingly.

Yep.

> IDs are great, but they can allow duplicates and potentially break your 
> DB unless your application is smart enough to prevent the error - but 
> wait, the goal of a good db is that it cannot be broken no matter what 
> accesses it.

Yep.

> IDs also require that you get the ID so that you can join on it - even 
> if you have the other data.  If you have the other data anyway, why go 
> fetch an ID...

Right. Altho the join doesn't have to use the ID.

select * from receipts,sales where receipt.time=sales.time and 
receipt.register=sales.register

works even if receipts and sales both have ID columns.

> I like the idea of building a database that prevents errors in the first 
> place.  Let's see if I can get it put into place here.  If so, I think 
> it will happen over time just because of how fast people like to work here.

Good job. :-)

> We currently have some of this.  But it a balance - is our data good 
> enough to get the client's requirements done.....  Most of the time yes.

Indeed, it's always a balance. That's something a lot of people don't manage 
to recognise for some reason.

> Yes, that's some of the push for simplifying it - but from a usability 
> standpoint.  I think the person sees how much effort I have put into 
> coding and how long some things take and wants to reduce that.  The main 
> problem is that everything was done at lightening speed so it will suck 
> by definition.

Time. Cost. Quality.  Pick any two.

>> Well, good luck with that. Some people can make that work, but I
>> strongly suspect your boss is just a jerk. :-)
>>
> 
> No, the boss isn't a jerk in this case.  We just have a bunch of young 
> people who don't know how to slow down.  Out of 6-7 people, only 2 of us 
> are over the age of 26.  If we are to keep the younger crowd, we have to 
> cater somewhat to their needs/expectations - as long as it moves us in 
> the right direction.

Fair enough. Get input from them, teach them why you're doing it the way you 
are, and leave places in the system where they can do things the way they 
like (say, the GUI) while keeping the database clean.

> Ok, so I typed it off the top of my head and referenced the wrong table :-)
> But at least you knew what it was getting at.

Yeah.

> Yes, 100% agreed - this should be avoided.  The only way I have any of 
> this at this point is by putting objects with other objects to make the 
> objects easier to work with.  I know - there is a better way of doing 
> this....

Objects inside objects isn't a problem. SQL doesn't really do objects well. 
The trick is to encapsulate that in a business layer. The bottom end of the 
business layer is the database table definitions, and the top end is the 
conceptual things your business thinks about.

> Yes, triggers, I remember those back from my SQL Server days.  They can 
> do a lot for you - more than just keeping your data straight.  I can't 
> wait to get back to those and be able to use them as a tool :-)

I don't think MySQL has triggers.  Heck, you have to go out of your way to 
get MySQL to enforce foreign key constraints, let along triggers or views.

> yes, I do have to 'fetch' the ID from time to time.  I like not having 
> to do that.  It's just getting things straight enough so that not using 
> an ID doesn't mess me up too bad.  I don't have as much experience with 
> setting up keys and making sure they don't mes things up.  Bu that is 
> something that I am going to be working on :-)

Cool. And again, it's mostly just a personal preference of mine borne out by 
experience. It's rejecting the concept that every table, by default, must 
have and be referenced by an arbitrary integer key.  It's not saying they're 
bad, but just "think about it for more than 2 seconds."

> I think for the time being we will be morphing with each job until 
> things get more stable.  Then we may start putting the data more together.

Yeah, keep it agile. Keep every version of your schema around so you can 
tell what you changed.

> Well, there is - transformers and cables that are perpendicular to our 
> direction of travel :-)

Oh, fair enough. But you see what I'm trying to say.  If, logically, you 
can't have one without the other, they need to both be in the same row.

> So on lists like this, an ID is a good idea - and you wouldn't use the 
> ColorName as the pk/fk.

Not necessarily. Either an integer or the color name itself would work. This 
table is likely not to change often enough to be at risk of being screwed 
up, and honestly color names are already pretty arbitrary.

If you had a table of color RGB values, that would be different. Then you're 
talking about something "real". But the names themselves are already just 
symbols for the actual color, and one color can have multiple names etc.

> One can take the ideas of using non ID columns for pks too far.  So, in 
> the case of colors, go ahead and set up a pk/fk relationship even if it 
> is the *only* thing in the second table.

Yes. The point there is to enforce that all the colors are selected from a 
fixed list of possible colors. The text isn't really text, it's an 
identifier for a color in textual form. If you store it multiple times in 
textual form, it gives an opportunity to accidentally get a piece of text 
that isn't a color identifier into the color column.

Now, if you wanted to map colors to unique-names-of-colors (for PANTONE or 
something, say), I'd say make the RGB of the color be the PK.

>> You can certainly start out with the color in its own column, then move
>> things around later when you find it's problematic. You might have some
>> processing time, taking the system offline for half an hour while you
>> rearrange the tables and such, but I'm guessing (given you were using
>> Access before) that this wouldn't be a problem every month or two. :-)
>>
> 
> So, if we see it probably needing to be 'broken out' in the future, we 
> should go ahead and break it out now.

I would write code that accesses the database while being aware that it 
might need to be broken out. I wouldn't necessarily break it out now. You 
can break it out later, but only if you don't have a big rewrite to break it 
out.

Figuring out where the fracture lines lie is experience.

>> You can even start out with the color as text in the column, later add
>> another table that has the colors as the primary key and add the FK
>> constraint, and after that start putting the different translations or
>> families or whatever on that second column. I.e., where I refered to the
>> "color.ID" above, you could just use the text of the color name as the
>> ID, then have the translation or the family or whatever keyed off of
>> that. Only after space or speed becomes a problem do you really need to
>> worry about making it into an integer ID or some such.
>>
> 
> With what little you know about what we are trying to do, what approach 
> would you do in this case?  Remember, you know you may want to do 
> something with color families, but you don't know yet.

If you have a fixed number of colors (i.e., if you're talking about using 
color as a label rather than an *actual* color) then I would put the colors 
in a table with an ID and a color label and refer to the ID and not worry 
about the families and such.

I.e., if you're talking about "this is a green tag pole, that is an orange 
cable, this is a blue-white terminal" then the colors are tags. If you care 
about which shade of orange it is because you're mixing matching paint or 
something, that's not using the color as a label so I'd have a different answer.

So I would start with the colors in a table with
   ID int, ColorName varchar(50), primary key ID, unique index ColorName.

Use the ID to refer to colors, and you'll have the most flexibility in the 
future. Don't store IDs outside the database, tho - always translate them to 
color names before sticking them in external files or hard-coding them into 
a UI or something.

> I agree, it seems that the only way we would need a beast of a machine 
> is if we somehow make it very inefficient..... but that's a trade off - 
> development time or hardware cost.....

Yeah, where a "beast" machine is $1000. :-)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 14 Mar 2011 09:46:23
Message: <4d7e1c2f$1@news.povray.org>

On 3/11/2011 6:05 PM, Darren New wrote:
>
> Right. Altho the join doesn't have to use the ID.
>
> select * from receipts,sales where receipt.time=sales.time and
> receipt.register=sales.register
>
> works even if receipts and sales both have ID columns.
>

Yes, I gathered that ;-)
Will make any transition easier as ID's don't have to be removed cold 
turkey.

>
>> We currently have some of this. But it a balance - is our data good
>> enough to get the client's requirements done..... Most of the time yes.
>
> Indeed, it's always a balance. That's something a lot of people don't
> manage to recognise for some reason.
>

yes, tho even when recognized people may have different notions of what 
it should be.

>
> Time. Cost. Quality. Pick any two.
>

That's our motto to our clients who want all three.

>
>> Yes, 100% agreed - this should be avoided. The only way I have any of
>> this at this point is by putting objects with other objects to make
>> the objects easier to work with. I know - there is a better way of
>> doing this....
>
> Objects inside objects isn't a problem. SQL doesn't really do objects
> well. The trick is to encapsulate that in a business layer. The bottom
> end of the business layer is the database table definitions, and the top
> end is the conceptual things your business thinks about.
>

Yep, SQL does tables well - if you want objects it has to come from the 
outside.

>> Yes, triggers, I remember those back from my SQL Server days. They can
>> do a lot for you - more than just keeping your data straight. I can't
>> wait to get back to those and be able to use them as a tool :-)
>
> I don't think MySQL has triggers. Heck, you have to go out of your way
> to get MySQL to enforce foreign key constraints, let along triggers or
> views.
>

I've been following it a bit over the years - at least some table 
formats in MySQL support triggers now.

> Yes. The point there is to enforce that all the colors are selected from
> a fixed list of possible colors. The text isn't really text, it's an
> identifier for a color in textual form. If you store it multiple times
> in textual form, it gives an opportunity to accidentally get a piece of
> text that isn't a color identifier into the color column.
>

I think I am grasping the concept.  If you need a 'pick list' you should 
have a table with the picklist values.  Storing an ID or piece of text 
doesn't matter as long as you have pk/fk set up correctly to prevent 
data corruption.

>
> If you have a fixed number of colors (i.e., if you're talking about
> using color as a label rather than an *actual* color) then I would put
> the colors in a table with an ID and a color label and refer to the ID
> and not worry about the families and such.
>

That's how I would approach it myself - I just wanted to hear how 
someone else would approach it.  Thanks for helping to verify my gut 
feelings.

> Use the ID to refer to colors, and you'll have the most flexibility in
> the future. Don't store IDs outside the database, tho - always translate
> them to color names before sticking them in external files or
> hard-coding them into a UI or something.
>

Yes, that would not be smart - ID's have no meaning outside the DB.

Thanks for all your input.  You have helped to confirm that my gut 
feelings on how to set up a DB are mostly in the right directions.

2 big points I take from this:

Be willing to explore better PK/FK uses.
I have always known that a PK doesn't have to be a number, but it always 
felt foreign to use text or some other number as they don't feel as 
'precise' as an integer.  I guess I've run into too many cases of trying 
to match floats or case sensitive text to let it feel natural.

Design the DB purely from 'what is out there'.  Don't take use into 
account (or at least minimize it).

Now for a good DB engine...  I'll start a new thread for that :-)

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 14 Mar 2011 12:54:16
Message: <4d7e4838@news.povray.org>

Tom Austin wrote:
> I think I am grasping the concept.  If you need a 'pick list' you should 
> have a table with the picklist values.  Storing an ID or piece of text 
> doesn't matter as long as you have pk/fk set up correctly to prevent 
> data corruption.

Exactly. And it gives you a hook to expand on in the future. You have a 
place now to talk about colors divorced from where you reference them.

> Thanks for all your input.  You have helped to confirm that my gut 
> feelings on how to set up a DB are mostly in the right directions.

No problem. It helped me clarify how to word explanations too.

> 2 big points I take from this:
> 
> Be willing to explore better PK/FK uses.
> I have always known that a PK doesn't have to be a number, but it always 
> felt foreign to use text or some other number as they don't feel as 
> 'precise' as an integer.  I guess I've run into too many cases of trying 
> to match floats or case sensitive text to let it feel natural.

Yeah, exactly. You still have to be careful.

The other thing is, if upper/lower case makes a difference, or you're using 
a float, those are probably not things distinct enough for a real-world 
"unique identifier" anyway.  If you have trouble due to possible confusions 
with representation or precision, it's likely not a unique attribute of the 
object anyway.  I.e., if upper/lower case can be a confusion because it's 
(say) a human's name, well the name isn't a unique identifier anyway. If 
you're measuring the height of something and you don't know the precision to 
which you have to round it, chances are it's not a unique height anyway.

To use something like an email address, you have to decide how you're going 
to normalize it. Even tho RFC822 allows case-specific email addresses, in 
practice everybody lower-cases it. If you're talking about GPS coordinates 
and you don't have a significant minimum distance between objects, the GPS 
coordinates aren't going to be unique anyway.  E.g., telephone poles? Sure, 
the diameter of a phone pole is orders of magnitude smaller than the 
distance between them.  GPS locations of mail boxes? Probably not what you'd 
call unique.

And using a third-party identifier works well too. House addresses, zip 
codes, airport codes, phone numbers (plus timestamp of course), etc. If you 
can push the authoritative decision onto some other public organization, 
you're in good shape.

> Design the DB purely from 'what is out there'.  Don't take use into 
> account (or at least minimize it).

Yes. This is one of the really important ones that the self-educated guys 
tend to mess up.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 14 Mar 2011 13:20:13
Message: <4d7e4e4d$1@news.povray.org>

On 3/14/2011 12:54 PM, Darren New wrote:
> Tom Austin wrote:
>> I think I am grasping the concept. If you need a 'pick list' you
>> should have a table with the picklist values. Storing an ID or piece
>> of text doesn't matter as long as you have pk/fk set up correctly to
>> prevent data corruption.
>
> Exactly. And it gives you a hook to expand on in the future. You have a
> place now to talk about colors divorced from where you reference them.
>

Now I have to get that through the other guys here <sigh>


>
> Yeah, exactly. You still have to be careful.
>

you can say that again

> And using a third-party identifier works well too. House addresses, zip
> codes, airport codes, phone numbers (plus timestamp of course), etc. If
> you can push the authoritative decision onto some other public
> organization, you're in good shape.
>

All good food to chew on.


>> Design the DB purely from 'what is out there'. Don't take use into
>> account (or at least minimize it).
>
> Yes. This is one of the really important ones that the self-educated
> guys tend to mess up.
>

I have no idea what you could be talking about ;-)

Post a reply to this message

<<< Previous 10 Messages

Goto Initial 10 Messages