  Re: Database Questions  
From: Tom Austin
Date: 10 Mar 2011 18:20:20
Message: <4d795cb4$1@news.povray.org>
On 3/9/2011 2:57 PM, Darren New wrote:
> Tom Austin wrote:
>> I see - the primary key thing should prevent the 'error'.
>
> Yeah. OK, imagine a database of oh receipts, say.
>
> If the primary key is some arbitrary integer, it's possible to get the
> same receipt into the table more than once, which will screw up your
> accounting.
>
> If the primary key is the cash register number + timestamp to the
> second, the only way you get a collision is to make two sales on the
> same register within one second of each other, which probably *should*
> throw some sort of error. It also makes it easy to print on the paper
> receipt all you need to know to get the exact record.
>
> Whether you can do this sort of thing with any particular table of yours
> is another question.
>

This makes sense.

I think what still boggles my mind is how to do a double primary key / 
double foreign key relationship.

In the case of the receipts, what relationships do you create to have a 
table reference a particular receipt?
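
A minimal sketch of one way this could look, assuming SQLite driven from Python's sqlite3 module. The table and column names (receipts, receipt_lines, register_no, sold_at) are made up for illustration and are not from the thread. The child table simply repeats both key columns and declares them together as one foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE receipts (
    register_no INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL,          -- timestamp to the second
    total       REAL    NOT NULL,
    PRIMARY KEY (register_no, sold_at)     -- the "double" primary key
);
CREATE TABLE receipt_lines (
    register_no INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL,
    line_no     INTEGER NOT NULL,
    item        TEXT    NOT NULL,
    PRIMARY KEY (register_no, sold_at, line_no),
    FOREIGN KEY (register_no, sold_at)     -- the "double" foreign key
        REFERENCES receipts (register_no, sold_at)
);
""")

conn.execute("INSERT INTO receipts VALUES (3, '2011-03-10 18:20:20', 12.50)")
conn.execute("INSERT INTO receipt_lines VALUES (3, '2011-03-10 18:20:20', 1, 'widget')")

# A line that points at a receipt that does not exist is rejected.
try:
    conn.execute("INSERT INTO receipt_lines VALUES (9, '2011-03-10 18:20:20', 1, 'ghost')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Other engines spell the same idea almost identically; only the PRAGMA line is SQLite-specific.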

>
> Or not updating them at all, if you pass in the same data twice by mistake.
>
> Look at a CSV file full of receipts like above. Reading each row is
> going to clobber the record already there. If you use an auto-increment
> ID, you'll wind up with 200 receipts if you read a 100-row CSV in twice.
>

So, the primary key relationship keeps the data from being inserted by 
default.
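
A rough sketch of the load-it-twice case, again assuming SQLite through Python's sqlite3 module with a made-up receipts table and a two-row stand-in for the CSV. A plain INSERT of a duplicate key raises an error, and SQLite's INSERT OR IGNORE skips the duplicates quietly, so the row count stays the same either way.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE receipts (
    register_no INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL,
    total       REAL    NOT NULL,
    PRIMARY KEY (register_no, sold_at)
)
""")

# Hypothetical CSV contents, kept tiny for the example.
csv_text = (
    "register_no,sold_at,total\n"
    "3,2011-03-10 18:20:20,12.50\n"
    "3,2011-03-10 18:21:05,4.00\n"
)

def load(text, statement):
    """Insert every CSV row using the given INSERT statement."""
    rows = [
        (int(r["register_no"]), r["sold_at"], float(r["total"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(statement, rows)

load(csv_text, "INSERT INTO receipts VALUES (?, ?, ?)")

# Loading the same file again with a plain INSERT fails on the primary key.
try:
    load(csv_text, "INSERT INTO receipts VALUES (?, ?, ?)")
    second_load_failed = False
except sqlite3.IntegrityError:
    second_load_failed = True

# INSERT OR IGNORE makes the reload a harmless no-op instead of an error.
load(csv_text, "INSERT OR IGNORE INTO receipts VALUES (?, ?, ?)")
row_count = conn.execute("SELECT COUNT(*) FROM receipts").fetchone()[0]
```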


> IME, a large part of database fuckage is either operator error (like
> loading stuff twice) or simple programming errors (like debiting
> everyone's account instead of just the account of the person you should
> have debited). The idea of making idempotent updates and keeping
> write-once historical records came about because that makes these things
> easy to fix.
>

And the trick is to have the database strong enough to prevent such 
errors from causing trouble in the first place.  You spend your time 
setting it up right or making fixes for problems that crop up.



>
>> pretty much - the owner didn't want to invest the time to make it
>> better and gain the rewards. new owner - new ideas
>
> Cool. Let's hope it stays that way. :-)
>

I think it will - but the new push is to be very dynamic - so dynamic 
that taking the time to set up a proper database is not possible.  From 
one extreme to another.  I guess I'd rather take this extreme - it means 
we are moving someplace.

>> I think I have run into this already
>> select A from b where A.ID in ( select ID from C)
>
> No, that's cool. Well, not that exact query, but that sort of thing. The
> trick is it's all in SQL. You aren't looping inside the PHP code or
> Javascript code or whatever you're using to submit that code to the
> database engine.
>

So what's not cool about the query - should something like that be avoided?
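
For what it's worth, the query as I typed it uses A both as the selected column and as a table qualifier (A.ID), so it would not parse as written; that may be all "not that exact query" meant. Here is a well-formed version of the same idea, sketched with SQLite via Python's sqlite3 module. The poles and attachments tables and their columns are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE poles (id INTEGER PRIMARY KEY, gps TEXT NOT NULL);
CREATE TABLE attachments (
    pole_id   INTEGER NOT NULL,
    height_ft REAL    NOT NULL,
    PRIMARY KEY (pole_id, height_ft)
);
INSERT INTO poles VALUES (1, '35.1N 80.8W'), (2, '35.2N 80.8W'), (3, '35.3N 80.8W');
INSERT INTO attachments VALUES (1, 18.0), (1, 22.5), (3, 20.0);
""")

# The filtering runs inside the database engine; the calling code
# never loops over poles to check each one for attachments.
poles_with_cables = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM poles AS p
    WHERE p.id IN (SELECT a.pole_id FROM attachments AS a)
    ORDER BY p.id
""")]
```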


>> in its simplest form leaving out fluff like owners and such:
>>
>> table poles: (each pole)
>> ID, GPS Location
>> 1 entry for each pole
>>
>> table connections: (how poles are connected)
>> ID, PoleID1, PoleID2, Distance
>> 1 entry for each pole-pole joining (1 for each 'set' of cables)
>
> See, this is what I mean. There's no need for an ID on this table. You
> don't want two rows, one where it's
>
> (27, Pole 3, Pole 4, 500 feet)
> (29, Pole 3, Pole 4, 800 feet)
>
> You get that, your data is screwed, and you know it, but you can't fix it.
>
> A connection between two poles is defined by the two poles it's
> connecting. The distance and what types of cables go through are
> dependent data.
>
> table connections:
> PoleID1, PoleID2, Distance (pk=PoleID1+PoleID2)
>
> table cables:
> PoleID1, PoleID2, height, cable type. (pk=PoleID1+PoleID2+cable type)
>

Does this prevent the following case?
PoleID1	PoleID2
1   	2
2   	1

Does keying prevent this or does it have to be taken care of some other way?
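
As far as I can tell, the composite key alone does not catch this: (1, 2) and (2, 1) are different key values. One common way to close the gap is to store every pair in a fixed order and let a CHECK constraint enforce it. A sketch assuming SQLite via Python's sqlite3 module; the column names and the add_connection helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE connections (
    pole_id1    INTEGER NOT NULL,
    pole_id2    INTEGER NOT NULL,
    distance_ft REAL    NOT NULL,
    PRIMARY KEY (pole_id1, pole_id2),
    CHECK (pole_id1 < pole_id2)      -- lower pole ID always goes first
)
""")

def add_connection(a, b, distance_ft):
    """Insert a connection, normalising the pair so the lower ID comes first."""
    low, high = sorted((a, b))
    conn.execute("INSERT INTO connections VALUES (?, ?, ?)", (low, high, distance_ft))

add_connection(1, 2, 500.0)

# The reversed pair normalises to (1, 2) again, so the primary key rejects it.
try:
    add_connection(2, 1, 500.0)
    reversed_duplicate_rejected = False
except sqlite3.IntegrityError:
    reversed_duplicate_rejected = True

# A raw insert that bypasses the helper is stopped by the CHECK constraint.
try:
    conn.execute("INSERT INTO connections VALUES (2, 1, 500.0)")
    raw_reversed_rejected = False
except sqlite3.IntegrityError:
    raw_reversed_rejected = True
```

The CHECK also rules out a pole connected to itself, since a pole's ID is never less than itself.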



>> table attachments: (cable attached to poles & height)
>> ID, PoleID, Height
>> multiple entries per pole
>> 1 for each attachment
>
> Is there a row in this table for a pole that has no cables attached?
> I.e., does this represent "Cables attached to this pole" or "places
> where it's possible to attach a cable"?
>

actual cables attached - so a pole can have 0 attachments.


>> table midspans: (lowest height between poles for each cable)
>> ID, AttachmentID1, AttachmentID2, Height, ConnectionID(redundant)
>> multiple midspans for each 'connection'
>> 1 midspan per cable over the connection
>> essentially 1 midspan for each connection on a pair of poles
>
>
> Consider a slightly different layout: a span.
>
> Cable type, poleID1, attachmentheight1, poleID2, attachmentheight2,
> midspanheight.
>
> Then poleid1+poleid2+cabletype becomes your primary key.
>

Ok, I was confused but now I think I see clearly.  Instead of 
dereferencing to attachmentID like I did, one just uses the (poleID) and 
(attachment height) as the key from the attachments table.


> That gives you poles, distances between poles, and spans of cable. If
> you need to know connectivity, you could have a table that lists the
> first pole and last pole along with a collection of spans.
>
> I'm not saying you should do it this way. I'm just pointing out there
> are lots of ways to organize the data, and you might want to think of a
> drastic simplification in the structure that will still give you
> everything you need with a bit of SQL processing time.
>

No, but it gives a lot of food for thought.

I don't think you have proposed anything much different than what we 
have - it just rids the tables of IDs that can cause more trouble than 
they are worth.


> What you want is a structure where it becomes impossible to have only
> some of the information you need. Just like you couldn't have a GPS
> location without a pole, you don't want to have (say) a connection
> height on a pole without a midspan, or a connection height on one pole
> but not on the other pole.
>

yes, and this is done in the schema without the need for any real code 
at this point.

I like the concept - make it robust enough that it can't get fouled up - 
no matter what you try to do.

I don't know if we will get there, but I know we will make a few steps 
in that direction.

>> The rest is pretty much straight forward - objects and properties
>> associated with each of the above.
>>
>> As you can see, there are some circular references going on
>> (attachments and midspans connect poles
>
> Yeah. That means you're doing it wrong, if it's actually *circular*.
>
> It's often easier to draw stuff as pictures on paper when you're
> designing things.
>
> Draw a box for each table, and an arrow each time there's a foreign key.
>
> Any box with no outgoing arrows is a fundamental real-world entity. In
> this case, for example, the poles better be in the real world, because
> they're not attached to anything else.
>

Now that I think about it more, I don't think I have actual circular 
references going on.

I do have the case where I get connections from the connection table 
(pole1, pole2, distance) and I can also get the same information from 
the attachments table (poleID, height) and midspans( attachment1, 
attachment2) table.

The trouble arises if the attachments/midspan tables create a connection 
that is not defined in the connections table.
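
If the span layout suggested above were used, a composite foreign key could make that situation impossible. A sketch only, assuming SQLite via Python's sqlite3 module; the spans table follows the suggested column list, but the exact names and units are my own guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE connections (
    pole_id1    INTEGER NOT NULL,
    pole_id2    INTEGER NOT NULL,
    distance_ft REAL    NOT NULL,
    PRIMARY KEY (pole_id1, pole_id2)
);
CREATE TABLE spans (
    cable_type        TEXT    NOT NULL,
    pole_id1          INTEGER NOT NULL,
    attach_height1_ft REAL    NOT NULL,
    pole_id2          INTEGER NOT NULL,
    attach_height2_ft REAL    NOT NULL,
    midspan_height_ft REAL    NOT NULL,
    PRIMARY KEY (pole_id1, pole_id2, cable_type),
    FOREIGN KEY (pole_id1, pole_id2)
        REFERENCES connections (pole_id1, pole_id2)
);
INSERT INTO connections VALUES (3, 4, 500.0);
""")

# A span over a defined connection is accepted.
conn.execute("INSERT INTO spans VALUES ('power', 3, 30.0, 4, 29.0, 21.0)")

# A span between poles with no row in connections is refused.
try:
    conn.execute("INSERT INTO spans VALUES ('power', 3, 30.0, 7, 29.0, 21.0)")
    stray_span_rejected = False
except sqlite3.IntegrityError:
    stray_span_rejected = True
```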


>> You have presented some new ideas that 'extend' my thinking and
>> stretch it. I do not know how much will make it in to the system, but
>> it is good food for thought.
>
> It takes practice. I'm just trying to explain how I initially approach
> things, and why. Of course each situation varies.
>

I know how to make a database work - even properly, in 
the sense that it is clean.  But the other 10% is the icing that makes 
things really work well and keep working well with less effort.


>> I agree with it to a point - mainly for ease of working with the data,
>> but not at the sacrifice of usability of the data. The drive is so
>> hard that there is talk about flattening the database to make it
>> 'simpler' and that adding more tables makes it more complex.
>
> Add lots of tables. Create "simpler" tables by creating views. When you
> find in six months that you need new functionality, create other views
> that expose the data organized in the new way.
>
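
To make the view idea concrete for myself, here is a small sketch assuming SQLite via Python's sqlite3 module. The span_report view and the table and column names are invented for the example. The data stays in separate tables, and the view hands back the "flat" shape people are asking for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE connections (
    pole_id1    INTEGER NOT NULL,
    pole_id2    INTEGER NOT NULL,
    distance_ft REAL    NOT NULL,
    PRIMARY KEY (pole_id1, pole_id2)
);
CREATE TABLE spans (
    pole_id1          INTEGER NOT NULL,
    pole_id2          INTEGER NOT NULL,
    cable_type        TEXT    NOT NULL,
    midspan_height_ft REAL    NOT NULL,
    PRIMARY KEY (pole_id1, pole_id2, cable_type)
);
INSERT INTO connections VALUES (3, 4, 500.0);
INSERT INTO spans VALUES (3, 4, 'power', 21.0), (3, 4, 'telephone', 16.5);

-- The "simple" flat table, defined once on top of the real tables.
CREATE VIEW span_report AS
SELECT s.pole_id1, s.pole_id2, s.cable_type, c.distance_ft, s.midspan_height_ft
FROM spans AS s
JOIN connections AS c
  ON c.pole_id1 = s.pole_id1 AND c.pole_id2 = s.pole_id2;
""")

flat_rows = conn.execute(
    "SELECT * FROM span_report ORDER BY pole_id1, pole_id2, cable_type"
).fetchall()
```

The distance is stored once per connection but shows up on every span row in the view, so there is nothing to keep in sync by hand.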

OK, a question

I have a table with a descriptor - let's say pole color.

Right now it is just that - a descriptor, but is to be presented to the 
user as a list of distinct items to choose from.

I can vaguely see in the future that I will want to group the color into 
families (maybe reds, blues, greens), but because of time I can't define 
what they are so I will ignore them for now.

At some point in the future the need comes up to define the families and 
add them to the DB.


How would one go about this kind of 'building', and how would keys play a 
role?

Would I simply specify the column as 'text' and just put the color into 
it for now.  Then pull from a distinct query to get the list of 
available colors.  Then in the future make a new table that uses this 
column as a foreign key that gives the families.  But then the distinct 
query is no longer needed, as the new table has the distinct values.

Or should I go ahead and create the additional table so that it can 
provide the colors with a 1 stop place to add a new one?  Then as the 
families are needed just add them to the table?


I know this might not be 100% clear, but how would you go about this?
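
To make the second option concrete, here is a sketch assuming SQLite via Python's sqlite3 module. The pole_colors table, its columns, and the color names are placeholders. The color itself is the key of the lookup table, and the family arrives later as a nullable column without touching the poles table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Today: a lookup table that holds only the list of allowed colors.
conn.executescript("""
CREATE TABLE pole_colors (color TEXT PRIMARY KEY);
CREATE TABLE poles (
    id    INTEGER PRIMARY KEY,
    color TEXT NOT NULL REFERENCES pole_colors (color)
);
INSERT INTO pole_colors VALUES ('crimson'), ('navy'), ('scarlet');
INSERT INTO poles VALUES (1, 'navy'), (2, 'crimson');
""")

# The pick list comes straight from the lookup table, no DISTINCT query needed.
pick_list = [r[0] for r in conn.execute("SELECT color FROM pole_colors ORDER BY color")]

# A misspelled color cannot sneak into poles.
try:
    conn.execute("INSERT INTO poles VALUES (3, 'nvay')")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True

# Later: families are added to the lookup table; poles is untouched.
conn.executescript("""
ALTER TABLE pole_colors ADD COLUMN family TEXT;
UPDATE pole_colors SET family = 'reds'  WHERE color IN ('crimson', 'scarlet');
UPDATE pole_colors SET family = 'blues' WHERE color = 'navy';
""")
reds = [r[0] for r in conn.execute(
    "SELECT color FROM pole_colors WHERE family = 'reds' ORDER BY color")]
```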

>> Additionally some of our issue is that when we finish a job for a
>> client we do not need to reference the data any more. Rarely have we
>> found ourselves going back to old data. How much effort should be put
>> into a system that supports saving everything and making it accessible
>> when one does not refer back to it?
>
> Another good reason to avoid using auto-increment IDs. :-) You can have
> a "job" table that gives you top-level pointers for a job to each thing
> involved in a job. (I.e., if a Job consisted of a bunch of connections
> on a bunch of poles, you'd have a Job ID and a collection of poles. The
> collection of poles would give you the collection of cables. Etc.) You
> could then write out (as SQL even) all the rows associated with a given
> job, then delete those rows. If you ever needed them back, run that SQL
> back into the database. (Altho, honestly, most database engines with
> autoincrement IDs can handle actually inserting records with a given ID
> and not generating a new one.)
>

yes, I have seen this with Access - updates fail, but inserts work.


> But honestly, I don't know how big your database is, but a database can
> store millions of rows before you even start to see anything getting
> slow. It's not a case of "do we need to keep it around" as it is "do we
> need to throw it away?" But if you manage to build the data in a way
> that each chunk you might want to archive (i.e., a job) has a record
> that lets you get the whole collection of everything associated with
> that record, then you don't have to worry about the actual
> archive-and-delete part of the code until there's a reason to. Then you
> can write that code, confident that you know what you need to store out.
>

We won't get very big.  We survey maybe 5000 or so poles in a given year.

>> Thanks for your feedback - it is very helpful.
>
> I like helping, because writing the thoughts out clearly enough to
> explain to someone else always clarifies things to me too.
>

Yes, I like helping others as well for much the same reason.

