POV-Ray : Newsgroups : povray.off-topic : Database Questions : Re: Database Questions Server Time
3 Sep 2024 21:15:08 EDT (-0400)
  Re: Database Questions  
From: Darren New
Date: 9 Mar 2011 14:57:42
Message: <4d77dbb6@news.povray.org>
Tom Austin wrote:
> I see - the primary key thing should prevent the 'error'.

Yeah. OK, imagine a database of oh receipts, say.

If the primary key is some arbitrary integer, it's possible to get the same 
receipt into the table more than once, which will screw up your accounting.

If the primary key is the cash register number + timestamp to the second, 
the only way you get a collision is to make two sales on the same register 
within one second of each other, which probably *should* throw some sort of 
error. It also makes it easy to print on the paper receipt all you need to 
know to get the exact record.

Whether you can do this sort of thing with any particular table of yours is 
another question.

> When you state multiple times, you are implying that multiple records 
> are NOT being created - just 'updating' existing records after the first 
> read.

Or not updating them at all, if you pass in the same data twice by mistake.

Look at a CSV file full of receipts like above. Reading each row is going to 
clobber the record already there. If you use an auto-increment ID, you'll 
wind up with 200 reciepts if you read a 100-row CSV in twice.

IME, a large part of database fuckage is either operator error (like loading 
stuff twice) or simple programming errors (like debiting everyone's account 
instead of just the account of the person you should have debited). The idea 
of making idempotent updates and keeping write-once historical records came 
about because that makes these things easy to fix.


> pretty much - the owner didn't want to invest the time to make it better 
> and gain the rewards.  new owner - new ideas

Cool. Let's hope it stays that way. :-)

> I think I have run into this already
> select A from b where A.ID in ( select ID from C)

No, that's cool. Well, not that exact query, but that sort of thing. The 
trick is it's all in SQL. You aren't looping inside the PHP code or 
Javascript code or whatever you're using to submit that code to the database 
engine.

Since it's all in SQL, the SQL engine can optimize that.

> we got pretty complicated with poles, connections between poles, cables 
> on poles and cables between poles....  a little intertwined

Oh, I can imagine. :-) But there are actually formal mathematical processes 
and rules for designing a database like that in a way that makes it easier.

> in its simplest form leaving out fluff like owners and such:
> 
> table poles:  (each pole)
> ID, GPS Location
> 1 entry for each pole
> 
> table connections:  (how poles are connected)
> ID, PoleID1, PoleID2, Distance
> 1 entry for each pole-pole joining (1 for each 'set' of cables)

See, this is what I mean. There's no need for an ID on this table. You don't 
want two rows, one where it's

(27, Pole 3, Pole 4, 500 feet)
(29, Pole 3, Pole 4, 800 feet)

You get that, your data is screwed, and you know it, but you can't fix it.

A connection between two poles is defined by the two poles it's connecting. 
The distance and what types of cables go through are dependent data.

table connections:
PoleID1, PoleID2, Distance (pk=PoleID1+PokeID2)

table cables:
PoleID1, PoleID2, height, cable type. (pk=PoleID1+PoleID2+cable type)

> table attachments:  (cable attached to poles & height)
> ID, PoleID, Height
> multiple entries per pole
> 1 for each attachment

Is there a row in this table for a pole that has no cables attached? I.e., 
does this represent "Cables attached to this pole" or "places where it's 
posslbe to attach a cable"?

> table midspans: (lowest height between poles for each cable)
> ID, AttachmentID1, AttachmentID2, Height, ConnectionID(redundant)
> multiple midspans for each 'connection'
> 1 midspan per cable over the connection
> essentially 1 midspan for each connection on a pair of poles


Consider a slightly different layout: a span.

Cable type, poleID1, attachmenthieght1, poleID2, attachmentheight2, 
midspanheight.

Then poleid1+poleid2+cabletype becomes your primary key.

That gives you poles, distances between poles, and spans of cable. If you 
need to know connectivity, you could have a table that lists the first pole 
and last pole along with a collection of spans.

I'm not saying you should do it this way. I'm just pointing out there are 
lots of ways to organize the data, and you might want to think of a drastic 
simplification in the structure that will still give you everything you need 
with a bit of SQL processing time.

What you want is a structure where it becomes impossible to have only some 
of the information you need. Just like you couldn't have a GPS location 
without a pole, you don't want to have (say) a connection height on a pole 
without a mispan, or a connection height on one pole but not on the other pole.

> The rest is pretty much straight forward - objects and properties 
> associated with each of the above.
> 
> As you can see, there are some circular references going on (attachments 
> and midspans connect poles

Yeah. That means you're doing it wrong, if it's actually *circular*.

It's often easier to draw stuff as pictures on paper when you're designing 
things.

Draw a box for each table, and an arrow each time there's a foreign key.

Any box with no outgoing arrows is a fundamental real-world entity. In this 
case, for example, the poles better be in the real world, because they're 
not attached to anything else.

> You have presented some new ideas that 'extend' my thinking and stretch 
> it.  I do not know how much will make it in to the system, but it is 
> good food for thought.

It takes practice. I'm just trying to explain how I initially approach 
things, and why. Of course each situation varies.

> We have some views in our group that state that the use of the data 
> should drive how it is stored. 

This is exactly the *opposite* of what an RDBMS is all about. People already 
had this kind of database, and the problem was that the data became unusable 
when new requirements came up. That's why SQL has no pointers.

> I agree with it to a point - mainly for 
> ease of working with the data, but not at the sacrifice of usability of 
> the data.  The drive is so hard that there is talk about flattening the 
> database to make it 'simpler' and that adding more tables makes it more 
> complex.

Add lots of tables. Create "simpler" tables by creating views. When you find 
in six months that you need new functionality, create other views that 
expose the data organized in the new way.

> Additionally some of our issue is that when we finish a job for a client 
> we do not need to reference the data any more.  Rarely have we found 
> ourselves going back to old data.  How much effort should be put into a 
> system that supports saving everything and making it accessible when one 
> does not refer back to it?

Another good reason to avoid using auto-increment IDs. :-) You can have a 
"job" table that gives you top-level pointers for a job to each thing 
involved in a job. (I.e., if a Job consisted of a bunch of connections on a 
bunch of poles, you'd have a Job ID and a collection of poles. The 
collection of poles would give you the collection of cables. Etc.) You could 
then write out (as SQL even) all the rows associated with a given job, then 
delete those rows. If you ever needed them back, run that SQL back into the 
database.  (Altho, honestly, most database engines with autoincrement IDs 
can handle actually inserting records with a given ID and not generating a 
new one.)

But honestly, I don't know how big your database is, but a database can 
store millions of rows before you even start to see anything getting slow. 
It's not a case of "do we need to keep it around" as it is "do we need to 
throw it away?"  But if you manage to build the data in a way that each 
chunk you might want to archive (i.e., a job) has a record that lets you get 
the whole collection of everything associated with that record, then you 
don't have to worry about the actual archive-and-delete part of the code 
until there's a reason to. Then you can write that code, confident that you 
know what you need to store out.

> Thanks for your feedback - it is very helpful.

I like helping,b ecause writing the thoughts out clearly enough to explain 
to someone else always clarifies things to me too.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.