POV-Ray : Newsgroups : povray.off-topic : Database Questions : Re: Database Questions
  Re: Database Questions  
From: Darren New
Date: 8 Mar 2011 17:11:33
Message: <4d76a995$1@news.povray.org>
Tom Austin wrote:
> Interesting way of looking at building a schema - just break it down to 
> objects.  I think this fits our data very well.  I think we were doing 
> this anyway, but not on purpose - more by accident.

Trust me on this: It's the only way you ever get a schema you don't have to 
change with every software update. Record what is "out there", and to the 
extent sensible, never overwrite data. If you find yourself thinking "We'll 
need to keep copies", you've done something wrong, because what that really 
means is you need the state of something at two different times, but you 
didn't record the times.  Record *everything*, in as raw a format as you can.

I am looking at a financial system right now where I never overwrite data. 
When someone swipes a card, I put into a table exactly what the card machine 
gave me: The card machine ID, the credit card number, the amount the clerk 
typed in, and the timestamp I got the record.  I don't record the merchant 
ID in that table, or the ID of the cardholder. That all comes later, when I 
process the records.

Why? Because the merchant might change merchant accounts before telling us. 
The cardholder might lose the card. Heck, we might even accidentally create 
two cards with the same number.

I'm blathering. Never mind.

Anyway, yes, record everything from *outside* your company in one set of 
tables, as *raw* as you can. Changes to these tables should be idempotent. 
(I.e., you *should* be able to read the CSV file the telco gave you listing 
the pole information multiple times into the same table without corruption. 
This implies no automatically-generated primary key on those tables. That 
said, again, this is just a guideline and not always possible.)
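A minimal sketch of that idempotent-load idea, using Python's sqlite3 for illustration (the table and column names are hypothetical, and the original discussion is about MySQL, where the same approach works with INSERT IGNORE):

```python
import sqlite3

# Hypothetical raw-intake table for the telco's pole CSV. The natural key
# (the telco's own pole number plus the file's effective date) is the
# primary key -- no auto-generated ID -- so re-reading the same file is
# a no-op rather than a duplication.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_pole_import (
        telco_pole_no  TEXT NOT NULL,
        effective_date TEXT NOT NULL,
        owner_name     TEXT,
        height_ft      REAL,
        PRIMARY KEY (telco_pole_no, effective_date)
    )
""")

csv_rows = [("P-1001", "2011-03-01", "AT&T", 35.0),
            ("P-1002", "2011-03-01", "AT&T", 40.0)]

def load(rows):
    # INSERT OR IGNORE makes the load idempotent: a row that is already
    # present (same natural key) is silently skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO raw_pole_import VALUES (?, ?, ?, ?)", rows)

load(csv_rows)
load(csv_rows)  # accidental second read of the same file
count = conn.execute("SELECT COUNT(*) FROM raw_pole_import").fetchone()[0]
```

After loading the same file twice, `count` is still 2 -- which is the whole point.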

Have a second set of tables, completely created from the first set, that 
holds any calculated information that might be difficult to recalculate 
(e.g., running account balances in a financial app), but which can be 
checked against and recreated from the raw data.
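The raw-vs-derived split can be sketched the same way (again sqlite3, with made-up table names): the derived table is never the source of truth, so you can throw it away and rebuild it from the raw log at any time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw, append-only transaction log (hypothetical names).
    CREATE TABLE raw_txn (account TEXT, amount REAL, ts TEXT);
    -- Derived table: entirely recomputable from raw_txn.
    CREATE TABLE balance (account TEXT PRIMARY KEY, total REAL);
""")
conn.executemany("INSERT INTO raw_txn VALUES (?, ?, ?)",
                 [("alice", 100.0, "t1"), ("alice", -30.0, "t2"),
                  ("bob", 50.0, "t1")])

def rebuild_balances():
    # Throw the derived table away and recreate it from the raw data;
    # this doubles as a consistency check on any incremental updates.
    conn.execute("DELETE FROM balance")
    conn.execute("""
        INSERT INTO balance
        SELECT account, SUM(amount) FROM raw_txn GROUP BY account
    """)

rebuild_balances()
alice = conn.execute(
    "SELECT total FROM balance WHERE account = 'alice'").fetchone()[0]
```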

Once you have stuff working, figure out where it's slow, *then* fix that. 
Usually "fix that" involves changing the declarations of some indexes, not 
rearranging the actual data (which, if you have data for which you can 
follow the rules above, you can't really rearrange anyway).

> this might be worth considering, but I don't know how easy it will be to 
> implement and keep straight.  Seems like a user-created key like 
> this could lead to duplicates - but if it is a key it will prevent 
> duplicates anyway - just prompt for another key.

Oh, yes, this isn't for user keys. I meant where in a programming language 
you'd use an enumerated type.

So if you have a "status" column for jobs that says "in negotiation", "paid for 
but waiting for someone to be assigned", "that person is working on it", and 
"ready to be delivered", you could use this sort of coding to make your 
programs more readable: "NEGO", "PAID", "WORK", "DONE" rather than 0, 1, 2, 3.
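One way to enforce those codes at the database level is a CHECK constraint, which plays the role of an enumerated type. A sketch (sqlite3, illustrative names; MySQL has an ENUM type that serves the same purpose):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Readable four-letter status codes instead of 0..3; the CHECK constraint
# rejects anything outside the agreed set.
conn.execute("""
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL
            CHECK (status IN ('NEGO', 'PAID', 'WORK', 'DONE'))
    )
""")
conn.execute("INSERT INTO job (status) VALUES ('PAID')")  # fine
try:
    conn.execute("INSERT INTO job (status) VALUES ('XXXX')")  # rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```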

> Most of the data in question is a simple map from an integer key to a 
> piece of text.  I do have some queries on the text with joins to pull 
> data from other tables.
> 
> The text is not designed to be changed.
> 
> The alternative that we have is to flatten the relationship and put the 
> text directly in the table instead of having a key reference.
> 
> ie
> go from
> 
> tablePole        tableOwners
> Pole   OwnerID        ID    Name
> 
> 
> to
> 
> tablePole
> Pole    OwnerName
> 
> 
> 
> Any thoughts on this?

If OwnerName shows up in multiple places, then these bits would need to be 
in a PK/FK relationship to properly cascade changes.

I assume each owner has more than one pole. Hence, duplicating the name for 
every pole is inefficient space-wise and dangerous update-wise. (Especially 
if you don't use transactions.)
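Here's what that PK/FK arrangement buys you, sketched in sqlite3 (table names are mine, and note SQLite needs foreign keys switched on per connection): renaming an owner touches exactly one row, and every pole sees the new name through the join instead of needing a mass update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite: FKs are off by default
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pole (
        pole_no  TEXT PRIMARY KEY,
        owner_id INTEGER REFERENCES owner(id)
    );
""")
conn.execute("INSERT INTO owner VALUES (1, 'AT&T')")
conn.execute("INSERT INTO pole VALUES ('P-1001', 1)")
conn.execute("INSERT INTO pole VALUES ('P-1002', 1)")

# A rename touches exactly one row; every pole picks it up via the join.
conn.execute("UPDATE owner SET name = 'Cingular' WHERE id = 1")
names = conn.execute("""
    SELECT DISTINCT o.name FROM pole p JOIN owner o ON o.id = p.owner_id
""").fetchall()
```

With the flattened design (OwnerName duplicated on every pole row), that rename would be a multi-row update that can half-complete without transactions.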

Is the "owner" a real independent entity? Is it rational to speculate there 
might be an owner with no poles? Or an owner who has poles but hasn't told 
you about them yet? What about an owner who just sold his last pole to pay 
for the pole he's going to buy next week?

So, yes, here, "owner" is an actual entity, not just some field about a 
pole. The name of the owner isn't associated with the pole at all.

Indeed, the *best* way to structure this would be

table Pole: GPS coords, height.  (PK: GPS)

table Owner: ID, contact name, contact phone, etc etc. (PK: ID)

table Ownership: PoleGPS, OwnerID, StartingFromTime. (PK: GPS+StartingFromTime).

table Cables: PoleGPS, cable height, fiber height, power height, 
StartingFromTime. (PK: GPS + StartingFromTime)

Later, you can go back and ask questions like "Did AT&T ever change the 
height of the power lines on the pole at Fifth and Main, or was that 
Cingular, after they bought that line from AT&T?"
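A sketch of how that question gets answered against the Ownership and Cables tables above (sqlite3, with the schema trimmed to the relevant columns and made-up data): for each recorded cable change, look up the latest ownership record at or before the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ownership (PoleGPS TEXT, OwnerID TEXT, StartingFromTime TEXT,
                            PRIMARY KEY (PoleGPS, StartingFromTime));
    CREATE TABLE Cables (PoleGPS TEXT, power_height REAL, StartingFromTime TEXT,
                         PRIMARY KEY (PoleGPS, StartingFromTime));
""")
conn.executemany("INSERT INTO Ownership VALUES (?,?,?)",
                 [("5th&Main", "ATT",      "2009-01-01"),
                  ("5th&Main", "Cingular", "2010-06-01")])
conn.executemany("INSERT INTO Cables VALUES (?,?,?)",
                 [("5th&Main", 30.0, "2009-05-01"),   # change under AT&T
                  ("5th&Main", 32.0, "2011-02-01")])  # change under Cingular

# Who owned the pole when each power-height change was recorded? The
# correlated subquery picks the most recent ownership record that had
# already started at the time of the change.
rows = conn.execute("""
    SELECT c.StartingFromTime, c.power_height,
           (SELECT o.OwnerID FROM Ownership o
            WHERE o.PoleGPS = c.PoleGPS
              AND o.StartingFromTime <= c.StartingFromTime
            ORDER BY o.StartingFromTime DESC LIMIT 1) AS owner
    FROM Cables c WHERE c.PoleGPS = '5th&Main'
    ORDER BY c.StartingFromTime
""").fetchall()
```

Because nothing is ever overwritten, the 2009 change correctly attributes to AT&T and the 2011 change to Cingular.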


*That* might be a bit too normalized for your purposes, but you'll never get 
in a situation where someone sells a bunch of poles, then comes to you and 
asks you to recreate a report from before they sold the poles or something.

See if you can look at the columns you have and come up with a breakdown 
like this into "things really happening out there". You might find yourself 
a natural at it. :-)

Now, the "groups" kind of thing I was talking about, the "secondary" set of 
tables, might be tables that take a group of poles owned by the same 
company, that constitute a single "cable run". Large stretches of cable run 
might have the cable at the same height, so your calculations could be 
faster if you had a table that put each pole in a group, and then had the 
height of the cable on the group specified, or something like that.

It's hard to come up with that sort of thing when I don't know how your 
business works, but that's the approximate sort of thing I was talking 
about.


>>> We are developing a new database system to take care of some severe
>>> shortcomings of the old system.
>>
>> What are the shortcomings?
>>
> 
> Let's just say it was a single table database with about 250 columns - 
> limited only by the fact that Excel at one time only allowed about 250 
> columns.

Oh, so the shortcoming is that it was rushed and grew instead of being 
designed. Makes sense. :-)

> Putting data in and getting data out was very difficult and required the 
> sacrifice of too many brain cells.

That's exactly where SQL queries between multiple tables will help. 
Thinking in SQL instead of in terms of iterating over things is tough, but 
it's almost always possible to come up with a statement in SQL that will 
pull out all and only the data you need. If you ever get to the point where 
you're saying "for each result from *this* query, submit *that* query", 
you're almost 100% sure you're doing something wrong. In my entire career, I 
only ever did that once, and that was after I'd written it the right way and 
figured out MySql was not optimizing things in the obvious way and I didn't 
know how to make it do so.
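The anti-pattern and its fix, side by side (sqlite3 sketch with invented data): the nested-loop version submits one query per outer row, while the join says the same thing in one statement and lets the engine plan it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pole (pole_no TEXT, owner_id INTEGER);
""")
conn.execute("INSERT INTO owner VALUES (1, 'AT&T')")
conn.executemany("INSERT INTO pole VALUES (?, ?)",
                 [("P-1001", 1), ("P-1002", 1)])

# The "for each result from this query, submit that query" anti-pattern:
slow = []
for (owner_id,) in conn.execute("SELECT id FROM owner"):
    for (pole_no,) in conn.execute(
            "SELECT pole_no FROM pole WHERE owner_id = ?", (owner_id,)):
        slow.append(pole_no)

# One declarative statement pulls out all and only the data you need:
fast = [r[0] for r in conn.execute("""
    SELECT p.pole_no FROM pole p JOIN owner o ON o.id = p.owner_id
""")]
```

Both produce the same rows; the difference is that the join version does one round trip and gives the optimizer the whole problem at once.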

>>> Our data has a lot of common 'choose from' items - such as the owner
>>> of the pole.
>>> The data consists of pole specific information that doesn't change
>>> (owner, height, etc) and information that will likely change (the
>>> height of cables on the pole).
>>
>> These last two should probably be in separate tables that you join at
>> runtime. (Especially if you're using MySql, where different "engines"
>> give you different capabilities.)
>>
> 
> That's what we are trying to figure out - how to make it make sense to 
> have them in separate tables.

Let me know if the description of tables above does or doesn't make sense.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

