Tom Austin wrote:
> Interesting way of looking at building a schema - just break it down to
> objects. I think this fits our data very well. I think we were doing
> this anyway, but not on purpose - more by accident.
Trust me on this: It's the only way you ever get a schema you don't have to
change with every software update. Record what is "out there", and to the
extent sensible, never overwrite data. If you find yourself thinking "We'll
need to keep copies", you've done something wrong, because what that really
means is you need the state of something at two different times, but you
didn't record the times. Record *everything*, in as raw a format as you can.
I am looking at a financial system right now where I never overwrite data.
When someone swipes a card, I put into a table exactly what the card machine
gave me: The card machine ID, the credit card number, the amount the clerk
typed in, and the timestamp I got the record. I don't record the merchant
ID in that table, or the ID of the cardholder. That all comes later, when I
process the records.
Why? Because the merchant might change merchant accounts before telling us.
The cardholder might lose the card. Heck, we might even accidentally create
two cards with the same number.
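If I were sketching that raw table in SQL, it'd be something like this (all
the names are invented, obviously):

    CREATE TABLE raw_swipe (
        machine_id   VARCHAR(32)    NOT NULL,  -- the card machine's ID
        card_number  VARCHAR(19)    NOT NULL,  -- exactly what the machine sent
        amount       DECIMAL(10,2)  NOT NULL,  -- what the clerk typed in
        received_at  TIMESTAMP      NOT NULL,  -- when the record arrived
        PRIMARY KEY (machine_id, received_at)  -- natural key, nothing generated;
                                               -- assumes one record per machine
                                               -- per timestamp
    );
    -- Note what's missing: no merchant ID, no cardholder ID. Those get
    -- attached later, when the records are processed.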
I'm blathering. Never mind.
Anyway, yes, record everything from *outside* your company in one set of
tables, as *raw* as you can. Changes to these tables should be idempotent.
(I.e., you *should* be able to read the CSV file the telco gave you listing
the pole information multiple times into the same table without corruption.
This implies no automatically-generated primary key on those tables. That
said, again, this is just a guideline and not always possible.)
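Concretely, for the telco feed, something shaped like this (MySql syntax;
the column names are guesses):

    CREATE TABLE raw_pole_feed (
        pole_id   VARCHAR(32)  NOT NULL,  -- the telco's own pole identifier
        latitude  DECIMAL(9,6) NOT NULL,
        longitude DECIMAL(9,6) NOT NULL,
        height_ft DECIMAL(5,1),
        PRIMARY KEY (pole_id)             -- natural key from the feed itself
    );

    -- IGNORE skips rows whose key is already present, so reading the
    -- same CSV twice changes nothing:
    LOAD DATA INFILE 'poles.csv'
        IGNORE INTO TABLE raw_pole_feed
        FIELDS TERMINATED BY ',';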
Have a second set of tables, completely created from the first set, that
holds any calculated information that might be difficult to recalculate
(e.g., running account balances in a financial app), but which can be
checked against and recreated from the raw data.
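In the financial example above, that second set might be little more than
this, rebuilt from the raw table whenever you like:

    -- Derived table: can be dropped and recreated from raw_swipe at any time.
    CREATE TABLE account_balance (
        card_number VARCHAR(19)   NOT NULL,
        balance     DECIMAL(12,2) NOT NULL,
        PRIMARY KEY (card_number)
    );

    INSERT INTO account_balance (card_number, balance)
    SELECT card_number, SUM(amount)
    FROM raw_swipe
    GROUP BY card_number;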
Once you have stuff working, figure out where it's slow, *then* fix that.
Usually "fix that" involves changing the declarations of some indexes, not
rearranging the actual data (which, if you have data for which you can
follow the rules above, you can't really rearrange anyway).
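By which I mean: run the slow query under EXPLAIN, see what it's scanning,
and hand it an index. The rows themselves never move. A made-up example:

    -- See how MySql plans to execute the slow query...
    EXPLAIN SELECT card_number, SUM(amount)
    FROM raw_swipe
    WHERE received_at >= '2008-01-01'
    GROUP BY card_number;

    -- ...then give it an index to use, leaving the data alone.
    CREATE INDEX idx_swipe_time ON raw_swipe (received_at);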
> this might be worth considering, but I don't know how easy it will be to
> implement and keep straight. Seems like a user-created key like
> this could lead to duplicates - but if it is a key it will prevent
> duplicates anyway - just prompt for another key.
Oh, yes, this isn't for user keys. I meant where in a programming language
you'd use an enumerated type.
So if you have a "status" column for jobs that says "in negotiation", "paid for
but waiting for someone to be assigned", "that person is working on it", and
"ready to be delivered", you could use this sort of coding to make your
programs more readable. "NEGO", "PAID", "WORK", "DONE" rather than 0, 1, 2, 3.
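In table terms, something like this (a sketch, not gospel):

    CREATE TABLE job (
        job_id INT     NOT NULL,
        status CHAR(4) NOT NULL,  -- 'NEGO', 'PAID', 'WORK', or 'DONE'
        PRIMARY KEY (job_id)
    );

    -- Reads like English, unlike "WHERE status = 2":
    SELECT job_id FROM job WHERE status = 'WORK';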
> Most of the data in question is a simple map from an integer key to a
> piece of text. I do have some queries on the text with joins to pull
> data from other tables.
>
> The text is not designed to be changed.
>
> The alternative that we have is to flatten the relationship and put the
> text directly in the table instead of having a key reference.
>
> ie
> go from
>
> tablePole           tableOwners
> Pole  OwnerID       ID  Name
>
>
> to
>
> tablePole
> Pole OwnerName
>
>
>
> Any thoughts on this?
If OwnerName shows up in multiple places, then these bits would need to be
in a PK/FK relationship to properly cascade changes.
I assume each owner has more than one pole. Hence, duplicating the name for
every pole is inefficient space-wise and dangerous update-wise. (Especially
if you don't use transactions.)
Is the "owner" a real independent entity? Is it rational to speculate there
might be an owner with no poles? Or an owner who has poles but hasn't told
you about them yet? What about an owner who just sold his last pole to pay
for the pole he's going to buy next week?
So, yes, here, "owner" is an actual entity, not just some field about a
pole. The name of the owner isn't associated with the pole at all.
Indeed, the *best* way to structure this would be
table Pole:      GPS coords, height.  (PK: GPS)
table Owner:     ID, contact name, contact phone, etc.  (PK: ID)
table Ownership: PoleGPS, OwnerID, StartingFromTime.  (PK: PoleGPS + StartingFromTime)
table Cables:    PoleGPS, cable height, fiber height, power height,
                 StartingFromTime.  (PK: PoleGPS + StartingFromTime)
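Spelled out in SQL, roughly (the column types are just guesses):

    CREATE TABLE Pole (
        gps    VARCHAR(32)  NOT NULL,  -- "lat,long" as the natural key
        height DECIMAL(5,1) NOT NULL,
        PRIMARY KEY (gps)
    );

    CREATE TABLE Owner (
        id            INT NOT NULL,
        contact_name  VARCHAR(80),
        contact_phone VARCHAR(20),
        PRIMARY KEY (id)
    );

    CREATE TABLE Ownership (
        pole_gps      VARCHAR(32) NOT NULL,
        owner_id      INT         NOT NULL,
        starting_from TIMESTAMP   NOT NULL,
        PRIMARY KEY (pole_gps, starting_from),  -- one owner per pole at a time
        FOREIGN KEY (pole_gps) REFERENCES Pole (gps),
        FOREIGN KEY (owner_id) REFERENCES Owner (id)
    );

    CREATE TABLE Cables (
        pole_gps      VARCHAR(32)  NOT NULL,
        cable_height  DECIMAL(5,1),
        fiber_height  DECIMAL(5,1),
        power_height  DECIMAL(5,1),
        starting_from TIMESTAMP    NOT NULL,
        PRIMARY KEY (pole_gps, starting_from),
        FOREIGN KEY (pole_gps) REFERENCES Pole (gps)
    );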
Later, you can go back and ask questions like "Did AT&T ever change the
height of the power lines on the pole at Fifth and Main, or was that
Cingular, after they bought that line from AT&T?"
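And answering it is a single query over the history (the GPS value here is
hypothetical):

    -- Every recorded power-line height on one pole, paired with whoever
    -- owned the pole when that height took effect.
    SELECT c.starting_from, c.power_height, o.contact_name
    FROM Cables c
    JOIN Ownership os
      ON os.pole_gps = c.pole_gps
     AND os.starting_from =
         (SELECT MAX(os2.starting_from)
          FROM Ownership os2
          WHERE os2.pole_gps = c.pole_gps
            AND os2.starting_from <= c.starting_from)
    JOIN Owner o ON o.id = os.owner_id
    WHERE c.pole_gps = '32.715,-117.161'  -- "Fifth and Main", say
    ORDER BY c.starting_from;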
*That* might be a bit too normalized for your purposes, but with it you'll
never get stuck when someone sells a bunch of poles and then comes to you
asking to recreate a report from before the sale.
See if you can look at the columns you have and come up with a breakdown
like this into "things really happening out there". You might find yourself
a natural at it. :-)
Now, the "groups" kind of thing I was talking about, the "secondary" set of
tables, might be tables that take a group of poles owned by the same company
and treat them as a single "cable run". Large stretches of a run might have
the cable at the same height, so your calculations could be faster if you
had a table that put each pole in a group and then specified the height of
the cable for the whole group, or something like that.
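Pure speculation about your data, but something shaped like:

    -- Secondary tables, rebuildable from Pole/Cables: group poles into
    -- runs so a shared height is stored once per run, not once per pole.
    CREATE TABLE CableRun (
        run_id       INT NOT NULL,
        owner_id     INT NOT NULL,
        cable_height DECIMAL(5,1),  -- height shared along the whole run
        PRIMARY KEY (run_id)
    );

    CREATE TABLE CableRunMember (
        run_id   INT         NOT NULL,
        pole_gps VARCHAR(32) NOT NULL,
        PRIMARY KEY (pole_gps)      -- each pole belongs to at most one run
    );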
It's hard to come up with that sort of thing without knowing how your
business works, but that's the approximate sort of thing I was talking
about.
>>> We are developing a new database system to take care of some severe
>>> shortcomings of the old system.
>>
>> What are the shortcomings?
>>
>
> Let's just say it was a single-table database with about 250 columns -
> limited only by the fact that Excel at one time only allowed about 250
> columns.
Oh, so the shortcoming is that it was rushed and grew instead of being
designed. Makes sense. :-)
> Putting data in and getting data out was very difficult and required the
> sacrifice of too many brain cells.
That's exactly where SQL queries between multiple tables will help.
Thinking in SQL instead of in terms of iterating over things is tough, but
it's almost always possible to come up with a statement in SQL that will
pull out all and only the data you need. If you ever get to the point where
you're saying "for each result from *this* query, submit *that* query",
you're almost certainly doing something wrong. In my entire career I've only
done that once, and that was after I'd written it the right way, figured out
MySql wasn't optimizing it in the obvious way, and couldn't work out how to
make it do so.
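The usual shape of the fix is to push the loop into the join. Instead of
"for each pole, fetch its owner", one statement (ignoring the ownership
history here, for brevity):

    SELECT p.gps, p.height, o.contact_name
    FROM Pole p
    JOIN Ownership os ON os.pole_gps = p.gps
    JOIN Owner o      ON o.id = os.owner_id;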
>>> Our data has a lot of common 'choose from' items - such as the owner
>>> of the pole.
>>> The data consists of pole specific information that doesn't change
>>> (owner, height, etc) and information that will likely change (the
>>> height of cables on the pole).
>>
>> These last two should probably be in separate tables that you join at
>> runtime. (Especially if you're using MySql, where different "engines"
>> give you different capabilities.)
>>
>
> That's what we are trying to figure out - how to make it make sense to
> have them in separate tables.
Let me know if the description of tables above does or doesn't make sense.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."