|
|
Tom Austin wrote:
> 1. Is the practice of creating separate databases for each 'job'
> considered good or bad? Is this answer different based on using MS
> Access .mdb files vs using a MySQL Database engine?
As others have pointed out, if you're ever going to want to do any sort of
cross-job statistics (like knowing if the same person worked on two
particular jobs, etc) then you need to have one database.
The only time I've seen multiple databases with identical schemas used that
was *not* a f'up, it was because it was a software-as-a-service and each
customer got their own database and essentially managed it themselves. I.e.,
if you can't imagine installing each database at a separate company, you
probably want one database.
> 2. How is creating references across databases viewed?
> Example - one database with definitions of data with many other
> databases referencing the definitions.
You really can't do that. That's exactly why you put everything in one
database.
That said, I think some databases let you make cross-database references,
but ... really, just don't do that.
> 3. Normalizing vs Flattening? At what point is a database 'too'
> normalized and does it matter? Do you have some thoughts on joins using
> text data types instead of integer numbers?
Each table that does not necessarily have any references to its rows is
representing a separate real-world entity. For example, you might have a
customer, a sale, etc. Subsidiary tables such as individual lines of a sale,
might be separate in tables, but they won't represent a separate entity.
(You can usually tell because the row in the subsidiary table will have an
arbitrary primary key (if the PK isn't just a combination of FKs) *and* a FK
that's exactly the PK of what it's a subsidiary table to.) That, IME, is
approximately the right way to think about it and break it down. Build the
DB as tables of real-world objects and events, then hook them together
appropriately. You'll almost automatically be "normalized enough". The time
you get denormalized is when you start thinking about how you're going to be
processing the data before you think about the actual data, and you start
doing joins in your head and encoding them into the schema.
Another good way of working it is to see if there's any combination of
columns that can be a primary key that isn't just some arbitrary number. I
built an entire database designed to track people's financial accounts (on
behalf of a bunch of *our* customers) for a stored-value web site that
didn't have a single auto-increment field in it. (Granted, our customer
supplied the identifier for *their* customer, so without that I would have
needed one auto-increment field.) But everything else was keyed off that.
The credit card number was the primary key for the credit card. The sale's
primary key was the credit card number and the UTC timestamp of when we made
the sale. The end-user's PK was our customer's PK concatenated with the
identifier we had assigned to our customer. In other work, places were keyed
by the three-letter code of the airport closest to the place (i.e., that's
how we identified cities), radio stations were identified by their location
and frequency, etc. It's not always possible to find such a key, but to the
extent you can, it tells you a whole bunch about the meaning of the record.
And if you use that and find it too slow, *then* it's easy to change it to a
simple integer identifier.
As for text, I've found that a very good trick for readability is that a
CHAR(4) will index just as fast as an INT. So if you want the database to
be both fast and readable with codes, you use codes like 'SALE', 'RFND',
'VOID', 'CUST', 'MNGR', 'EMPL', etc. Then, if you want, you can have
another table mapping these codes into a varchar(250) or something to put in
drop-down lists.
If you're talking about text or integer for other things, ask if there's any
possibility the text will ever change. Is it the name of a company or
person? A phone number or zip code? If so, it needs to be in a table and not
part of any key.
> We are developing a new database system to take care of some severe
> shortcomings of the old system.
What are the shortcomings?
> Our data has a lot of common 'choose from' items - such as the owner of
> the pole.
> The data consists of pole specific information that doesn't change
> (owner, height, etc) and information that will likely change (the height
> of cables on the pole).
These last two should probably be in separate tables that you join at
runtime. (Especially if you're using MySql, where differnet "engines" give
you different capabilities.)
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
Post a reply to this message
|
|