POV-Ray: Newsgroups: povray.off-topic: Database Questions

POV-Ray : Newsgroups : povray.off-topic : Database Questions		Server Time 3 Sep 2024 19:12:29 EDT (-0400)

Goto Latest 10 Messages

Next 10 Messages >>>

From: Tom Austin
Subject: Database Questions
Date: 8 Mar 2011 09:21:09
Message: <4d763b55$1@news.povray.org>

A change for all of you database experts to shine....

I am working with a group to set up a new database system to handle our 
core work flow data.

We have a working model of the database but are considering some changes 
that affect the core of how it operates and I am seeking input as to 
some best/worst practices.

Any insight into the following issues/questions would be most greatly 
appreciated.


1.  Is the practice of creating separate databases for each 'job' 
considered good or bad?  Is this answer different based on using MS 
Access .mdb files vs using a MySQL Database engine?


2.  How is creating references across databases viewed?
       Example - one database with definitions of data with many other 
databases referencing the definitions.


3.  Normalizing vs Flattening?  At what point is a database 'too' 
normalized and does it matter?  Do you have some thoughts on joins using 
text data types instead of integer numbers?



For further reading, some background...

Our core business is surveying utility/telephone poles to put new fiber 
optic cables on them.  We propose changes to the pole (moving cables) 
and prepare reports/applications for our client to get on the pole.

We collect and store our data in databases.
In the past we have used separate MS Access .mdb databases for each 'job'.
We are developing a new database system to take care of some severe 
shortcomings of the old system.
Our data has a lot of common 'choose from' items - such as the owner of 
the pole.
The data consists of pole specific information that doesn't change 
(owner, height, etc) and information that will likely change (the height 
of cables on the pole).




Thanks for any insight or info that you may provide.

Tom

Post a reply to this message

From: Invisible
Subject: Re: Database Questions
Date: 8 Mar 2011 10:01:52
Message: <4d7644e0$1@news.povray.org>

On 08/03/2011 02:21 PM, Tom Austin wrote:

> 1. Is the practice of creating separate databases for each 'job'
> considered good or bad?

That's going to depend on the database engine in question. (Different 
engines have different ideas about what constitutes a "database".) It's 
also going to depend on your idea of what a "job" is - and also what you 
actually want to use the data *for*.

For example, if each job is a separate database, you're going to have a 
lot of fun trying to compute (for example) the average number of days 
spent working on a job. If all the jobs were in a single database, this 
would be a trivial SQL query, but if each job is a separate database, 
you now have a logistical nightmare figuring out how to open and query 
all the jobs, one at a time.

But then again, if you never need to do this kind of computation, the 
fact that it's really difficult to do becomes irrelevant.

> 2. How is creating references across databases viewed?
> Example - one database with definitions of data with many other
> databases referencing the definitions.

Again, it varies by engine as to how hard it is to set up, how reliable 
it is, and any other gotchas involved.

> 3. Normalizing vs Flattening? At what point is a database 'too'
> normalized and does it matter?

Usually the /only/ reason for denormalising a database is performance. 
So you could say a database is "too normalised" if it's taking too long 
to answer your queries. You might perhaps also argue that a database is 
"too normalised" if it has become too complicated to understand - but 
normally you won't be interacting with tables directly, you'll put 
something pretty over the top to make it "look simple".

Q: Why do we even care about normalising?
A:

- If the data is not normalised, you are duplicating data in some sense.
- If data is duplicated, it is extremely likely that the duplicates will 
eventually get out of sync with each other.
- When this happens, you now have conflicting information, and no idea 
which version (if any!) is actually correct.

That said, denormalisation can work. If you have /automated/ systems to 
keep all the copies in sync, it can be reliable. But now you are making 
your job "more complicated". Usually, a fully normalised set of tables 
means less things to go wrong. (But it also means more tables, 
typically, which may make the thing harder to understand...)

> Do you have some thoughts on joins using text data types instead of
> integer numbers?

If those are *supposed* to be integer numbers, you should store them as 
such. If you don't, you're just asking for trouble some day.

On the other hand, if those are *codes* then they probably should be 
text. E.g., telephone numbers, invoice numbers, product codes, anything 
like that, probably wants to be text.

Ask yourself this: are 7 and 007 the same thing, or different things? If 
you want them to be the same, you want integers. If you want them to be 
different, it's text.

Don't even talk to be about storing dates as anything that isn't a date...

> Our core business is surveying utility/telephone poles to put new fiber
> optic cables on them. We propose changes to the pole (moving cables) and
> prepare reports/applications for our client to get on the pole.
>
> We collect and store our data in databases.
> In the past we have used separate MS Access .mdb databases for each 'job'.
> We are developing a new database system to take care of some severe
> shortcomings of the old system.
> Our data has a lot of common 'choose from' items - such as the owner of
> the pole.
> The data consists of pole specific information that doesn't change
> (owner, height, etc) and information that will likely change (the height
> of cables on the pole).

MS Access is a poor excuse for a real database system, but it *does* 
have the distinct advantage of being /simple/. An Access database is 
just a file. You can manage it as such. A database using a "real" 
database engine such as Oracle or PostgreSQL or whatever usually 
involves configuring server daemon processes and installing multiple 
files and setting up authentication and so on. This is far more secure 
and provides much better performance, but it's also far more complicated.

What our main project system does is this: The "jobs" that we are 
currently working on are held in an Oracle database running on a big 
powerful DB server. When a job is completed, the job is "archived" into 
an Access database file. If we ever want to go back and do something 
with the job, it is "dearchived" back into the Oracle database.

Of course, we have [expensive] software that handles all of this 
automatically, needless to say... You have to balance how much 
functionality you actually need against how hard it's going to be to set 
up and run all this stuff. Just like any other computer project.

Post a reply to this message

From: Le Forgeron
Subject: Re: Database Questions
Date: 8 Mar 2011 11:10:27
Message: <4d7654f3$1@news.povray.org>

Le 08/03/2011 15:21, Tom Austin a écrit :
> A change for all of you database experts to shine....
> 
> I am working with a group to set up a new database system to handle our
> core work flow data.
> 
> We have a working model of the database but are considering some changes
> that affect the core of how it operates and I am seeking input as to
> some best/worst practices.
> 
> Any insight into the following issues/questions would be most greatly
> appreciated.
> 
> 
> 1.  Is the practice of creating separate databases for each 'job'
> considered good or bad?  Is this answer different based on using MS
> Access .mdb files vs using a MySQL Database engine?
> 

If the pieces of data are common or shared, even by the tiniest amount,
a single database is the answer.

Otherwise, how would you handle data's coherency ?

Access won't scale, drop it past the template step.
I won't enter holy wars about other SQL engines, but MySQL is rather
light (well, it get better every year).
You might really get benefit from engine which support views.


> 
> 2.  How is creating references across databases viewed?
>       Example - one database with definitions of data with many other
> databases referencing the definitions.

Go back to step 1;
You probably mean read-only views for a job.


> 
> 
> 3.  Normalizing vs Flattening?  At what point is a database 'too'
> normalized and does it matter?  Do you have some thoughts on joins using
> text data types instead of integer numbers?

Do not optimise before actual tests.

Normalise at least to 3rd form for your data, and keep it so!
The schema of the database should be checked up to 5th form, but that
usually transparent with good database people that are used to the task.

Conceiving the full schema (for all jobs) is needed before writing the
first line, or you will end up rewriting code when a job breaks a
relationship in many...

If you have performance issue, adding some index can help.
If you are normalised to 3++, you should not have blob of data to scan
to achieve your tasks.

Last point: any personal data (such as name of customer) might requires
the clearance of the local law authority (declaring the database, and
getting it allowed, obligation to allow query and modify on basis of
remote people)... For instance, the IP address is such personal piece,
even without name.



-- 
Software is like dirt - it costs time and money to change it and move it
around.

Just because you can't see it, it doesn't weigh anything,
and you can't drill a hole in it and stick a rivet into it doesn't mean
it's free.

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 12:30:43
Message: <4d7667c3$1@news.povray.org>

Tom Austin wrote:
> 1.  Is the practice of creating separate databases for each 'job' 
> considered good or bad?  Is this answer different based on using MS 
> Access .mdb files vs using a MySQL Database engine?

As others have pointed out, if you're ever going to want to do any sort of 
cross-job statistics (like knowing if the same person worked on two 
particular jobs, etc) then you need to have one database.

The only time I've seen multiple databases with identical schemas used that 
was *not* a f'up, it was because it was a software-as-a-service and each 
customer got their own database and essentially managed it themselves. I.e., 
if you can't imagine installing each database at a separate company, you 
probably want one database.

> 2.  How is creating references across databases viewed?
>       Example - one database with definitions of data with many other 
> databases referencing the definitions.

You really can't do that. That's exactly why you put everything in one 
database.

That said, I think some databases let you make cross-database references, 
but ... really, just don't do that.

> 3.  Normalizing vs Flattening?  At what point is a database 'too' 
> normalized and does it matter?  Do you have some thoughts on joins using 
> text data types instead of integer numbers?

Each table that does not necessarily have any references to its rows is 
representing a separate real-world entity. For example, you might have a 
customer, a sale, etc. Subsidiary tables such as individual lines of a sale, 
might be separate in tables, but they won't represent a separate entity. 
(You can usually tell because the row in the subsidiary table will have an 
arbitrary primary key (if the PK isn't just a combination of FKs) *and* a FK 
that's exactly the PK of what it's a subsidiary table to.) That, IME, is 
approximately the right way to think about it and break it down. Build the 
DB as tables of real-world objects and events, then hook them together 
appropriately. You'll almost automatically be "normalized enough". The time 
you get denormalized is when you start thinking about how you're going to be 
processing the data before you think about the actual data, and you start 
doing joins in your head and encoding them into the schema.

Another good way of working it is to see if there's any combination of 
columns that can be a primary key that isn't just some arbitrary number. I 
built an entire database designed to track people's financial accounts (on 
behalf of a bunch of *our* customers) for a stored-value web site that 
didn't have a single auto-increment field in it. (Granted, our customer 
supplied the identifier for *their* customer, so without that I would have 
needed one auto-increment field.) But everything else was keyed off that. 
The credit card number was the primary key for the credit card. The sale's 
primary key was the credit card number and the UTC timestamp of when we made 
the sale. The end-user's PK was our customer's PK concatenated with the 
identifier we had assigned to our customer. In other work, places were keyed 
by the three-letter code of the airport closest to the place (i.e., that's 
how we identified cities), radio stations were identified by their location 
and frequency, etc.  It's not always possible to find such a key, but to the 
extent you can, it tells you a whole bunch about the meaning of the record. 
And if you use that and find it too slow, *then* it's easy to change it to a 
simple integer identifier.

As for text, I've found that a very good trick for readability is that a 
CHAR(4) will index just as fast as an INT.  So if you want the database to 
be both fast and readable with codes, you use codes like 'SALE', 'RFND', 
'VOID', 'CUST', 'MNGR', 'EMPL', etc.  Then, if you want, you can have 
another table mapping these codes into a varchar(250) or something to put in 
drop-down lists.

If you're talking about text or integer for other things, ask if there's any 
possibility the text will ever change. Is it the name of a company or 
person? A phone number or zip code? If so, it needs to be in a table and not 
part of any key.

> We are developing a new database system to take care of some severe 
> shortcomings of the old system.

What are the shortcomings?

> Our data has a lot of common 'choose from' items - such as the owner of 
> the pole.
> The data consists of pole specific information that doesn't change 
> (owner, height, etc) and information that will likely change (the height 
> of cables on the pole).

These last two should probably be in separate tables that you join at 
runtime. (Especially if you're using MySql, where differnet "engines" give 
you different capabilities.)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 14:13:09
Message: <4d767fc5$1@news.povray.org>

On 3/8/2011 10:01 AM, Invisible wrote:
> On 08/03/2011 02:21 PM, Tom Austin wrote:
>
>> 1. Is the practice of creating separate databases for each 'job'
>> considered good or bad?
>
> That's going to depend on the database engine in question. (Different
> engines have different ideas about what constitutes a "database".) It's
> also going to depend on your idea of what a "job" is - and also what you
> actually want to use the data *for*.
>
> For example, if each job is a separate database, you're going to have a
> lot of fun trying to compute (for example) the average number of days
> spent working on a job. If all the jobs were in a single database, this
> would be a trivial SQL query, but if each job is a separate database,
> you now have a logistical nightmare figuring out how to open and query
> all the jobs, one at a time.
>
> But then again, if you never need to do this kind of computation, the
> fact that it's really difficult to do becomes irrelevant.
>

I know that we will want to refer back to the data - either to 'copy' it 
for another job or to just look at historical statistics - like you 
point out.


>> 2. How is creating references across databases viewed?
>> Example - one database with definitions of data with many other
>> databases referencing the definitions.
>
> Again, it varies by engine as to how hard it is to set up, how reliable
> it is, and any other gotchas involved.

I've run across some database engines that support it, but it seemed not 
to be a pointed feature that is used often.

>
>> 3. Normalizing vs Flattening? At what point is a database 'too'
>> normalized and does it matter?
>
> Usually the /only/ reason for denormalising a database is performance.
> So you could say a database is "too normalised" if it's taking too long
> to answer your queries. You might perhaps also argue that a database is
> "too normalised" if it has become too complicated to understand - but
> normally you won't be interacting with tables directly, you'll put
> something pretty over the top to make it "look simple".
>

Our data structure is not too large in comparison to most databases.
Currently we have about 30 tables with maybe 5-10 more required if we 
keep on the current design path.


> Q: Why do we even care about normalising?
> A:
>
> - If the data is not normalised, you are duplicating data in some sense.
> - If data is duplicated, it is extremely likely that the duplicates will
> eventually get out of sync with each other.
> - When this happens, you now have conflicting information, and no idea
> which version (if any!) is actually correct.
>
> That said, denormalisation can work. If you have /automated/ systems to
> keep all the copies in sync, it can be reliable. But now you are making
> your job "more complicated". Usually, a fully normalised set of tables
> means less things to go wrong. (But it also means more tables,
> typically, which may make the thing harder to understand...)
>

With any but the most simple database you will need automated systems to 
access and keep the data straight.  You have to decide where you are 
going to put the work.


>> Do you have some thoughts on joins using text data types instead of
>> integer numbers?
>
> If those are *supposed* to be integer numbers, you should store them as
> such. If you don't, you're just asking for trouble some day.
>
> On the other hand, if those are *codes* then they probably should be
> text. E.g., telephone numbers, invoice numbers, product codes, anything
> like that, probably wants to be text.
>
> Ask yourself this: are 7 and 007 the same thing, or different things? If
> you want them to be the same, you want integers. If you want them to be
> different, it's text.
>
> Don't even talk to be about storing dates as anything that isn't a date...
>

I guess the question is more:
It is good or bad practice to use text as the 'key' rather than 
dereferencing to an integer type?


>> Our core business is surveying utility/telephone poles to put new fiber
>> optic cables on them. We propose changes to the pole (moving cables) and
>> prepare reports/applications for our client to get on the pole.
>>
>> We collect and store our data in databases.
>> In the past we have used separate MS Access .mdb databases for each
>> 'job'.
>> We are developing a new database system to take care of some severe
>> shortcomings of the old system.
>> Our data has a lot of common 'choose from' items - such as the owner of
>> the pole.
>> The data consists of pole specific information that doesn't change
>> (owner, height, etc) and information that will likely change (the height
>> of cables on the pole).
>
> MS Access is a poor excuse for a real database system, but it *does*
> have the distinct advantage of being /simple/. An Access database is
> just a file. You can manage it as such. A database using a "real"
> database engine such as Oracle or PostgreSQL or whatever usually
> involves configuring server daemon processes and installing multiple
> files and setting up authentication and so on. This is far more secure
> and provides much better performance, but it's also far more complicated.
>
> What our main project system does is this: The "jobs" that we are
> currently working on are held in an Oracle database running on a big
> powerful DB server. When a job is completed, the job is "archived" into
> an Access database file. If we ever want to go back and do something
> with the job, it is "dearchived" back into the Oracle database.

Something you said here might work for us.
The poles that we collect do not change and can be reused for other jobs 
- so keeping them around could be good.
The cable heights are more job specific as when the job is built the 
height are now wrong.
Exporting the data for archival purposes could be a good idea.  Even 
exporting a fully functional database with only the particular job data 
required could be good.

>
> Of course, we have [expensive] software that handles all of this
> automatically, needless to say... You have to balance how much
> functionality you actually need against how hard it's going to be to set
> up and run all this stuff. Just like any other computer project.

We are a small show so we cannot afford a ton of overhead to keep a 
system running, yet we also do not want to set up a $20,000 db system 
that is overkill for what we do.



FYI,  my may seem a bit vague and non informal as I am trying to keep my 
personal bias out of what you guys feedback.


Thanks

Post a reply to this message

From: Orchid XP v8
Subject: Re: Database Questions
Date: 8 Mar 2011 15:39:03
Message: <4d7693e7@news.povray.org>

>>> 2. How is creating references across databases viewed?
>>> Example - one database with definitions of data with many other
>>> databases referencing the definitions.
>>
>> Again, it varies by engine as to how hard it is to set up, how reliable
>> it is, and any other gotchas involved.
>
> I've run across some database engines that support it, but it seemed not
> to be a pointed feature that is used often.

If you're a large company with legacy systems that you need to integrate 
with, database references can be very useful. Personally, I would avoid 
them unless you "really need to".

>> That said, denormalisation can work. If you have /automated/ systems to
>> keep all the copies in sync, it can be reliable.
>
> With any but the most simple database you will need automated systems to
> access and keep the data straight. You have to decide where you are
> going to put the work.

Well, for example, apparently people use Access as a front-end. (Even if 
Access isn't the database. You can apparently set up, say, an Ingress DB 
and put an Access front-end on it. Apparently it isn't especially hard.)

I did 6 months' work experience at a small place with a big complicated 
Access database. I don't know how normalised it was, but I know that 
Access certainly can hold data normalised and denormalise it for when 
you want to look at it, renormalising it again behind the scenes.

>>> Do you have some thoughts on joins using text data types instead of
>>> integer numbers?
>>
>> If those are *supposed* to be integer numbers, you should store them as
>> such. If you don't, you're just asking for trouble some day.
>
> I guess the question is more:
> It is good or bad practice to use text as the 'key' rather than
> dereferencing to an integer type?

In terms of efficiency, it makes virtually no difference.

Pro-tip: Don't use real-world things as primary keys. Because you know 
what? Real-world things have this nasty habit of *changing*. You don't 
ever want your primary key to change. This is why databases up and down 
the country refer to Mr Smith not as Mr Smith but as customer #238523.

Use randomly-generated integers with no real-world meaning as PKs and 
you'll have fewer problems if/when the thing the key refers to changes 
in some way.

>> What our main project system does is this: The "jobs" that we are
>> currently working on are held in an Oracle database running on a big
>> powerful DB server. When a job is completed, the job is "archived" into
>> an Access database file. If we ever want to go back and do something
>> with the job, it is "dearchived" back into the Oracle database.
>
> Something you said here might work for us.
> The poles that we collect do not change and can be reused for other jobs
> - so keeping them around could be good.
> The cable heights are more job specific as when the job is built the
> height are now wrong.
> Exporting the data for archival purposes could be a good idea. Even
> exporting a fully functional database with only the particular job data
> required could be good.

I try to be helpful. ;-)

I also presume that you have a far better idea of what your goals 
actually are than I do. So I give you the logic and let you figure out 
the best answer.

> We are a small show so we cannot afford a ton of overhead to keep a
> system running, yet we also do not want to set up a $20,000 db system
> that is overkill for what we do.

Yeah, I kind of figured that by reading between the lines. ;-)

I would suggest not setting up an Oracle database, for example. It 
sounds like it would be utter overkill here.

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 15:43:01
Message: <4d7694d5$1@news.povray.org>

Tom Austin wrote:
> I know that we will want to refer back to the data - either to 'copy' it 
> for another job or to just look at historical statistics - like you 
> point out.

Then keep it in one database.

> I've run across some database engines that support it, but it seemed not 
> to be a pointed feature that is used often.

Right. Because most people don't do things that way. There's no good reason 
to make things a separate database unless you want to do administrative 
things differently like having different access or store it on different 
machines.

> Our data structure is not too large in comparison to most databases.
> Currently we have about 30 tables with maybe 5-10 more required if we 
> keep on the current design path.

There's no need to denormalize, then. It also doesn't sound like you're 
doing anything highly transactional, so a quarter-second transaction is 
unlikely to bother anyone using it.

> With any but the most simple database you will need automated systems to 
> access and keep the data straight.  You have to decide where you are 
> going to put the work.

The point of ACID (at least the C part) is that you declare the "keep the 
data straight" part in the database as data. Then a mistake in your 
application cannot corrupt your data in that way.

> I guess the question is more:
> It is good or bad practice to use text as the 'key' rather than 
> dereferencing to an integer type?

The key is an identifier that won't change. What text do you have for the 
entity that won't change? The company name? The road the pole is on? The 
name of the person you should talk to about it? All these things can change.

> Something you said here might work for us.
> The poles that we collect do not change and can be reused for other jobs 
> - so keeping them around could be good.
> The cable heights are more job specific as when the job is built the 
> height are now wrong.

You're doing it wrong. The cable heights don't change. Instead, you have a 
range of dates when the cable was *this* high, and another range of dates 
when the cable was *that* high.

If you overwrite that information, then the customer later comes back and 
asks about it, now you have a problem. Instead, the primary key for the 
cable should be "company id, pole number, first date valid". Then your 
queries can find the height of the cable on any particular day by finding 
the latest date where that company's pole is listed.

> Exporting the data for archival purposes could be a good idea.  Even 
> exporting a fully functional database with only the particular job data 
> required could be good.

All those things are not-too-hard if all the data is in the database.

> We are a small show so we cannot afford a ton of overhead to keep a 
> system running, yet we also do not want to set up a $20,000 db system 
> that is overkill for what we do.

Fortunately, since free RDBMs have come out, the price of a database engine 
has dropped from $100K/user to free. ;-)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 15:48:11
Message: <4d76960b$1@news.povray.org>

On 3/8/2011 11:10 AM, Le_Forgeron wrote:
>
> If the pieces of data are common or shared, even by the tiniest amount,
> a single database is the answer.
>
> Otherwise, how would you handle data's coherency ?
>
> Access won't scale, drop it past the template step.
> I won't enter holy wars about other SQL engines, but MySQL is rather
> light (well, it get better every year).
> You might really get benefit from engine which support views.
>

no, not asking for a holy war :-)  tho MySQL is my default if we go the 
single database engine direction.

views present an interesting option - possibly ways of flattening the 
database without requiring that the data itself be flattened

I also like the 'snapshot' of job data export that Andrew mentioned - 
might fix some of our concerns.

>>
>> 3.  Normalizing vs Flattening?  At what point is a database 'too'
>> normalized and does it matter?  Do you have some thoughts on joins using
>> text data types instead of integer numbers?
>
> Do not optimise before actual tests.
>
> Normalise at least to 3rd form for your data, and keep it so!
> The schema of the database should be checked up to 5th form, but that
> usually transparent with good database people that are used to the task.
>
> Conceiving the full schema (for all jobs) is needed before writing the
> first line, or you will end up rewriting code when a job breaks a
> relationship in many...
>
> If you have performance issue, adding some index can help.
> If you are normalised to 3++, you should not have blob of data to scan
> to achieve your tasks.
>

Yes, the data needs to be well understood before starting the schema. 
There should be no anomalies to 'fit in' after the fact.

But the fact is that we have limited time to put this together in and a 
lot of data that can be rather tightly intertwined.  So we have to run 
with 90% and figure out how to cleanly fit things in as we go along. 
Not the best approach, but it does make it 'usable' earlier.

We are at the point of stepping back and trying to put together a better 
schema.

FYI, my questions and information may seem a bit vague and informal as I 
am trying to keep my personal bias out of what you guys feed back.

> Last point: any personal data (such as name of customer) might requires
> the clearance of the local law authority (declaring the database, and
> getting it allowed, obligation to allow query and modify on basis of
> remote people)... For instance, the IP address is such personal piece,
> even without name.
>

Thanks for the tip - tho we don't track people or their data.  We look 
at telephone poles :-)

Thanks :-)

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 16:03:26
Message: <4d76999e$1@news.povray.org>

On 3/8/2011 12:30 PM, Darren New wrote:
> Tom Austin wrote:
>> 1. Is the practice of creating separate databases for each 'job'
>> considered good or bad? Is this answer different based on using MS
>> Access .mdb files vs using a MySQL Database engine?
>
> As others have pointed out, if you're ever going to want to do any sort
> of cross-job statistics (like knowing if the same person worked on two
> particular jobs, etc) then you need to have one database.
>
> The only time I've seen multiple databases with identical schemas used
> that was *not* a f'up, it was because it was a software-as-a-service and
> each customer got their own database and essentially managed it
> themselves. I.e., if you can't imagine installing each database at a
> separate company, you probably want one database.
>
>

Yes, I can think of a few instances of where it could be very useful.  I 
am trying to feel out how usual the practice may be. Thanks for the info.


>
> Each table that does not necessarily have any references to its rows is
> representing a separate real-world entity. For example, you might have a
> customer, a sale, etc. Subsidiary tables such as individual lines of a
> sale, might be separate in tables, but they won't represent a separate
> entity. (You can usually tell because the row in the subsidiary table
> will have an arbitrary primary key (if the PK isn't just a combination
> of FKs) *and* a FK that's exactly the PK of what it's a subsidiary table
> to.) That, IME, is approximately the right way to think about it and
> break it down. Build the DB as tables of real-world objects and events,
> then hook them together appropriately. You'll almost automatically be
> "normalized enough". The time you get denormalized is when you start
> thinking about how you're going to be processing the data before you
> think about the actual data, and you start doing joins in your head and
> encoding them into the schema.
>

Interesting way of looking at building a schema - just break it down to 
objects.  I think this fits our data very well.  I think we were doing 
this anyway, but not on purpose - more by accident.

> Another good way of working it is to see if there's any combination of
> columns that can be a primary key that isn't just some arbitrary number.
> I built an entire database designed to track people's financial accounts
> (on behalf of a bunch of *our* customers) for a stored-value web site
> that didn't have a single auto-increment field in it. (Granted, our
> customer supplied the identifier for *their* customer, so without that I
> would have needed one auto-increment field.) But everything else was
> keyed off that. The credit card number was the primary key for the
> credit card. The sale's primary key was the credit card number and the
> UTC timestamp of when we made the sale. The end-user's PK was our
> customer's PK concatenated with the identifier we had assigned to our
> customer. In other work, places were keyed by the three-letter code of
> the airport closest to the place (i.e., that's how we identified
> cities), radio stations were identified by their location and frequency,
> etc. It's not always possible to find such a key, but to the extent you
> can, it tells you a whole bunch about the meaning of the record. And if
> you use that and find it too slow, *then* it's easy to change it to a
> simple integer identifier.
>

Sounds like you had fun.  I like how you had to get creative on how to 
set primary keys - and using a pre-established system to boot (the 
airport codes) - genius.


> As for text, I've found that a very good trick for readability is that a
> CHAR(4) will index just as fast as an INT. So if you want the database
> to be both fast and readable with codes, you use codes like 'SALE',
> 'RFND', 'VOID', 'CUST', 'MNGR', 'EMPL', etc. Then, if you want, you can
> have another table mapping these codes into a varchar(250) or something
> to put in drop-down lists.
>

this might be worth considering, but I don't know how easy it will be to 
implement and keep straight.  Seems like the a user created key like 
this could lead to duplicates - but if it is a key it will prevent 
duplicates anyway - just prompt for another key.

> If you're talking about text or integer for other things, ask if there's
> any possibility the text will ever change. Is it the name of a company
> or person? A phone number or zip code? If so, it needs to be in a table
> and not part of any key.
>

Most of the data in question is a simple map from an integer key to a 
piece of text.  I do have some queries on the text with joins to pull 
data from other tables.

The text is not designed to be changed.

The alternative that we have is to flatten the relationship and put the 
text directly in the table instead of having a key reference.

ie
go from

tablePole		tableOwners
Pole   OwnerID		ID	Name


to

tablePole
Pole	OwnerName



Any thoughts on this?


>> We are developing a new database system to take care of some severe
>> shortcomings of the old system.
>
> What are the shortcomings?
>

Let's just say is was a single table database with about 250 columns - 
limited only by the fact that Excel at one time only allowed about 250 
columns.

Putting data in and getting data out was very difficult and required the 
sacrifice of too many brain cells.


>> Our data has a lot of common 'choose from' items - such as the owner
>> of the pole.
>> The data consists of pole specific information that doesn't change
>> (owner, height, etc) and information that will likely change (the
>> height of cables on the pole).
>
> These last two should probably be in separate tables that you join at
> runtime. (Especially if you're using MySql, where differnet "engines"
> give you different capabilities.)
>

That's what we are trying to figure out - how to make it make sense to 
have them in separate tables.



FYI, my statements and questions may seem vague or uninformative.  I am 
trying to get input without having my bias guide it.


Thanks

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 16:10:03
Message: <4d769b2b$1@news.povray.org>

On 3/8/2011 3:39 PM, Orchid XP v8 wrote:
>>>> 2. How is creating references across databases viewed?
>>>> Example - one database with definitions of data with many other
>>>> databases referencing the definitions.
>>>
>>> Again, it varies by engine as to how hard it is to set up, how reliable
>>> it is, and any other gotchas involved.
>>
>> I've run across some database engines that support it, but it seemed not
>> to be a pointed feature that is used often.
>
> If you're a large company with legacy systems that you need to integrate
> with, database references can be very useful. Personally, I would avoid
> them unless you "really need to".
>

thanks for the input

>>> That said, denormalisation can work. If you have /automated/ systems to
>>> keep all the copies in sync, it can be reliable.
>>
>> With any but the most simple database you will need automated systems to
>> access and keep the data straight. You have to decide where you are
>> going to put the work.
>
> Well, for example, apparently people use Access as a front-end. (Even if
> Access isn't the database. You can apparently set up, say, an Ingress DB
> and put an Access front-end on it. Apparently it isn't especially hard.)
>

Yes - we used to use access for the front and back ends - until we 
needed a bit more flexibility on how to present/use the data.

I've also seen access used with other db engines - can be useful, but 
painful at the same time.


> I did 6 months' work experience at a small place with a big complicated
> Access database. I don't know how normalised it was, but I know that
> Access certainly can hold data normalised and denormalise it for when
> you want to look at it, renormalising it again behind the scenes.
>

We only use access .mdb files - not access itself.  We interface the 
databases with custom programming.

>>>> Do you have some thoughts on joins using text data types instead of
>>>> integer numbers?
>>>
>>> If those are *supposed* to be integer numbers, you should store them as
>>> such. If you don't, you're just asking for trouble some day.
>>
>> I guess the question is more:
>> It is good or bad practice to use text as the 'key' rather than
>> dereferencing to an integer type?
>
> In terms of efficiency, it makes virtually no difference.
>
> Pro-tip: Don't use real-world things as primary keys. Because you know
> what? Real-world things have this nasty habit of *changing*. You don't
> ever want your primary key to change. This is why databases up and down
> the country refer to Mr Smith not as Mr Smith but as customer #238523.
>
> Use randomly-generated integers with no real-world meaning as PKs and
> you'll have fewer problems if/when the thing the key refers to changes
> in some way.
>

makes sense - any text may need to eventually change which can break a 
lot of things if reference integrity is not completely automatic with 
the text.

>>
>> Something you said here might work for us.
>> The poles that we collect do not change and can be reused for other jobs
>> - so keeping them around could be good.
>> The cable heights are more job specific as when the job is built the
>> height are now wrong.
>> Exporting the data for archival purposes could be a good idea. Even
>> exporting a fully functional database with only the particular job data
>> required could be good.
>
> I try to be helpful. ;-)
>
> I also presume that you have a far better idea of what your goals
> actually are than I do. So I give you the logic and let you figure out
> the best answer.
>

that's a good way to get people to follow you :-)


>> We are a small show so we cannot afford a ton of overhead to keep a
>> system running, yet we also do not want to set up a $20,000 db system
>> that is overkill for what we do.
>
> Yeah, I kind of figured that by reading between the lines. ;-)

OK smarty - just how many people do you think we have here :-?

>
> I would suggest not setting up an Oracle database, for example. It
> sounds like it would be utter overkill here.
>

yes, even MySQL might be overkill  :-)

Post a reply to this message

Goto Latest 10 Messages

Next 10 Messages >>>