POV-Ray : Newsgroups : povray.off-topic : Database Questions
  Database Questions (Message 9 to 18 of 68)  
From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 16:03:26
Message: <4d76999e$1@news.povray.org>
On 3/8/2011 12:30 PM, Darren New wrote:
> Tom Austin wrote:
>> 1. Is the practice of creating separate databases for each 'job'
>> considered good or bad? Is this answer different based on using MS
>> Access .mdb files vs using a MySQL Database engine?
>
> As others have pointed out, if you're ever going to want to do any sort
> of cross-job statistics (like knowing if the same person worked on two
> particular jobs, etc) then you need to have one database.
>
> The only time I've seen multiple databases with identical schemas used
> that was *not* a f'up, it was because it was a software-as-a-service and
> each customer got their own database and essentially managed it
> themselves. I.e., if you can't imagine installing each database at a
> separate company, you probably want one database.
>
>

Yes, I can think of a few instances where it could be very useful.  I'm 
trying to get a feel for how common the practice is.  Thanks for the info.


>
> Each table that does not necessarily have any references to its rows
> represents a separate real-world entity. For example, you might have a
> customer, a sale, etc. Subsidiary tables, such as the individual lines
> of a sale, might be separate tables, but they won't represent a separate
> entity. (You can usually tell because the row in the subsidiary table
> will have an arbitrary primary key (if the PK isn't just a combination
> of FKs) *and* a FK that's exactly the PK of what it's a subsidiary table
> to.) That, IME, is approximately the right way to think about it and
> break it down. Build the DB as tables of real-world objects and events,
> then hook them together appropriately. You'll almost automatically be
> "normalized enough". The time you get denormalized is when you start
> thinking about how you're going to be processing the data before you
> think about the actual data, and you start doing joins in your head and
> encoding them into the schema.
>

Interesting way of looking at building a schema - just break it down to 
objects.  I think this fits our data very well.  I think we were doing 
this anyway, but not on purpose - more by accident.
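
To make sure I follow, here's a rough sketch of that pattern in SQL - 
all the names are invented - a "sale" entity with a subsidiary 
line-items table whose key is just the sale's key plus a line number:

   -- An independent real-world event: the sale itself.
   CREATE TABLE sale (
       sale_id  INT      NOT NULL,
       sold_at  DATETIME NOT NULL,
       PRIMARY KEY (sale_id)
   );

   -- Subsidiary table: no identity of its own.  Its PK is the parent's
   -- PK plus a line number, and its FK is exactly that parent PK.
   CREATE TABLE sale_line (
       sale_id INT          NOT NULL,
       line_no INT          NOT NULL,
       item    VARCHAR(250) NOT NULL,
       PRIMARY KEY (sale_id, line_no),
       FOREIGN KEY (sale_id) REFERENCES sale (sale_id)
   );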

> Another good way of working it is to see if there's any combination of
> columns that can be a primary key that isn't just some arbitrary number.
> I built an entire database designed to track people's financial accounts
> (on behalf of a bunch of *our* customers) for a stored-value web site
> that didn't have a single auto-increment field in it. (Granted, our
> customer supplied the identifier for *their* customer, so without that I
> would have needed one auto-increment field.) But everything else was
> keyed off that. The credit card number was the primary key for the
> credit card. The sale's primary key was the credit card number and the
> UTC timestamp of when we made the sale. The end-user's PK was our
> customer's PK concatenated with the identifier our customer had assigned
> to that end user. In other work, places were keyed by the three-letter
> code of
> the airport closest to the place (i.e., that's how we identified
> cities), radio stations were identified by their location and frequency,
> etc. It's not always possible to find such a key, but to the extent you
> can, it tells you a whole bunch about the meaning of the record. And if
> you use that and find it too slow, *then* it's easy to change it to a
> simple integer identifier.
>

Sounds like you had fun.  I like how you had to get creative on how to 
set primary keys - and using a pre-established system to boot (the 
airport codes) - genius.
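
If I read that right, in SQL it would look something like this 
(hypothetical names, types guessed):

   -- The card number itself is the key - no auto-increment anywhere.
   CREATE TABLE card (
       card_number CHAR(16) NOT NULL,
       PRIMARY KEY (card_number)
   );

   -- A sale is identified by the card plus the UTC time of the sale.
   CREATE TABLE card_sale (
       card_number CHAR(16)      NOT NULL,
       sold_at_utc DATETIME      NOT NULL,
       amount      DECIMAL(12,2) NOT NULL,
       PRIMARY KEY (card_number, sold_at_utc),
       FOREIGN KEY (card_number) REFERENCES card (card_number)
   );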


> As for text, I've found that a very good trick for readability is that a
> CHAR(4) will index just as fast as an INT. So if you want the database
> to be both fast and readable with codes, you use codes like 'SALE',
> 'RFND', 'VOID', 'CUST', 'MNGR', 'EMPL', etc. Then, if you want, you can
> have another table mapping these codes into a varchar(250) or something
> to put in drop-down lists.
>

This might be worth considering, but I don't know how easy it will be to 
implement and keep straight.  It seems like a user-created key like this 
could lead to duplicates - but if it is a key, the database will reject 
duplicates anyway - just prompt for another key.

> If you're talking about text or integer for other things, ask if there's
> any possibility the text will ever change. Is it the name of a company
> or person? A phone number or zip code? If so, it needs to be in a table
> and not part of any key.
>

Most of the data in question is a simple map from an integer key to a 
piece of text.  I do have some queries on the text with joins to pull 
data from other tables.

The text is not designed to be changed.

The alternative that we have is to flatten the relationship and put the 
text directly in the table instead of having a key reference.

i.e. go from

   tablePole            tableOwners
   Pole  OwnerID        ID  Name

to

   tablePole
   Pole  OwnerName



Any thoughts on this?


>> We are developing a new database system to take care of some severe
>> shortcomings of the old system.
>
> What are the shortcomings?
>

Let's just say it was a single-table database with about 250 columns - 
limited only by the fact that Excel at one time only allowed about 250 
columns.

Putting data in and getting data out was very difficult and required the 
sacrifice of too many brain cells.


>> Our data has a lot of common 'choose from' items - such as the owner
>> of the pole.
>> The data consists of pole specific information that doesn't change
>> (owner, height, etc) and information that will likely change (the
>> height of cables on the pole).
>
> These last two should probably be in separate tables that you join at
> runtime. (Especially if you're using MySql, where different "engines"
> give you different capabilities.)
>

That's what we are trying to figure out - how to make it make sense to 
have them in separate tables.



FYI, my statements and questions may seem vague or uninformative.  I am 
trying to get input without having my bias guide it.


Thanks



From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 16:10:03
Message: <4d769b2b$1@news.povray.org>
On 3/8/2011 3:39 PM, Orchid XP v8 wrote:
>>>> 2. How is creating references across databases viewed?
>>>> Example - one database with definitions of data with many other
>>>> databases referencing the definitions.
>>>
>>> Again, it varies by engine as to how hard it is to set up, how reliable
>>> it is, and any other gotchas involved.
>>
>> I've run across some database engines that support it, but it seemed not
>> to be a pointed feature that is used often.
>
> If you're a large company with legacy systems that you need to integrate
> with, database references can be very useful. Personally, I would avoid
> them unless you "really need to".
>

thanks for the input

>>> That said, denormalisation can work. If you have /automated/ systems to
>>> keep all the copies in sync, it can be reliable.
>>
>> With any but the most simple database you will need automated systems to
>> access and keep the data straight. You have to decide where you are
>> going to put the work.
>
> Well, for example, apparently people use Access as a front-end. (Even if
> Access isn't the database. You can apparently set up, say, an Ingres DB
> and put an Access front-end on it. Apparently it isn't especially hard.)
>

Yes - we used to use Access for the front and back ends - until we 
needed a bit more flexibility in how to present/use the data.

I've also seen Access used with other DB engines - it can be useful, but 
painful at the same time.


> I did 6 months' work experience at a small place with a big complicated
> Access database. I don't know how normalised it was, but I know that
> Access certainly can hold data normalised and denormalise it for when
> you want to look at it, renormalising it again behind the scenes.
>

We only use Access .mdb files - not Access itself.  We interface with 
the databases through custom programming.

>>>> Do you have some thoughts on joins using text data types instead of
>>>> integer numbers?
>>>
>>> If those are *supposed* to be integer numbers, you should store them as
>>> such. If you don't, you're just asking for trouble some day.
>>
>> I guess the question is more:
>> Is it good or bad practice to use text as the 'key' rather than
>> dereferencing to an integer type?
>
> In terms of efficiency, it makes virtually no difference.
>
> Pro-tip: Don't use real-world things as primary keys. Because you know
> what? Real-world things have this nasty habit of *changing*. You don't
> ever want your primary key to change. This is why databases up and down
> the country refer to Mr Smith not as Mr Smith but as customer #238523.
>
> Use randomly-generated integers with no real-world meaning as PKs and
> you'll have fewer problems if/when the thing the key refers to changes
> in some way.
>

Makes sense - any text may eventually need to change, which can break a 
lot of things if referential integrity is not completely automatic with 
the text.
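
For the record, I gather the "completely automatic" case looks something 
like this (assuming an engine that enforces foreign keys, e.g. InnoDB; 
names invented):

   CREATE TABLE owner (
       name VARCHAR(100) NOT NULL,
       PRIMARY KEY (name)
   ) ENGINE=InnoDB;

   CREATE TABLE pole (
       pole_no    INT          NOT NULL,
       owner_name VARCHAR(100) NOT NULL,
       PRIMARY KEY (pole_no),
       FOREIGN KEY (owner_name) REFERENCES owner (name)
           ON UPDATE CASCADE   -- renaming an owner fixes every pole row
   ) ENGINE=InnoDB;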

>>
>> Something you said here might work for us.
>> The poles that we collect do not change and can be reused for other jobs
>> - so keeping them around could be good.
>> The cable heights are more job-specific, as when the job is built the
>> heights are now wrong.
>> Exporting the data for archival purposes could be a good idea. Even
>> exporting a fully functional database with only the particular job data
>> required could be good.
>
> I try to be helpful. ;-)
>
> I also presume that you have a far better idea of what your goals
> actually are than I do. So I give you the logic and let you figure out
> the best answer.
>

that's a good way to get people to follow you :-)


>> We are a small show so we cannot afford a ton of overhead to keep a
>> system running, yet we also do not want to set up a $20,000 db system
>> that is overkill for what we do.
>
> Yeah, I kind of figured that by reading between the lines. ;-)

OK smarty - just how many people do you think we have here :-?

>
> I would suggest not setting up an Oracle database, for example. It
> sounds like it would be utter overkill here.
>

yes, even MySQL might be overkill  :-)



From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 16:21:02
Message: <4d769dbe$1@news.povray.org>
On 3/8/2011 3:42 PM, Darren New wrote:
> Tom Austin wrote:
>> I know that we will want to refer back to the data - either to 'copy'
>> it for another job or to just look at historical statistics - like you
>> point out.
>
> Then keep it in one database.
>
>> I've run across some database engines that support it, but it seemed
>> not to be a pointed feature that is used often.
>
> Right. Because most people don't do things that way. There's no good
> reason to make things a separate database unless you want to do
> administrative things differently like having different access or store
> it on different machines.
>

thanks, point made :-)


>> Our data structure is not too large in comparison to most databases.
>> Currently we have about 30 tables with maybe 5-10 more required if we
>> keep on the current design path.
>
> There's no need to denormalize, then. It also doesn't sound like you're
> doing anything highly transactional, so a quarter-second transaction is
> unlikely to bother anyone using it.
>

Having unoptimized programs makes it very slow.  Programs written as 
quickly as possible don't get the benefit of being efficient.

But I think with optimization of how we access the data it can be made 
very quick.  Also, moving away from an Access .mdb file will speed things 
up considerably.


>> With any but the most simple database you will need automated systems
>> to access and keep the data straight. You have to decide where you are
>> going to put the work.
>
> The point of ACID (at least the C part) is that you declare the "keep
> the data straight" part in the database as data. Then a mistake in your
> application cannot corrupt your data in that way.
>

Thanks for the acronym - more info to go on :-)

I don't know if we will try or need to set up transactions, but they 
could be helpful if something is not a simple insert or update.
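
If I understand the C part right, "declaring it as data" means rules 
like these, which the engine enforces no matter which program writes the 
rows (a sketch with made-up names; note that MySQL of this era parses 
but ignores CHECK constraints):

   CREATE TABLE cable (
       pole_no   INT          NOT NULL,
       height_ft DECIMAL(5,2) NOT NULL,
       FOREIGN KEY (pole_no) REFERENCES pole (pole_no),  -- no orphan cables
       CHECK (height_ft > 0)      -- sanity rule lives in the database
   );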

>> I guess the question is more:
>> Is it good or bad practice to use text as the 'key' rather than
>> dereferencing to an integer type?
>
> The key is an identifier that won't change. What text do you have for
> the entity that won't change? The company name? The road the pole is on?
> The name of the person you should talk to about it? All these things can
> change.

point made - thanks :-)


>> Something you said here might work for us.
>> The poles that we collect do not change and can be reused for other
>> jobs - so keeping them around could be good.
>> The cable heights are more job-specific, as when the job is built the
>> heights are now wrong.
>
> You're doing it wrong. The cable heights don't change. Instead, you have
> a range of dates when the cable was *this* high, and another range of
> dates when the cable was *that* high.
>

OK, you want a job surveying utility poles :-)  You have good insight.


> If you overwrite that information, then the customer later comes back
> and asks about it, now you have a problem. Instead, the primary key for
> the cable should be "company id, pole number, first date valid". Then
> your queries can find the height of the cable on any particular day by
> finding the latest date where that company's pole is listed.
>

At this point I am not used to setting up primary keys with multiple 
columns - but this is making some sense.  The question we have is if we 
have time to implement some such beast :-)
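
If I follow, the "find the height on a particular day" query would be 
something like this (invented names and values):

   -- Newest row whose validity date is on or before the day in question.
   SELECT height_ft
     FROM cable_height
    WHERE company_id = 42
      AND pole_no    = 1017
      AND valid_from <= '2011-03-08'
    ORDER BY valid_from DESC
    LIMIT 1;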


>> Exporting the data for archival purposes could be a good idea. Even
>> exporting a fully functional database with only the particular job
>> data required could be good.
>
> All those things are not-too-hard if all the data is in the database.
>

Yes, and if the database schema is identical, the same programs can use 
any database - from mySQL to .mdb files.


>> We are a small show so we cannot afford a ton of overhead to keep a
>> system running, yet we also do not want to set up a $20,000 db system
>> that is overkill for what we do.
>
> Fortunately, since free RDBMs have come out, the price of a database
> engine has dropped from $100K/user to free. ;-)
>

Well, almost free - just takes time to set up.  By the time all is said 
and done I figure that MySQL will take about $1k - $2k of my time to set 
up reliably.  I need to get it installed on our server and set up 
connection methods and backup methods.  Not too difficult, but I don't 
do it every day.



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 16:24:17
Message: <4d769e81$1@news.povray.org>
Orchid XP v8 wrote:
> Pro-tip: Don't use real-world things as primary keys. Because you know 
> what? Real-world things have this nasty habit of *changing*. You don't 
> ever want your primary key to change. This is why databases up and down 
> the country refer to Mr Smith not as Mr Smith but as customer #238523.

I disagree. The only time you should use automatically-generated primary 
keys is when there is no attribute of the item in the table that is 
guaranteed not to change.

A sales receipt whose primary key is the cash register identifier and the 
time to the second when the receipt was printed is an excellent PK for a 
sales receipt. A record recording what song was playing on the radio has the 
obvious PK of location+timestamp+frequency. A road intersection has a fine 
PK of the GPS coordinates of the intersection.

The problem is that people are changing all the time, so it's very hard to 
have a customer record that persists with a PK that makes sense.

> Use randomly-generated integers with no real-world meaning as PKs and 
> you'll have fewer problems if/when the thing the key refers to changes 
> in some way.

The drawback of this is if you get a reference to the wrong value, you'll 
never notice.

> I would suggest not setting up an Oracle database, for example. It 
> sounds like it would be utter overkill here.

Yes. One of the free databases (Postgresql or mysql) or (on windows) SQL 
Server (Express, perhaps) would do fine.  You'd probably find the most free 
help with MySql, but it's kind of messy to run on Windows, or at least was 
last I checked. SQL Server Express would *probably* have enough performance 
for you, given you were using Access.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 16:31:22
Message: <4d76a02a$1@news.povray.org>
Tom Austin wrote:
> Yes, the data needs to be well understood before starting the schema. 
> There should be no anomalies to 'fit in' after the fact.

The absolute best way to make this work out, then, is to record "reality" in 
the database. Ignore how you're going to use the data while you're designing 
what's in the database (to the extent possible). Instead, simply record facts.

Of course you need to know the height of the poles and their locations and 
such, because that's what you do your calculations with.

What I'm trying to say is don't say "our calculations would be easier if our 
data was arranged like *this*."  If you stick with the reality of what's out 
there, you will never run into the situation where you have to *change* data 
you already have in the database.

I know this sounds kind of vague, and it can be hard to get away from it if 
you already have a model of something up and running and you know how that 
works.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 16:43:44
Message: <4d76a310$1@news.povray.org>
Tom Austin wrote:
> Having unoptimized programs makes it very slow.  Programs written as 
> quickly as possible don't get the benefit of being efficient.

Modern database systems are designed to take normalized tables and make 
accessing them very fast. That's the fundamental operation a relational 
database management system performs.

How big is your total collection of active databases right now? If it fits 
in (say) a gigabyte, chances are you can't possibly make it too slow.

If it *is* too slow when done "right", *then* you get out the query analyzer 
and look to see what it's spending its time doing, and you add the one index 
that cuts that time from minutes to milliseconds.

> Thanks for the acronym - more info to go on :-)
> 
> I don't know if we will try or need to set up transactions, but they 
> could be helpful if something is not a simple insert or update.

You will need transactions. Trust me on this one. :-)  If nothing else, 
you'll need transactions when you're archiving out an old job.

That said, it's surprisingly common that if you structure your queries 
right, you can make every operation essentially an "append to this table", 
followed by an eventual "update that table in a way that if it fails, we can 
reconstruct that table entirely from scratch".   Think Journal -> General 
Ledger.
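
A minimal sketch of that shape, with made-up names - the journal only 
ever grows, and the ledger can always be thrown away and rebuilt from it:

   -- Append-only: rows are inserted, never updated.
   CREATE TABLE journal (
       account_id INT           NOT NULL,
       posted_utc DATETIME      NOT NULL,
       amount     DECIMAL(12,2) NOT NULL,
       PRIMARY KEY (account_id, posted_utc)
   );

   -- Derived: if updating this ever fails, rebuild it from scratch.
   CREATE TABLE ledger (
       account_id INT           NOT NULL,
       balance    DECIMAL(12,2) NOT NULL,
       PRIMARY KEY (account_id)
   );

   -- The rebuild step:  DELETE FROM ledger;  then
   INSERT INTO ledger (account_id, balance)
   SELECT account_id, SUM(amount) FROM journal GROUP BY account_id;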

>> The key is an identifier that won't change. What text do you have for
>> the entity that won't change? The company name? The road the pole is on?
>> The name of the person you should talk to about it? All these things can
>> change.
> 
> point made - thanks :-)

That said, btw, it's often the case that you *can* find text that won't 
change often enough to make a difference. Stock ticker, street name, radio 
station call letters. If it changes rarely enough that you're willing to 
treat it as a "delete all the old, insert all the new" sort of transaction, 
using text is OK. One generally doesn't do this with people, but saying "if 
you change your stock ticker, you have to treat it as selling all the old 
shares and getting the same number of new shares from the new ticker name" 
isn't too unreasonable.

> OK, you want a job surveying utility poles :-)  You have good insight.

I've just done this dozens of times in the last couple decades.

> At this point I am not used to setting up primary keys with multiple 
> columns - but this is making some sense.  The question we have is if we 
> have time to implement some such beast :-)

Look at it this way: This is the data that is vital to your company, in the 
strictest sense. Your company's job is to manage and evolve this data. You 
don't have time to get it wrong.

What about your current system is so broken that you can't afford the time 
to fix it?

The other possibility is to hire someone who actually does this sort of 
stuff for a living for long enough to show you how to set up the basics of 
the database.

>>> Exporting the data for archival purposes could be a good idea. Even
>>> exporting a fully functional database with only the particular job
>>> data required could be good.
>>
>> All those things are not-too-hard if all the data is in the database.
>>
> 
> Yes, and if the database schema is identical, the same programs can use 
> any database - from mySQL to .mdb files.

That's the goal. As you get more sophisticated, you find that breaks down. 
When you use views in your "real" database and you archive it to something 
that has no views, you find yourself with a problem.

That said, even a cheap machine nowadays has more than enough processing 
power to do whole boatloads of database processing.

About how much data do you have, in total and in active jobs?  Many people 
seriously underestimate just how much processing even a $1000 machine can do 
on a SQL database.

> Well, almost free - just takes time to set up. By the time all is said
> and done I figure that MySQL will take about $1k - $2k of my time to set
> up reliably. I need to get it installed on our server and set up
> connection methods and backup methods. Not too difficult, but I don't
> do it every day.

Be sure to factor in some ongoing maintenance time. Occasionally your 
replication will break, or something like that, and you'll have to fix stuff 
in a panic. :-)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Orchid XP v8
Subject: Re: Database Questions
Date: 8 Mar 2011 17:03:07
Message: <4d76a79b$1@news.povray.org>
On 08/03/2011 09:24 PM, Darren New wrote:
> Orchid XP v8 wrote:
>> Pro-tip: Don't use real-world things as primary keys. Because you know
>> what? Real-world things have this nasty habit of *changing*. You don't
>> ever want your primary key to change. This is why databases up and
>> down the country refer to Mr Smith not as Mr Smith but as customer
>> #238523.
>
> I disagree. The only time you should use automatically-generated primary
> keys is when there is no attribute of the item in the table that is
> guaranteed not to change.

The recommendation comes not so much from me, but from Tom Kyte. (He's 
an Oracle expert. I'm not entirely sure, but I believe he actually works 
for Oracle Corp.) The assertion being that "stuff that doesn't change" 
is rarer than you think it is.

But sure, especially if you're recording historical data (which is of 
course especially unlikely to change), there's no real need for a 
synthetic key.

> A sales receipt whose primary key is the cash register identifier and
> the time to the second when the receipt was printed is an excellent PK
> for a sales receipt.

Agreed.

> A road
> intersection has a fine PK of the GPS coordinates of the intersection.

Until the GPS coordinates turn out to have been inaccurate. GPS is only 
accurate to a few hundred yards of course, so different people will 
probably get different measurements depending on where they're standing. 
That's probably not a great idea. (Unless you explicitly round the 
coordinates up to some number that makes the size of the actual 
intersection negligible.)

Then again, most intersections probably already *have* a synthetic key. 
I don't know about your country, but around here major roads tend to 
have uniquely numbered markers on the roadside...

> The problem is that people are changing all the time, so it's very hard
> to have a customer record that persists with a PK that makes sense.

Quite.

> The drawback of this is if you get a reference to the wrong value,
> you'll never notice.

Well, perhaps. But if somebody gives you the wrong phone number, you'll 
"never notice" in the same sense...

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 17:11:33
Message: <4d76a995$1@news.povray.org>
Tom Austin wrote:
> Interesting way of looking at building a schema - just break it down to 
> objects.  I think this fits our data very well.  I think we were doing 
> this anyway, but not on purpose - more by accident.

Trust me on this: It's the only way you ever get a schema you don't have to 
change with every software update. Record what is "out there", and to the 
extent sensible, never overwrite data. If you find yourself thinking "We'll 
need to keep copies", you've done something wrong, because what that really 
means is you need the state of something at two different times, but you 
didn't record the times.  Record *everything*, in as raw a format as you can.

I am looking at a financial system right now where I never overwrite data. 
When someone swipes a card, I put into a table exactly what the card machine 
gave me: The card machine ID, the credit card number, the amount the clerk 
typed in, and the timestamp I got the record.  I don't record the merchant 
ID in that table, or the ID of the cardholder. That all comes later, when I 
process the records.

Why? Because the merchant might change merchant accounts before telling us. 
The cardholder might lose the card. Heck, we might even accidentally create 
two cards with the same number.

I'm blathering. Nevermind.

Anyway, yes, record everything from *outside* your company in one set of 
tables, as *raw* as you can. Changes to these tables should be idempotent. 
(I.e., you *should* be able to read the CSV file the telco gave you listing 
the pole information multiple times into the same table without corruption. 
This implies no automatically-generated primary key on those tables. That 
said, again, this is just a guideline and not always possible.)

Have a second set of tables, completely created from the first set, that 
holds any calculated information that might be difficult to recalculate 
(e.g., running account balances in a financial app), but which can be 
checked against and recreated from the raw data.

Once you have stuff working, figure out where it's slow, *then* fix that. 
Usually "fix that" involves changing the declarations of some indexes, not 
rearranging the actual data (which, if you have data for which you can 
follow the rules above, you can't really rearrange anyway).
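
By way of illustration (names invented): give the raw table the natural 
key of the source row, and re-reading the same file becomes harmless:

   CREATE TABLE telco_pole_raw (
       pole_tag     VARCHAR(20)  NOT NULL,  -- the telco's own identifier
       reported_utc DATETIME     NOT NULL,  -- when the telco surveyed it
       height_ft    DECIMAL(5,2) NOT NULL,
       PRIMARY KEY (pole_tag, reported_utc)
   );

   -- MySQL: rows already present are silently skipped, so loading the
   -- same CSV twice cannot corrupt the table.
   INSERT IGNORE INTO telco_pole_raw
   VALUES ('P-1017', '2011-03-08 14:00:00', 38.5);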

> This might be worth considering, but I don't know how easy it will be to
> implement and keep straight.  It seems like a user-created key like this
> could lead to duplicates - but if it is a key, the database will reject
> duplicates anyway - just prompt for another key.

Oh, yes, this isn't for user keys. I meant where in a programming language 
you'd use an enumerated type.

So if you have a "status" column for jobs that says "in negotiation", "paid for 
but waiting for someone to be assigned", "that person is working on it", and 
"ready to be delivered", you could use this sort of coding to make your 
programs more readable. "NEGO", "PAID", "WORK", "DONE" rather than 0, 1, 2, 3.
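
In other words (a sketch, invented names):

   -- One row per status; a CHAR(4) indexes about as fast as an INT.
   CREATE TABLE job_status (
       code        CHAR(4)      NOT NULL,  -- 'NEGO', 'PAID', 'WORK', 'DONE'
       description VARCHAR(250) NOT NULL,  -- what the drop-down list shows
       PRIMARY KEY (code)
   );

   CREATE TABLE job (
       job_id INT     NOT NULL,
       status CHAR(4) NOT NULL,
       PRIMARY KEY (job_id),
       FOREIGN KEY (status) REFERENCES job_status (code)  -- bad codes rejected
   );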

> Most of the data in question is a simple map from an integer key to a 
> piece of text.  I do have some queries on the text with joins to pull 
> data from other tables.
> 
> The text is not designed to be changed.
> 
> The alternative that we have is to flatten the relationship and put the 
> text directly in the table instead of having a key reference.
> 
> ie
> go from
> 
> tablePole        tableOwners
> Pole   OwnerID        ID    Name
> 
> 
> to
> 
> tablePole
> Pole    OwnerName
> 
> 
> 
> Any thoughts on this?

If OwnerName shows up in multiple places, then these bits would need to be 
in a PK/FK relationship to properly cascade changes.

I assume each owner has more than one pole. Hence, duplicating the name for 
every pole is inefficient space-wise and dangerous update-wise. (Especially 
if you don't use transactions.)

Is the "owner" a real independent entity? Is it rational to speculate there 
might be an owner with no poles? Or an owner who has poles but hasn't told 
you about them yet? What about an owner who just sold his last pole to pay 
for the pole he's going to buy next week?

So, yes, here, "owner" is an actual entity, not just some field about a 
pole. The name of the owner isn't associated with the pole at all.

Indeed, the *best* way to structure this would be

table Pole: GPS coords, height.  (PK: GPS)

table Owner: ID, contact name, contact phone, etc etc. (PK: ID)

table Ownership: PoleGPS, OwnerID, StartingFromTime. (PK: GPS+StartingFromTime).

table Cables: PoleGPS, cable height, fiber height, power height, 
StartingFromTime. (PK: GPS + StartingFromTime)
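
In rough DDL (column types are guesses):

   CREATE TABLE pole (
       gps       VARCHAR(24)  NOT NULL,  -- lat/long, rounded consistently
       height_ft DECIMAL(5,2) NOT NULL,
       PRIMARY KEY (gps)
   );

   CREATE TABLE owner (
       owner_id     INT          NOT NULL,
       contact_name VARCHAR(100) NOT NULL,
       PRIMARY KEY (owner_id)
   );

   CREATE TABLE ownership (
       pole_gps   VARCHAR(24) NOT NULL,
       owner_id   INT         NOT NULL,
       valid_from DATETIME    NOT NULL,
       PRIMARY KEY (pole_gps, valid_from),
       FOREIGN KEY (pole_gps) REFERENCES pole (gps),
       FOREIGN KEY (owner_id) REFERENCES owner (owner_id)
   );

   CREATE TABLE cables (
       pole_gps     VARCHAR(24)  NOT NULL,
       cable_height DECIMAL(5,2),
       fiber_height DECIMAL(5,2),
       power_height DECIMAL(5,2),
       valid_from   DATETIME    NOT NULL,
       PRIMARY KEY (pole_gps, valid_from),
       FOREIGN KEY (pole_gps) REFERENCES pole (gps)
   );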

Later, you can go back and ask questions like "Did AT&T ever change the 
height of the power lines on the pole at Fifth and Main, or was that 
Cingular, after they bought that line from AT&T?"


*That* might be a bit too normalized for your purposes, but you'll never get 
in a situation where someone sells a bunch of poles, then comes to you and 
asks you to recreate a report from before they sold the poles or something.

See if you can look at the columns you have and come up with a breakdown 
like this into "things really happening out there". You might find yourself 
a natural at it. :-)

Now, the "groups" kind of thing I was talking about, the "secondary" set of 
tables, might be tables that take a group of poles owned by the same 
company, that constitute a single "cable run". Large stretches of cable run 
might have the cable at the same height, so your calculations could be 
faster if you had a table that put each pole in a group, and then had the 
height of the cable on the group specified, or something like that.

It's hard coming up with that sort of thing when I don't know how your 
business works, tho, but that's the approximate sort of thing I was talking 
about.


>>> We are developing a new database system to take care of some severe
>>> shortcomings of the old system.
>>
>> What are the shortcomings?
>>
> 
> Let's just say it was a single-table database with about 250 columns -
> limited only by the fact that Excel at one time only allowed about 250
> columns.

Oh, so the shortcoming is that it was rushed and grew instead of being 
designed. Makes sense. :-)

> Putting data in and getting data out was very difficult and required the 
> sacrifice of too many brain cells.

That's exactly where SQL queries between multiple tables will help. 
Thinking in SQL instead of in terms of iterating over things is tough, but 
it's almost always possible to come up with a statement in SQL that will 
pull out all and only the data you need. If you ever get to the point where 
you're saying "for each result from *this* query, submit *that* query", 
you're almost 100% sure you're doing something wrong. In my entire career, I 
only ever did that once, and that was after I'd written it the right way and 
figured out MySql was not optimizing things in the obvious way and I didn't 
know how to make it do so.
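
For example, the "latest cable height per pole" question is one 
statement, not a loop (invented names):

   -- One query does the iteration the application would otherwise do.
   SELECT p.gps, c.power_height
     FROM pole p
     JOIN cables c ON c.pole_gps = p.gps
    WHERE c.valid_from = (SELECT MAX(c2.valid_from)
                            FROM cables c2
                           WHERE c2.pole_gps = p.gps);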

>>> Our data has a lot of common 'choose from' items - such as the owner
>>> of the pole.
>>> The data consists of pole specific information that doesn't change
>>> (owner, height, etc) and information that will likely change (the
>>> height of cables on the pole).
>>
>> These last two should probably be in separate tables that you join at
>> runtime. (Especially if you're using MySql, where different "engines"
>> give you different capabilities.)
>>
> 
> That's what we are trying to figure out - how to make it make sense to 
> have them in separate tables.

Let me know if the description of tables above does or doesn't make sense.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 17:34:15
Message: <4d76aee7$1@news.povray.org>
Orchid XP v8 wrote:
> The assertion being that "stuff that doesn't change" 
> is rarer than you think it is.

I'll grant that it's rarer than most people think it is. The trick is to 
consider whether if the PK changes, you're willing to treat the object as a 
new entity, or whether it's rare enough that you're willing to expunge the 
old ID entirely.

Is it the same telephone pole if I dig it up and plant it somewhere else? 
Sure, maybe. But am I ever going to need to know that if my job involves 
routing cables?  Probably not.  Am I going to need to know that if my job 
involves replacing them when they wear out?  Most definitely.

How often do you replace the little tags on the side of the pole that give 
the pole number? Not very often. Is there any reason, if that ID tag gets 
replaced, that you can't just update every place it appears in the 
database? Probably not.

> But sure, especially if you're recording historical data (which is of 
> course especially unlikely to change), there's no real need for a 
> synthetic key.

I contend that most primary data in the database should be historical.

> Until the GPS coordinates turn out to have been inaccurate.

Then you fix it. There never *was* an intersection with those GPS 
coordinates. So you fix it. The point is to avoid collisions, not to 
necessarily have a PK that never changes.

Given that the BIGGEST TRANSACTIONAL DATABASE IN THE WORLD (*) identifies 
street intersections with GPS coordinates, I'm gonna have to go with GPS 
coordinates on this one. ;-)

> GPS is only 
> accurate to a few hundred yards of course,

Uh, no. What are you smoking?

> so different people will 
> probably get different measurements depending on where they're standing. 

Sure. But you're not asking someone to identify a real intersection based on 
their GPS coordinates. You're using it as a primary key.

None of your customers identifies themself with your internal ID number, either.

> Then again, most intersections probably already *have* a synthetic key. 
> I don't know about your country, but around here major roads tend to 
> have uniquely numbered markers on the roadside...

Assuming those don't change, they're probably based on geometry also. The 
point of using GPS coordinates is that they are guaranteed not to change. If 
you enter them in the database wrong, then you have bad data, but that's not 
because the intersection moved.

>> The drawback of this is if you get a reference to the wrong value,
>> you'll never notice.
> 
> Well, perhaps. But if somebody gives you the wrong phone number, you'll 
> "never notice" in the same sense...

Except I'm not using a phone number as a primary key. :-)



(*) The 300 terabyte TURKS database. The size is probably bigger 15 years 
later, too.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 17:49:04
Message: <4d76b260$1@news.povray.org>
On 3/8/2011 4:43 PM, Darren New wrote:
> Tom Austin wrote:
>> Having unoptimized programs makes it very slow. Programs written as
>> quickly as possible don't get the benefit of being efficient.
>
> Modern database systems are designed to take normalized tables and make
> accessing them very fast. That's the fundamental operation a relational
> database management system performs.
>
> How big is your total collection of active databases right now? If it
> fits in (say) a gigabyte, chances are you can't possibly make it too slow.
>
> If it *is* too slow when done "right", *then* you get out the query
> analyzer and look to see what it's spending its time doing, and you add
> the one index that cuts that time from minutes to milliseconds.
>

I think most of our slowdowns are currently in the applications that 
access the database - basically inefficient routines and use of objects.  
That was a necessary approach when coding time was of the essence.


>> Thanks for the acronym - more info to go on :-)
>>
>> I don't know if we will try or need to set up transactions, but they
>> could be helpful if something is not a simple insert or update.
>
> You will need transactions. Trust me on this one. :-) If nothing else,
> you'll need transactions when you're archiving out an old job.
>

I've worked with transactions some in the past and they make a lot of 
sense.  I'll have to try to incorporate them as we move forward.

Does it make sense to set up a transaction wrapper on even simple single 
updates?

> That said, it's surprisingly common that if you structure your queries
> right, you can make every operation essentially an "append to this
> table", followed by an eventual "update that table in a way that if it
> fails, we can reconstruct that table entirely from scratch". Think
> Journal -> General Ledger.
>

OK, I've been dealing with databases for 15 years now - mostly on a low 
level.  That just went right over my head.  It makes sense, but I have 
no concept of how it is implemented.

>
> That said, btw, it's often the case that you *can* find text that won't
> change often enough to make a difference. Stock ticker, street name,
> radio station call letters. If it changes rarely enough that you're
> willing to treat it as a "delete all the old, insert all the new" sort
> of transaction, using text is OK. One generally doesn't do this with
> people, but saying "if you change your stock ticker, you have to treat
> it as selling all the old shares and getting the same number of new
> shares from the new ticker name" isn't too unreasonable.
>

for some reason I am hesitant to present the key as something that the 
end user interacts with.  Maybe I'm a control freak :-&


>> OK, you want a job surveying utility poles :-) You have good insight.
>
> I've just done this dozens of times in the last couple decades.
>
what - survey telephone poles?

>
> Look at it this way: This is the data that is vital to your company, in
> the strictest sense. Your company's job is to manage and evolve this
> data. You don't have time to get it wrong.


that is very correct.  Our first stab was fast and mostly dirty.  We 
spent a lot of time to try to get the data right, but we missed some 
pieces of data.

>
> What about your current system is so broken that you can't afford the
> time to fix it?
>

It has layers upon layers of fixes.  The end users had to fight with it 
to get it to put out data correctly.  It was so riddled with ad-hoc 
patches that I refused to make any more changes because of the risk of 
data corruption/loss and our client getting crap for a deliverable.

There were some cool features that I implemented - like a GIS-type 
interface - before GIS systems were popular.


> The other possibility is to hire someone who actually does this sort of
> stuff for a living for long enough to show you how to set up the basics
> of the database.
>

It's a balance.  Depending on our requirements, it could take just as 
much of our time working with the guy as doing it ourselves.

>>>> Exporting the data for archival purposes could be a good idea. Even
>>>> exporting a fully functional database with only the particular job
>>>> data required could be good.
>>>
>>> All those things are not-too-hard if all the data is in the database.
>>>
>>
>> Yes, and if the database schema is identical, the same programs can
>> use any database - from mySQL to .mdb files.
>
> That's the goal. As you get more sophisticated, you find that breaks
> down. When you use views in your "real" database and you archive it to
> something that has no views, you find yourself with a problem.
>

Yes, one just has to pay attention and not set up things that break.  
The applications could avoid relying on views (or other non-common 
features) - but then one could be crippling oneself.


> That said, even a cheap machine nowadays has more than enough processing
> power to do whole boatloads of database processing.
>
> About how much data do you have, in total and in active jobs? Many
> people seriously underestimate just how much processing even a $1000
> machine can do on a SQL database.
>

Yes, I don't think it will take much of a machine.  We have an old 
server box (PIII 600 I think) that would likely do the job.  Or I might 
put MySQL on our current file server (Linux) to simplify things and use 
the old box for development.


>> Well, almost free - just takes time to set up. By the time all is said
>> and done I figure that MySQL will take about $1k - $2k of my time to
>> set up reliably. I need to get it installed on our server and set up
>> connection methods and backup methods. Not too difficult, but I don't
>> do it every day.
>
> Be sure to factor in some ongoing maintenance time. Occasionally your
> replication will break, or something like that, and you'll have to fix
> stuff in a panic. :-)
>

Yes, just like last week with our servers.

In the past 3 years I have had 1 or 2 incidents that required immediate 
attention.

The past week brought no less than 5 of them.

From full OS drives to failed RAID mirrors.

Luckily none of them were damaging.  Just annoying and cost a couple of 
hours of everyone's time.



