Database Questions (Message 11 to 20 of 68)
From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 16:21:02
Message: <4d769dbe$1@news.povray.org>
On 3/8/2011 3:42 PM, Darren New wrote:
> Tom Austin wrote:
>> I know that we will want to refer back to the data - either to 'copy'
>> it for another job or to just look at historical statistics - like you
>> point out.
>
> Then keep it in one database.
>
>> I've run across some database engines that support it, but it seemed
>> not to be a pointed feature that is used often.
>
> Right. Because most people don't do things that way. There's no good
> reason to make things a separate database unless you want to do
> administrative things differently like having different access or store
> it on different machines.
>

thanks, point made :-)


>> Our data structure is not too large in comparison to most databases.
>> Currently we have about 30 tables with maybe 5-10 more required if we
>> keep on the current design path.
>
> There's no need to denormalize, then. It also doesn't sound like you're
> doing anything highly transactional, so a quarter-second transaction is
> unlikely to bother anyone using it.
>

Having unoptimized programs makes it very slow.  Programs written as 
quickly as possible don't get the benefit of being efficient.

But I think with optimization of how we access the data it can be made 
very quick.  Also, moving away from an Access .mdb file will speed things 
up considerably.


>> With any but the most simple database you will need automated systems
>> to access and keep the data straight. You have to decide where you are
>> going to put the work.
>
> The point of ACID (at least the C part) is that you declare the "keep
> the data straight" part in the database as data. Then a mistake in your
> application cannot corrupt your data in that way.
>

Thanks for the acronym - more info to go on :-)

I don't know if we will try or need to set up transactions, but they 
could be helpful if something is not a simple insert or update.
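
For illustration, a minimal sketch of what declaring the "keep the data 
straight" rules as data can look like, with hypothetical tables. (Caveat: 
in MySQL, FOREIGN KEY needs a transactional engine like InnoDB, and older 
MySQL versions parse but do not enforce CHECK.)

  -- Integrity rules live in the schema itself, so no buggy
  -- application can insert a pole pointing at a missing owner.
  CREATE TABLE owners (
      id   INT         NOT NULL,
      name VARCHAR(80) NOT NULL,
      PRIMARY KEY (id)
  );

  CREATE TABLE poles (
      pole_number INT NOT NULL,
      owner_id    INT NOT NULL,
      height_ft   INT NOT NULL,
      PRIMARY KEY (pole_number),
      FOREIGN KEY (owner_id) REFERENCES owners (id),
      CHECK (height_ft > 0)  -- reject physically impossible data
  );
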

>> I guess the question is more:
>> It is good or bad practice to use text as the 'key' rather than
>> dereferencing to an integer type?
>
> The key is an identifier that won't change. What text do you have for
> the entity that won't change? The company name? The road the pole is on?
> The name of the person you should talk to about it? All these things can
> change.

point made - thanks :-)


>> Something you said here might work for us.
>> The poles that we collect do not change and can be reused for other
>> jobs - so keeping them around could be good.
>> The cable heights are more job specific as when the job is built the
>> height are now wrong.
>
> You're doing it wrong. The cable heights don't change. Instead, you have
> a range of dates when the cable was *this* high, and another range of
> dates when the cable was *that* high.
>

OK, you want a job surveying utility poles :-)  You have good insight.


> If you overwrite that information, then the customer later comes back
> and asks about it, now you have a problem. Instead, the primary key for
> the cable should be "company id, pole number, first date valid". Then
> your queries can find the height of the cable on any particular day by
> finding the latest date where that company's pole is listed.
>

At this point I am not used to setting up primary keys with multiple 
columns - but this is making some sense.  The question we have is if we 
have time to implement some such beast :-)
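
For illustration, a sketch of the multi-column key and "latest date" 
lookup being described, with hypothetical table and column names:

  CREATE TABLE cable_heights (
      company_id  INT          NOT NULL,
      pole_number INT          NOT NULL,
      valid_from  DATE         NOT NULL,  -- first date this height applied
      height_ft   DECIMAL(5,2) NOT NULL,
      PRIMARY KEY (company_id, pole_number, valid_from)
  );

  -- Height of company 7's cable on pole 123 as of a given day:
  -- the newest record no later than that day.
  SELECT height_ft
  FROM   cable_heights
  WHERE  company_id = 7
    AND  pole_number = 123
    AND  valid_from <= '2011-03-08'
  ORDER BY valid_from DESC
  LIMIT 1;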


>> Exporting the data for archival purposes could be a good idea. Even
>> exporting a fully functional database with only the particular job
>> data required could be good.
>
> All those things are not-too-hard if all the data is in the database.
>

Yes, and if the database schema is identical, the same programs can use 
any database - from mySQL to .mdb files.


>> We are a small show so we cannot afford a ton of overhead to keep a
>> system running, yet we also do not want to set up a $20,000 db system
>> that is overkill for what we do.
>
> Fortunately, since free RDBMs have come out, the price of a database
> engine has dropped from $100K/user to free. ;-)
>

Well, almost free - just takes time to set up.  By the time all is said 
and done I figure that mySQL will take about $1k - $2k of my time to set 
up reliably.  Need to get it installed on our server and set up 
connection methods and backup methods.  Not too difficult, but I don't 
do it every day.



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 16:24:17
Message: <4d769e81$1@news.povray.org>
Orchid XP v8 wrote:
> Pro-tip: Don't use real-world things as primary keys. Because you know 
> what? Real-world things have this nasty habit of *changing*. You don't 
> ever want your primary key to change. This is why databases up and down 
> the country refer to Mr Smith not as Mr Smith but as customer #238523.

I disagree. The only time you should use automatically-generated primary 
keys is when there is no attribute of the item in the table that is 
guaranteed not to change.

A sales receipt whose primary key is the cash register identifier and the 
time to the second when the receipt was printed is an excellent PK for a 
sales receipt. A record recording what song was playing on the radio has the 
obvious PK of location+timestamp+frequency. A road intersection has a fine 
PK of the GPS coordinates of the intersection.

The problem is that people are changing all the time, so it's very hard to 
have a customer record that persists with a PK that makes sense.

> Use randomly-generated integers with no real-world meaning as PKs and 
> you'll have fewer problems if/when the thing the key refers to changes 
> in some way.

The drawback of this is if you get a reference to the wrong value, you'll 
never notice.

> I would suggest not setting up an Oracle database, for example. It 
> sounds like it would be utter overkill here.

Yes. One of the free databases (Postgresql or mysql) or (on windows) SQL 
Server (Express, perhaps) would do fine.  You'd probably find the most free 
help with MySql, but it's kind of messy to run on Windows, or at least was 
last I checked. SQL Server Express would *probably* have enough performance 
for you, given you were using Access.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 16:31:22
Message: <4d76a02a$1@news.povray.org>
Tom Austin wrote:
> Yes, the data needs to be well understood before starting the schema. 
> There should be no anomalies to 'fit in' after the fact.

The absolute best way to make this work out, then, is to record "reality" in 
the database. Ignore how you're going to use the data while you're designing 
what's in the database (to the extent possible). Instead, simply record facts.

Of course you need to know the height of the poles and their locations and 
such, because that's what you do your calculations with.

What I'm trying to say is don't say "our calculations would be easier if our 
data was arranged like *this*."  If you stick with the reality of what's out 
there, you will never run into the situation where you have to *change* data 
you already have in the database.

I know this sounds kind of vague, and it can be hard to get away from it if 
you already have a model of something up and running and you know how that 
works.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 16:43:44
Message: <4d76a310$1@news.povray.org>
Tom Austin wrote:
> Having unoptimized programs makes it very slow.  Programs written as 
> quickly as possible don't get the benefit of being efficient.

Modern database systems are designed to take normalized tables and make 
accessing them very fast. That's the fundamental operation a relational 
database management system performs.

How big is your total collection of active databases right now? If it fits 
in (say) a gigabyte, chances are you can't possibly make it too slow.

If it *is* too slow when done "right", *then* you get out the query analyzer 
and look to see what it's spending its time doing, and you add the one index 
that cuts that time from minutes to milliseconds.
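
In MySQL terms that investigation might look like this, sketched with the 
hypothetical names from above:

  -- Ask the engine how it plans to execute the slow query:
  EXPLAIN
  SELECT * FROM cable_heights WHERE pole_number = 123;

  -- If the plan shows a full table scan, one index may be the
  -- minutes-to-milliseconds fix:
  CREATE INDEX idx_heights_pole ON cable_heights (pole_number);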

> Thanks for the acronym - more info to go on :-)
> 
> I don't know if we will try or need to set up transactions, but they 
> could be helpful if something is not a simple insert or update.

You will need transactions. Trust me on this one. :-)  If nothing else, 
you'll need transactions when you're archiving out an old job.

That said, it's surprisingly common that if you structure your queries 
right, you can make every operation essentially an "append to this table", 
followed by an eventual "update that table in a way that if it fails, we can 
reconstruct that table entirely from scratch".   Think Journal -> General 
Ledger.

>> The key is an identifier that won't change. What text do you have for
>> the entity that won't change? The company name? The road the pole is on?
>> The name of the person you should talk to about it? All these things can
>> change.
> 
> point made - thanks :-)

That said, btw, it's often the case that you *can* find text that won't 
change often enough to make a difference. Stock ticker, street name, radio 
station call letters. If it changes rarely enough that you're willing to 
treat it as a "delete all the old, insert all the new" sort of transaction, 
using text is OK. One generally doesn't do this with people, but saying "if 
you change your stock ticker, you have to treat it as selling all the old 
shares and getting the same number of new shares from the new ticker name" 
isn't too unreasonable.

> OK, you want a job surveying utility poles :-)  You have good insight.

I've just done this dozens of times in the last couple decades.

> At this point I am not used to setting up primary keys with multiple 
> columns - but this is making some sense.  The question we have is if we 
> have time to implement some such beast :-)

Look at it this way: This is the data that is vital to your company, in the 
strictest sense. Your company's job is to manage and evolve this data. You 
don't have time to get it wrong.

What about your current system is so broken that you can't afford the time 
to fix it?

The other possibility is to hire someone who actually does this sort of 
stuff for a living for long enough to show you how to set up the basics of 
the database.

>>> Exporting the data for archival purposes could be a good idea. Even
>>> exporting a fully functional database with only the particular job
>>> data required could be good.
>>
>> All those things are not-too-hard if all the data is in the database.
>>
> 
> Yes, and if the database schema is identical, the same programs can use 
> any database - from mySQL to .mdb files.

That's the goal. As you get more sophisticated, you find that breaks down. 
When you use views in your "real" database and you archive it to something 
that has no views, you find yourself with a problem.

That said, even a cheap machine nowadays has more than enough processing 
power to do whole boatloads of database processing.

About how much data do you have, in total and in active jobs?  Many people 
seriously underestimate just how much processing even a $1000 machine can do 
on a SQL database.

> Well, almost free - just takes time to set up.  By the time all is said 
> and done I figure that mySQL will take about $1k - $2k of my time to set 
> up reliably.  Need to get it installed on our server and set up 
> connection methods and backup methods.  Not too difficult, but I don't 
> do it every day.

Be sure to factor in some ongoing maintenance time. Occasionally your 
replication will break, or something like that, and you'll have to fix stuff 
in a panic. :-)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Orchid XP v8
Subject: Re: Database Questions
Date: 8 Mar 2011 17:03:07
Message: <4d76a79b$1@news.povray.org>
On 08/03/2011 09:24 PM, Darren New wrote:
> Orchid XP v8 wrote:
>> Pro-tip: Don't use real-world things as primary keys. Because you know
>> what? Real-world things have this nasty habit of *changing*. You don't
>> ever want your primary key to change. This is why databases up and
>> down the country refer to Mr Smith not as Mr Smith but as customer
>> #238523.
>
> I disagree. The only time you should use automatically-generated primary
> keys is when there is no attribute of the item in the table that is
> guaranteed not to change.

The recommendation comes not so much from me, but from Tom Kyte. (He's 
an Oracle expert. I'm not entirely sure, but I believe he actually works 
for Oracle Corp.) The assertion being that "stuff that doesn't change" 
is rarer than you think it is.

But sure, especially if you're recording historical data (which is of 
course especially unlikely to change), there's no real need for a 
synthetic key.

> A sales receipt whose primary key is the cash register identifier and
> the time to the second when the receipt was printed is an excellent PK
> for a sales receipt.

Agreed.

> A road
> intersection has a fine PK of the GPS coordinates of the intersection.

Until the GPS coordinates turn out to have been inaccurate. GPS is only 
accurate to a few hundred yards of course, so different people will 
probably get different measurements depending on where they're standing. 
That's probably not a great idea. (Unless you explicitly round the 
coordinates to some precision that makes the size of the actual 
intersection negligible.)

Then again, most intersections probably already *have* a synthetic key. 
I don't know about your country, but around here major roads tend to 
have uniquely numbered markers on the roadside...

> The problem is that people are changing all the time, so it's very hard
> to have a customer record that persists with a PK that makes sense.

Quite.

> The drawback of this is if you get a reference to the wrong value,
> you'll never notice.

Well, perhaps. But if somebody gives you the wrong phone number, you'll 
"never notice" in the same sense...

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 17:11:33
Message: <4d76a995$1@news.povray.org>
Tom Austin wrote:
> Interesting way of looking at building a schema - just break it down to 
> objects.  I think this fits our data very well.  I think we were doing 
> this anyway, but not on purpose - more by accident.

Trust me on this: It's the only way you ever get a schema you don't have to 
change with every software update. Record what is "out there", and to the 
extent sensible, never overwrite data. If you find yourself thinking "We'll 
need to keep copies", you've done something wrong, because what that really 
means is you need the state of something at two different times, but you 
didn't record the times.  Record *everything*, in as raw a format as you can.

I am looking at a financial system right now where I never overwrite data. 
When someone swipes a card, I put into a table exactly what the card machine 
gave me: The card machine ID, the credit card number, the amount the clerk 
typed in, and the timestamp I got the record.  I don't record the merchant 
ID in that table, or the ID of the cardholder. That all comes later, when I 
process the records.

Why? Because the merchant might change merchant accounts before telling us. 
The cardholder might lose the card. Heck, we might even accidentally create 
two cards with the same number.

I'm blathering. Nevermind.

Anyway, yes, record everything from *outside* your company in one set of 
tables, as *raw* as you can. Changes to these tables should be idempotent. 
(I.e., you *should* be able to read the CSV file the telco gave you listing 
the pole information multiple times into the same table without corruption. 
This implies no automatically-generated primary key on those tables. That 
said, again, this is just a guideline and not always possible.)
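
As a sketch of that idempotent-load idea, assuming a hypothetical import 
table keyed on the natural identity of each record:

  CREATE TABLE raw_pole_import (
      company_id  INT          NOT NULL,
      pole_number INT          NOT NULL,
      reported_on DATE         NOT NULL,
      height_ft   DECIMAL(5,2) NOT NULL,
      PRIMARY KEY (company_id, pole_number, reported_on)
  );

  -- Re-reading the same CSV is harmless: a duplicate row hits the
  -- natural key and is skipped rather than inserted twice.
  INSERT IGNORE INTO raw_pole_import
      (company_id, pole_number, reported_on, height_ft)
  VALUES (7, 123, '2011-03-08', 32.50);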

Have a second set of tables, completely created from the first set, that 
holds any calculated information that might be difficult to recalculate 
(e.g., running account balances in a financial app), but which can be 
checked against and recreated from the raw data.

Once you have stuff working, figure out where it's slow, *then* fix that. 
Usually "fix that" involves changing the declarations of some indexes, not 
rearranging the actual data (which, if you have data for which you can 
follow the rules above, you can't really rearrange anyway).

> This might be worth considering, but I don't know how easy it will be to 
> implement and keep straight.  Seems like a user-created key like 
> this could lead to duplicates - but if it is a key it will prevent 
> duplicates anyway - just prompt for another key.

Oh, yes, this isn't for user keys. I meant where in a programming language 
you'd use an enumerated type.

So if you have a "status" row for jobs that says "in negotiation", "paid for 
but waiting for someone to be assigned", "that person is working on it", and 
"ready to be delivered", you could use this sort of coding to make your 
programs more readable. "NEGO", "PAID", "WORK", "DONE" rather than 0, 1, 2, 3.
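
Sketched with a hypothetical jobs table, that might look like:

  -- The lookup table documents the legal codes, and the foreign
  -- key keeps applications from inventing new ones.
  CREATE TABLE job_statuses (
      code        CHAR(4)     NOT NULL,
      description VARCHAR(60) NOT NULL,
      PRIMARY KEY (code)
  );

  INSERT INTO job_statuses VALUES
      ('NEGO', 'in negotiation'),
      ('PAID', 'paid, waiting for assignment'),
      ('WORK', 'being worked on'),
      ('DONE', 'ready to be delivered');

  CREATE TABLE jobs (
      id     INT     NOT NULL,
      status CHAR(4) NOT NULL,
      PRIMARY KEY (id),
      FOREIGN KEY (status) REFERENCES job_statuses (code)
  );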

> Most of the data in question is a simple map from an integer key to a 
> piece of text.  I do have some queries on the text with joins to pull 
> data from other tables.
> 
> The text is not designed to be changed.
> 
> The alternative that we have is to flatten the relationship and put the 
> text directly in the table instead of having a key reference.
> 
> ie
> go from
> 
> tablePole        tableOwners
> Pole   OwnerID        ID    Name
> 
> 
> to
> 
> tablePole
> Pole    OwnerName
> 
> 
> 
> Any thoughts on this?

If OwnerName shows up in multiple places, then these bits would need to be 
in a PK/FK relationship to properly cascade changes.

I assume each owner has more than one pole. Hence, duplicating the name for 
every pole is inefficient space-wise and dangerous update-wise. (Especially 
if you don't use transactions.)

Is the "owner" a real independent entity? Is it rational to speculate there 
might be an owner with no poles? Or an owner who has poles but hasn't told 
you about them yet? What about an owner who just sold his last pole to pay 
for the pole he's going to buy next week?

So, yes, here, "owner" is an actual entity, not just some field about a 
pole. The name of the owner isn't associated with the pole at all.

Indeed, the *best* way to structure this would be

table Pole: GPS coords, height.  (PK: GPS)

table Owner: ID, contact name, contact phone, etc etc. (PK: ID)

table Ownership: PoleGPS, OwnerID, StartingFromTime. (PK: GPS+StartingFromTime).

table Cables: PoleGPS, cable height, fiber height, power height, 
StartingFromTime. (PK: GPS + StartingFromTime)

Later, you can go back and ask questions like "Did AT&T ever change the 
height of the power lines on the pole at Fifth and Main, or was that 
Cingular, after they bought that line from AT&T?"
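
Rendered as a sketch (coordinates and names made up), the Cables table 
and that kind of history question could look like:

  CREATE TABLE cables (
      pole_gps     VARCHAR(24)  NOT NULL,
      cable_height DECIMAL(5,2) NOT NULL,
      fiber_height DECIMAL(5,2) NOT NULL,
      power_height DECIMAL(5,2) NOT NULL,
      valid_from   DATETIME     NOT NULL,
      PRIMARY KEY (pole_gps, valid_from)
  );

  -- Every power-line height ever recorded for the pole at Fifth
  -- and Main, oldest first:
  SELECT valid_from, power_height
  FROM   cables
  WHERE  pole_gps = '32.7157N,117.1611W'
  ORDER BY valid_from;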


*That* might be a bit too normalized for your purposes, but you'll never get 
in a situation where someone sells a bunch of poles, then comes to you and 
asks you to recreate a report from before they sold the poles or something.

See if you can look at the columns you have and come up with a breakdown 
like this into "things really happening out there". You might find yourself 
a natural at it. :-)

Now, the "groups" kind of thing I was talking about, the "secondary" set of 
tables, might be tables that take a group of poles owned by the same 
company, that constitute a single "cable run". Large stretches of cable run 
might have the cable at the same height, so your calculations could be 
faster if you had a table that put each pole in a group, and then had the 
height of the cable on the group specified, or something like that.

It's hard coming up with that sort of thing when I don't know how your 
business works, tho, but that's the approximate sort of thing I was talking 
about.


>>> We are developing a new database system to take care of some severe
>>> shortcomings of the old system.
>>
>> What are the shortcomings?
>>
> 
> Let's just say it was a single-table database with about 250 columns - 
> limited only by the fact that Excel at one time only allowed about 250 
> columns.

Oh, so the shortcoming is that it was rushed and grew instead of being 
designed. Makes sense. :-)

> Putting data in and getting data out was very difficult and required the 
> sacrifice of too many brain cells.

That's exactly where SQL queries between multiple tables will help. 
Thinking in SQL instead of in terms of iterating over things is tough, but 
it's almost always possible to come up with a statement in SQL that will 
pull out all and only the data you need. If you ever get to the point where 
you're saying "for each result from *this* query, submit *that* query", 
you're almost 100% sure you're doing something wrong. In my entire career, I 
only ever did that once, and that was after I'd written it the right way and 
figured out MySql was not optimizing things in the obvious way and I didn't 
know how to make it do so.
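
Concretely: instead of looping in application code and issuing one query 
per result, a single join pulls out all and only the rows needed 
(hypothetical names again):

  -- One statement instead of "for each pole, query its owner":
  SELECT p.pole_number, p.height_ft, o.name AS owner_name
  FROM   poles p
  JOIN   owners o ON o.id = p.owner_id
  WHERE  o.name = 'Example Telco';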

>>> Our data has a lot of common 'choose from' items - such as the owner
>>> of the pole.
>>> The data consists of pole specific information that doesn't change
>>> (owner, height, etc) and information that will likely change (the
>>> height of cables on the pole).
>>
>> These last two should probably be in separate tables that you join at
>> runtime. (Especially if you're using MySql, where different "engines"
>> give you different capabilities.)
>>
> 
> That's what we are trying to figure out - how to make it make sense to 
> have them in separate tables.

Let me know if the description of tables above does or doesn't make sense.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 17:34:15
Message: <4d76aee7$1@news.povray.org>
Orchid XP v8 wrote:
> The assertion being that "stuff that doesn't change" 
> is rarer than you think it is.

I'll grant that it's rarer than most people think it is. The trick is to 
consider whether if the PK changes, you're willing to treat the object as a 
new entity, or whether it's rare enough that you're willing to expunge the 
old ID entirely.

Is it the same telephone pole if I dig it up and plant it somewhere else? 
Sure, maybe. But am I ever going to need to know that if my job involves 
routing cables?  Probably not.  Am I going to need to know that if my job 
involves replacing them when they wear out?  Most definitely.

How often do you replace the little tags on the side of the pole that give 
the pole number? Not very often. If that ID tag gets replaced, is there any 
reason you can't just update everyplace it appears in the database? 
Probably not.

> But sure, especially if you're recording historical data (which is of 
> course especially unlikely to change), there's no real need for a 
> synthetic key.

I contend that most primary data in the database should be historical.

> Until the GPS coordinates turn out to have been inaccurate.

Then you fix it. There never *was* an intersection with those GPS 
coordinates. So you fix it. The point is to avoid collisions, not to 
necessarily have a PK that never changes.

Given that the BIGGEST TRANSACTIONAL DATABASE IN THE WORLD (*) identifies 
street intersections with GPS coordinates, I'm gonna have to go with GPS 
coordinates on this one. ;-)

> GPS is only 
> accurate to a few hundred yards of course,

Uh, no. What are you smoking?

> so different people will 
> probably get different measurements depending on where they're standing. 

Sure. But you're not asking someone to identify a real intersection based on 
their GPS coordinates. You're using it as a primary key.

None of your customers identifies themself with your internal ID number, either.

> Then again, most intersections probably already *have* a synthetic key. 
> I don't know about your country, but around here major roads tend to 
> have uniquely numbered markers on the roadside...

Assuming those don't change, they're probably based on geometry also. The 
point of using GPS coordinates is that they are guaranteed not to change. If 
you enter them in the database wrong, then you have bad data, but that's not 
because the intersection moved.

>> The drawback of this is if you get a reference to the wrong value,
>> you'll never notice.
> 
> Well, perhaps. But if somebody gives you the wrong phone number, you'll 
> "never notice" in the same sense...

Except I'm not using a phone number as a primary key. :-)



(*) The 300 terabyte TURKS database. The size is probably bigger 15 years 
later, too.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Tom Austin
Subject: Re: Database Questions
Date: 8 Mar 2011 17:49:04
Message: <4d76b260$1@news.povray.org>
On 3/8/2011 4:43 PM, Darren New wrote:
> Tom Austin wrote:
>> Having unoptimized programs makes it very slow. Programs written as
>> quickly as possible don't get the benefit of being efficient.
>
> Modern database systems are designed to take normalized tables and make
> accessing them very fast. That's the fundamental operation a relational
> database management system performs.
>
> How big is your total collection of active databases right now? If it
> fits in (say) a gigabyte, chances are you can't possibly make it too slow.
>
> If it *is* too slow when done "right", *then* you get out the query
> analyzer and look to see what it's spending its time doing, and you add
> the one index that cuts that time from minutes to milliseconds.
>

I think most of our slowdowns are currently in the applications that 
access it.  Basically inefficient routines and use of objects.  A 
required approach when coding time was of the essence.


>> Thanks for the acronym - more info to go on :-)
>>
>> I don't know if we will try or need to set up transactions, but they
>> could be helpful if something is not a simple insert or update.
>
> You will need transactions. Trust me on this one. :-) If nothing else,
> you'll need transactions when you're archiving out an old job.
>

I've worked with transactions some in the past and they make a lot of 
sense.  I'll have to try to incorporate them as we move forward.

Does it make sense to set up a transaction wrapper on even simple single 
updates?

> That said, it's surprisingly common that if you structure your queries
> right, you can make every operation essentially an "append to this
> table", followed by an eventual "update that table in a way that if it
> fails, we can reconstruct that table entirely from scratch". Think
> Journal -> General Ledger.
>

OK, I've been dealing with databases for 15 years now - mostly on a low 
level.  That just went right over my head.  Makes sense, but I have no 
concept of how it is implemented.

>
> That said, btw, it's often the case that you *can* find text that won't
> change often enough to make a difference. Stock ticker, street name,
> radio station call letters. If it changes rarely enough that you're
> willing to treat it as a "delete all the old, insert all the new" sort
> of transaction, using text is OK. One generally doesn't do this with
> people, but saying "if you change your stock ticker, you have to treat
> it as selling all the old shares and getting the same number of new
> shares from the new ticker name" isn't too unreasonable.
>

for some reason I am hesitant to present the key as something that the 
end user interacts with.  Maybe I'm a control freak :-&


>> OK, you want a job surveying utility poles :-) You have good insight.
>
> I've just done this dozens of times in the last couple decades.
>
what - survey telephone poles?

>
> Look at it this way: This is the data that is vital to your company, in
> the strictest sense. Your company's job is to manage and evolve this
> data. You don't have time to get it wrong.


that is very correct.  Our first stab was fast and mostly dirty.  We 
spent a lot of time to try to get the data right, but we missed some 
pieces of data.

>
> What about your current system is so broken that you can't afford the
> time to fix it?
>

It has layers upon layers of fixes.  The end users had to fight with it 
to get it to put out data correctly.  It was a system so rich in ad-hoc 
fixes that I refused to make any more changes because of the risk 
of data corruption/loss and our client getting crap for a deliverable.

There were some cool features that I implemented - like a GIS type 
interface - before GIS systems were popular.


> The other possibility is to hire someone who actually does this sort of
> stuff for a living for long enough to show you how to set up the basics
> of the database.
>

It's a balance.  Depending on our requirements it could take just as 
much of our time working with the guy and doing it ourselves.

>>>> Exporting the data for archival purposes could be a good idea. Even
>>>> exporting a fully functional database with only the particular job
>>>> data required could be good.
>>>
>>> All those things are not-too-hard if all the data is in the database.
>>>
>>
>> Yes, and if the database schema is identical, the same programs can
>> use any database - from mySQL to .mdb files.
>
> That's the goal. As you get more sophisticated, you find that breaks
> down. When you use views in your "real" database and you archive it to
> something that has no views, you find yourself with a problem.
>

Yes, one just has to pay attention and not set up things that break.
Applications could avoid relying on the views (or other 
non-common features), but then one could be crippling oneself.


> That said, even a cheap machine nowadays has more than enough processing
> power to do whole boatloads of database processing.
>
> About how much data do you have, in total and in active jobs? Many
> people seriously underestimate just how much processing even a $1000
> machine can do on a SQL database.
>

yes, I don't think it will take much of a machine.  We have an old 
server box (PIII 600 I think) that would likely do just that.  Or I 
might incorporate MySQL on our current file server (Linux) to simplify 
things and use the old box for development.


>> Well, almost free - just takes time to set up. By the time all is said
>> and done I figure that mySQL will take about $1k - $2k of my time to
>> set up reliably. Need to get it installed on our server and set up
>> connection methods and backup methods. Not too difficult, but I don't
>> do it every day.
>
> Be sure to factor in some ongoing maintenance time. Occasionally your
> replication will break, or something like that, and you'll have to fix
> stuff in a panic. :-)
>

Yes, just like last week with our servers.

In the past 3 years I have had 1 or 2 incidents that required immediate 
attention.

The past week brought no less than 5 of them.

From full OS drives to failed RAID mirrors.

Luckily none of them were damaging.  Just annoying and cost a couple of 
hours of everyone's time.



From: Darren New
Subject: Re: Database Questions
Date: 8 Mar 2011 18:22:50
Message: <4d76ba4a$1@news.povray.org>
Tom Austin wrote:
> I think most of our slowdowns are currently in the applications that 
> access it.  Basically inefficient routines and use of objects.  A 
> required approach when coding time was of the essence.

Yeah, but now you have some luxury to do it right, I'd wager, since you 
already have a working system.

> Does it make sense to set up a transaction wrapper on even simple single 
> updates?

If you're modifying more than one row at a time, yes. In most database 
engines, each statement you submit becomes its own transaction. This isn't 
true on all types of MySql tables. For example, if you have a MyISAM table 
and you say "update blah set x = 50 where y = 3" and it gets interrupted 
half way, you might update only half the rows where y = 3.

A single append of a single row is atomic. You're not going to corrupt the 
database per se. Just the consistency of the data inside the database.

So, yes, there's really so little overhead in using a transaction around 
anything that affects more than one row that it makes no sense not to do so.
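
As a sketch, assuming a transactional engine such as InnoDB and the 
hypothetical tables from earlier:

  -- Either both statements take effect, or neither does:
  START TRANSACTION;

  UPDATE cable_heights
  SET    height_ft = height_ft + 1.5
  WHERE  company_id = 7;

  UPDATE jobs SET status = 'DONE' WHERE id = 42;

  COMMIT;  -- on any error, issue ROLLBACK instead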

>> That said, it's surprisingly common that if you structure your queries
>> right, you can make every operation essentially an "append to this
>> table", followed by an eventual "update that table in a way that if it
>> fails, we can reconstruct that table entirely from scratch". Think
>> Journal -> General Ledger.
>>
> 
> OK, I've been dealing with databases for 15 years now - mostly on a low 
> level.  That just went right over my head.  Makes sense, but I have no 
> concept of how it is implemented.

I've usually done financial databases. So, basically, all the history of 
everything in and out of the database is recorded in one set of tables, 
while the current state is in a second set of tables.

Like your checkbook: Every check you write gets a row appended, and you 
never change that row except to set a flag that you never unset (the "this 
check cleared" flag).  But there's another table that lists each account and 
says how much money is in it. In theory, you can construct the latter from 
the former, if you keep all your old checkbook registers around.
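
A sketch of that journal/balance split, with invented names; the journal 
is append-only, and the balances table can always be rebuilt from it:

  CREATE TABLE journal (
      account_id   INT      NOT NULL,
      entered_at   DATETIME NOT NULL,
      amount_cents INT      NOT NULL,  -- positive deposit, negative check
      PRIMARY KEY (account_id, entered_at)
  );

  CREATE TABLE balances (
      account_id    INT NOT NULL,
      balance_cents INT NOT NULL,
      PRIMARY KEY (account_id)
  );

  -- Rebuild (or audit) the current state from the raw history:
  REPLACE INTO balances (account_id, balance_cents)
  SELECT account_id, SUM(amount_cents)
  FROM   journal
  GROUP BY account_id;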

> for some reason I am hesitant to present the key as something that the 
> end user interacts with.  Maybe I'm a control freak :-&

No, there's almost never a reason to present the key that way. That doesn't 
make it less useful as a key. :-)

>>> OK, you want a job surveying utility poles :-) You have good insight.
>> I've just done this dozens of times in the last couple decades.
> what - survey telephone poles?

No, I only did that once. :-) Design databases from scratch.

> that is very correct.  Our first stab was fast and mostly dirty.  We 
> spent a lot of time to try to get the data right, but we missed some 
> pieces of data.

That's not unusual. The trick is to have the database structured in a way 
you can add the missed data later without *changing* what's already there. 
The best way to do that is to primarily record into the database exactly 
what's outside the database, reflecting reality as closely as possible.


>> What about your current system is so broken that you can't afford the
>> time to fix it?
>>
> 
> It has layers upon layers of fixes.  The end users had to fight with it 
> to get it to put out data correctly.  It was a system so rich in ad-hoc 
> fixes that I refused to make any more changes because of the risk 
> of data corruption/loss and our client getting crap for a deliverable.

Been there done that. Along with "before I start, has this ever been backed 
up?" Answered by "No, it has never been reliable enough we could get a 
backup of the data."  Followed by "Thanks, but I like my current job enough."

>> The other possibility is to hire someone who actually does this sort of
>> stuff for a living for long enough to show you how to set up the basics
>> of the database.
>>
> 
> It's a balance.  Depending on our requirements it could take just as 
> much of our time working with the guy and doing it ourselves.

This is true. I don't think it would take any less time, but you'd probably 
have a more robust system when you were done.

> Yes, one just has to pay attention and not set up things that break.
> Applications could avoid relying on the views (or other 
> non-common features), but then one could be crippling oneself.

Yes. It's better to just say "if we need multiple archives of this database, 
we'll use the same engine."

> yes, I don't think it will take much of a machine.  We have an old 
> server box (PIII 600 I think) that would likely do just that.  Or I 
> might incorporate MySQL on our current file server (Linux) to simplify 
> things and use the old box for development.

Yeah. And you'll likely want a replicated server as well.

> Luckily none of them were damaging.  Just annoying and cost a couple of 
> hours of everyone's time.

Yeah, that's much better than the place catching on fire. :-)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Invisible
Subject: Re: Database Questions
Date: 9 Mar 2011 04:29:34
Message: <4d77487e$1@news.povray.org>
On 08/03/2011 10:34 PM, Darren New wrote:
> Orchid XP v8 wrote:
>> The assertion being that "stuff that doesn't change" is rarer than you
>> think it is.
>
> I'll grant that it's rarer than most people think it is. The trick is to
> consider whether if the PK changes, you're willing to treat the object
> as a new entity, or whether it's rare enough that you're willing to
> expunge the old ID entirely.

Seems reasonable. And I'm guessing that Mr Kyte was talking about huge 
multi-TB data warehouses where updating or removing all references to 
something would be a major PITA.

> I contend that most primary data in the database should be historical.

This also seems like a sensible way to go.

>> Until the GPS coordinates turn out to have been inaccurate.
>
> Then you fix it. There never *was* an intersection with those GPS
> coordinates. So you fix it. The point is to avoid collisions, not to
> necessarily have a PK that never changes.

The point is *totally* to have a PK that never changes, since changing a 
PK is a tricky business which you want to avoid. (Duplicate PKs would of 
course be much, much worse, but that's fairly easy to avoid.)

> Given that the BIGGEST TRANSACTIONAL DATABASE IN THE WORLD (*)
> identifies street intersections with GPS coordinates, I'm gonna have to
> go with GPS coordinates on this one. ;-)

OK, well I'm gonna have to say they must be doing something more precise 
than "stand in the general vicinity of the intersection, take down the 
GPS coordinates, and we'll just use those".

>> GPS is only accurate to a few hundred yards of course,
>
> Uh, no. What are you smoking?

Hey, the satellites are 20,000 km up in the sky. It's astonishing that 
GPS can even figure out which city you're in, never mind fixing your 
location to a few hundred yards.

>> so different people will probably get different measurements depending
>> on where they're standing.
>
> Sure. But you're not asking someone to identify a real intersection
> based on their GPS coordinates. You're using it as a primary key.

So, what, you pick a GPS coordinate at random that's approximately in 
the right area? (So long as it's unambiguously not near any *other* 
intersection.)

>> Then again, most intersections probably already *have* a synthetic
>> key. I don't know about your country, but around here major roads tend
>> to have uniquely numbered markers on the roadside...
>
> Assuming those don't change, they're probably based on geometry also.

They seem to just be numbered sequentially along the length of the road.

> The point of using GPS coordinates is that they are guaranteed not to
> change.

Sure. Unless the GPS coordinate system moves, which can't possibly 
happen. Oh, wait...


