On 3/8/2011 3:42 PM, Darren New wrote:
> Tom Austin wrote:
>> I know that we will want to refer back to the data - either to 'copy'
>> it for another job or to just look at historical statistics - like you
>> point out.
>
> Then keep it in one database.
>
>> I've run across some database engines that support it, but it seemed
>> not to be a pointed feature that is used often.
>
> Right. Because most people don't do things that way. There's no good
> reason to make things a separate database unless you want to do
> administrative things differently like having different access or store
> it on different machines.
>
thanks, point made :-)
>> Our data structure is not too large in comparison to most databases.
>> Currently we have about 30 tables with maybe 5-10 more required if we
>> keep on the current design path.
>
> There's no need to denormalize, then. It also doesn't sound like you're
> doing anything highly transactional, so a quarter-second transaction is
> unlikely to bother anyone using it.
>
Unoptimized code is what makes it slow. Programs written as quickly as
possible don't tend to be efficient.
But I think with some optimization of how we access the data it can be
made very quick. Also, moving away from an Access .mdb file will speed
things up considerably.
>> With any but the most simple database you will need automated systems
>> to access and keep the data straight. You have to decide where you are
>> going to put the work.
>
> The point of ACID (at least the C part) is that you declare the "keep
> the data straight" part in the database as data. Then a mistake in your
> application cannot corrupt your data in that way.
>
Thanks for the acronym - more info to go on :-)
I don't know if we will try or need to set up transactions, but they
could be helpful if something is not a simple insert or update.
>> I guess the question is more:
>> It is good or bad practice to use text as the 'key' rather than
>> dereferencing to an integer type?
>
> The key is an identifier that won't change. What text do you have for
> the entity that won't change? The company name? The road the pole is on?
> The name of the person you should talk to about it? All these things can
> change.
point made - thanks :-)
>> Something you said here might work for us.
>> The poles that we collect do not change and can be reused for other
>> jobs - so keeping them around could be good.
>> The cable heights are more job specific as when the job is built the
>> height are now wrong.
>
> You're doing it wrong. The cable heights don't change. Instead, you have
> a range of dates when the cable was *this* high, and another range of
> dates when the cable was *that* high.
>
OK, you want a job surveying utility poles :-) You have good insight.
> If you overwrite that information, then the customer later comes back
> and asks about it, now you have a problem. Instead, the primary key for
> the cable should be "company id, pole number, first date valid". Then
> your queries can find the height of the cable on any particular day by
> finding the latest date where that company's pole is listed.
>
At this point I am not used to setting up primary keys with multiple
columns - but this is making some sense. The question we have is if we
have time to implement some such beast :-)
>> Exporting the data for archival purposes could be a good idea. Even
>> exporting a fully functional database with only the particular job
>> data required could be good.
>
> All those things are not-too-hard if all the data is in the database.
>
Yes, and if the database schema is identical, the same programs can use
any database - from mySQL to .mdb files.
>> We are a small show so we cannot afford a ton of overhead to keep a
>> system running, yet we also do not want to set up a $20,000 db system
>> that is overkill for what we do.
>
> Fortunately, since free RDBMs have come out, the price of a database
> engine has dropped from $100K/user to free. ;-)
>
Well, almost free - just takes time to set up. By the time all is said
and done I figure that mySQL will take about $1k - $2k in my time to set
up reliably. Need to get it installed on our server and set up
connection methods and backup methods. Not too difficult, but I don't
do it every day.
Orchid XP v8 wrote:
> Pro-tip: Don't use real-world things as primary keys. Because you know
> what? Real-world things have this nasty habit of *changing*. You don't
> ever want your primary key to change. This is why databases up and down
> the country refer to Mr Smith not as Mr Smith but as customer #238523.
I disagree. The only time you should use automatically-generated primary
keys is when there is no attribute of the item in the table that is
guaranteed not to change.
A sales receipt whose primary key is the cash register identifier and the
time to the second when the receipt was printed is an excellent PK for a
sales receipt. A record recording what song was playing on the radio has the
obvious PK of location+timestamp+frequency. A road intersection has a fine
PK of the GPS coordinates of the intersection.
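In SQL terms that's nothing exotic - just a multi-column primary key.
Something roughly like this (table and column names made up for the sake
of example):

  create table receipt (
    register_id  int       not null,   -- which cash register printed it
    printed_at   datetime  not null,   -- to-the-second print time
    total_cents  int       not null,
    primary key (register_id, printed_at)   -- natural composite key, no auto-increment needed
  );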
The problem is that people are changing all the time, so it's very hard to
have a customer record that persists with a PK that makes sense.
> Use randomly-generated integers with no real-world meaning as PKs and
> you'll have fewer problems if/when the thing the key refers to changes
> in some way.
The drawback of this is if you get a reference to the wrong value, you'll
never notice.
> I would suggest not setting up an Oracle database, for example. It
> sounds like it would be utter overkill here.
Yes. One of the free databases (Postgresql or mysql) or (on windows) SQL
Server (Express, perhaps) would do fine. You'd probably find the most free
help with MySql, but it's kind of messy to run on Windows, or at least was
last I checked. SQL Server Express would *probably* have enough performance
for you, given you were using Access.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
Tom Austin wrote:
> Yes, the data needs to be well understood before starting the schema.
> There should be no anomalies to 'fit in' after the fact.
The absolute best way to make this work out, then, is to record "reality" in
the database. Ignore how you're going to use the data while you're designing
what's in the database (to the extent possible). Instead, simply record facts.
Of course you need to know the height of the poles and their locations and
such, because that's what you do your calculations with.
What I'm trying to say is don't say "our calculations would be easier if our
data was arranged like *this*." If you stick with the reality of what's out
there, you will never run into the situation where you have to *change* data
you already have in the database.
I know this sounds kind of vague, and it can be hard to get away from it if
you already have a model of something up and running and you know how that
works.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
Tom Austin wrote:
> Unoptimized code is what makes it slow. Programs written as quickly as
> possible don't tend to be efficient.
Modern database systems are designed to take normalized tables and make
accessing them very fast. That's the fundamental operation a relational
database management system performs.
How big is your total collection of active databases right now? If it fits
in (say) a gigabyte, chances are you can't possibly make it too slow.
If it *is* too slow when done "right", *then* you get out the query analyzer
and look to see what it's spending its time doing, and you add the one index
that cuts that time from minutes to milliseconds.
> Thanks for the acronym - more info to go on :-)
>
> I don't know if we will try or need to set up transactions, but they
> could be helpful if something is not a simple insert or update.
You will need transactions. Trust me on this one. :-) If nothing else,
you'll need transactions when you're archiving out an old job.
That said, it's surprisingly common that if you structure your queries
right, you can make every operation essentially an "append to this table",
followed by an eventual "update that table in a way that if it fails, we can
reconstruct that table entirely from scratch". Think Journal -> General
Ledger.
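A rough sketch of what I mean, with made-up table names (assuming the
journal table really does capture every event):

  -- every change is an append; journal rows are never updated
  insert into journal (account_id, amount_cents, posted_at)
  values (42, -1500, now());

  -- the ledger is purely derived; if the update ever fails or looks
  -- suspect, throw it out and rebuild it from the journal
  delete from ledger;
  insert into ledger (account_id, balance_cents)
  select account_id, sum(amount_cents)
  from journal
  group by account_id;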
>> The key is an identifier that won't change. What text do you have for
>> the entity that won't change? The company name? The road the pole is on?
>> The name of the person you should talk to about it? All these things can
>> change.
>
> point made - thanks :-)
That said, btw, it's often the case that you *can* find text that won't
change often enough to make a difference. Stock ticker, street name, radio
station call letters. If it changes rarely enough that you're willing to
treat it as a "delete all the old, insert all the new" sort of transaction,
using text is OK. One generally doesn't do this with people, but saying "if
you change your stock ticker, you have to treat it as selling all the old
shares and getting the same number of new shares from the new ticker name"
isn't too unreasonable.
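And if you do go with text keys like that on InnoDB tables, you can let the
engine carry the rare rename through for you. A sketch, with invented
columns:

  create table station (
    call_letters  char(4) primary key,
    city          varchar(64)
  ) engine=InnoDB;

  create table play_log (
    call_letters  char(4)   not null,
    played_at     datetime  not null,
    song          varchar(128),
    primary key (call_letters, played_at),
    foreign key (call_letters) references station (call_letters)
      on update cascade   -- a once-in-a-blue-moon rename ripples through by itself
  ) engine=InnoDB;

(MyISAM tables won't enforce the foreign key at all, so this only buys you
anything on InnoDB.)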
> OK, you want a job surveying utility poles :-) You have good insight.
I've just done this dozens of times in the last couple decades.
> At this point I am not used to setting up primary keys with multiple
> columns - but this is making some sense. The question we have is if we
> have time to implement some such beast :-)
Look at it this way: This is the data that is vital to your company, in the
strictest sense. Your company's job is to manage and evolve this data. You
don't have time to get it wrong.
What about your current system is so broken that you can't afford the time
to fix it?
The other possibility is to hire someone who actually does this sort of
stuff for a living for long enough to show you how to set up the basics of
the database.
>>> Exporting the data for archival purposes could be a good idea. Even
>>> exporting a fully functional database with only the particular job
>>> data required could be good.
>>
>> All those things are not-too-hard if all the data is in the database.
>>
>
> Yes, and if the database schema is identical, the same programs can use
> any database - from mySQL to .mdb files.
That's the goal. As you get more sophisticated, you find that breaks down.
When you use views in your "real" database and you archive it to something
that has no views, you find yourself with a problem.
That said, even a cheap machine nowadays has more than enough processing
power to do whole boatloads of database processing.
About how much data do you have, in total and in active jobs? Many people
seriously underestimate just how much processing even a $1000 machine can do
on a SQL database.
> Well, almost free - just takes time to set up. By the time all is said
> and done I figure that mySQL will take about $1k - $2k in my time to set
> up reliably. Need to get it installed on our server and set up
> connection methods and backup methods. Not too difficult, but I don't
> do it every day.
Be sure to factor in some ongoing maintenance time. Occasionally your
replication will break, or something like that, and you'll have to fix stuff
in a panic. :-)
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
On 08/03/2011 09:24 PM, Darren New wrote:
> Orchid XP v8 wrote:
>> Pro-tip: Don't use real-world things as primary keys. Because you know
>> what? Real-world things have this nasty habit of *changing*. You don't
>> ever want your primary key to change. This is why databases up and
>> down the country refer to Mr Smith not as Mr Smith but as customer
>> #238523.
>
> I disagree. The only time you should use automatically-generated primary
> keys is when there is no attribute of the item in the table that is
> guaranteed not to change.
The recommendation comes not so much from me, but from Tom Kyte. (He's
an Oracle expert. I'm not entirely sure, but I believe he actually works
for Oracle Corp.) The assertion being that "stuff that doesn't change"
is rarer than you think it is.
But sure, especially if you're recording historical data (which is of
course especially unlikely to change), there's no real need for a
synthetic key.
> A sales receipt whose primary key is the cash register identifier and
> the time to the second when the receipt was printed is an excellent PK
> for a sales receipt.
Agreed.
> A road
> intersection has a fine PK of the GPS coordinates of the intersection.
Until the GPS coordinates turn out to have been inaccurate. GPS is only
accurate to a few hundred yards of course, so different people will
probably get different measurements depending on where they're standing.
That's probably not a great idea. (Unless you explicitly round the
coordinates up to some number that makes the size of the actual
intersection negligible.)
Then again, most intersections probably already *have* a synthetic key.
I don't know about your country, but around here major roads tend to
have uniquely numbered markers on the roadside...
> The problem is that people are changing all the time, so it's very hard
> to have a customer record that persists with a PK that makes sense.
Quite.
> The drawback of this is if you get a reference to the wrong value,
> you'll never notice.
Well, perhaps. But if somebody gives you the wrong phone number, you'll
"never notice" in the same sense...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Tom Austin wrote:
> Interesting way of looking at building a schema - just break it down to
> objects. I think this fits our data very well. I think we were doing
> this anyway, but not on purpose - more by accident.
Trust me on this: It's the only way you ever get a schema you don't have to
change with every software update. Record what is "out there", and to the
extent sensible, never overwrite data. If you find yourself thinking "We'll
need to keep copies", you've done something wrong, because what that really
means is you need the state of something at two different times, but you
didn't record the times. Record *everything*, in as raw a format as you can.
I am looking at a financial system right now where I never overwrite data.
When someone swipes a card, I put into a table exactly what the card machine
gave me: The card machine ID, the credit card number, the amount the clerk
typed in, and the timestamp I got the record. I don't record the merchant
ID in that table, or the ID of the cardholder. That all comes later, when I
process the records.
Why? Because the merchant might change merchant accounts before telling us.
The cardholder might lose the card. Heck, we might even accidentally create
two cards with the same number.
I'm blathering. Nevermind.
Anyway, yes, record everything from *outside* your company in one set of
tables, as *raw* as you can. Changes to these tables should be idempotent.
(I.e., you *should* be able to read the CSV file the telco gave you listing
the pole information multiple times into the same table without corruption.
This implies no automatically-generated primary key on those tables. That
said, again, this is just a guideline and not always possible.)
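In MySQL terms, one way to get that idempotence is a natural key on the raw
table plus INSERT IGNORE (or LOAD DATA ... IGNORE), so re-running the same
file just skips rows you already have. A sketch with invented names:

  create table raw_pole_import (
    company_id   int          not null,
    pole_number  varchar(32)  not null,
    file_date    date         not null,   -- the date on the telco's file
    raw_line     text,                    -- keep the line exactly as they gave it to you
    primary key (company_id, pole_number, file_date)
  );

  -- loading the same file twice does no harm:
  insert ignore into raw_pole_import (company_id, pole_number, file_date, raw_line)
  values (7, 'P-1138', '2011-03-08', '40 ft class 4, ...');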
Have a second set of tables, completely created from the first set, that
holds any calculated information that might be difficult to recalculate
(e.g., running account balances in a financial app), but which can be
checked against and recreated from the raw data.
Once you have stuff working, figure out where it's slow, *then* fix that.
Usually "fix that" involves changing the declarations of some indexes, not
rearranging the actual data (which, if you have data for which you can
follow the rules above, you can't really rearrange anyway).
> this might be worth considering, but I don't know how easy it will be to
> implement and keep straight. Seems like a user-created key like
> this could lead to duplicates - but if it is a key it will prevent
> duplicates anyway - just prompt for another key.
Oh, yes, this isn't for user keys. I meant where in a programming language
you'd use an enumerated type.
So if you have a "status" row for jobs that says "in negotiation", "paid for
but waiting for someone to be assigned", "that person is working on it", and
"ready to be delivered", you could use this sort of coding to make your
programs more readable. "NEGO", "PAID", "WORK", "DONE" rather than 0, 1, 2, 3.
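Concretely, that's just a little lookup table plus a readable foreign key -
something along these lines:

  create table job_status (
    code         char(4) primary key,      -- 'NEGO', 'PAID', 'WORK', 'DONE'
    description  varchar(64) not null
  );

  create table job (
    job_id       int primary key,
    status_code  char(4) not null,
    foreign key (status_code) references job_status (code)
  );

  select job_id from job where status_code = 'PAID';   -- readable in queries *and* in raw dumps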
> Most of the data in question is a simple map from an integer key to a
> piece of text. I do have some queries on the text with joins to pull
> data from other tables.
>
> The text is not designed to be changed.
>
> The alternative that we have is to flatten the relationship and put the
> text directly in the table instead of having a key reference.
>
> ie
> go from
>
> tablePole              tableOwners
>   Pole   OwnerID          ID   Name
>
>
> to
>
> tablePole
>   Pole   OwnerName
>
>
>
> Any thoughts on this?
If OwnerName shows up in multiple places, then these bits would need to be
in a PK/FK relationship to properly cascade changes.
I assume each owner has more than one pole. Hence, duplicating the name for
every pole is inefficient space-wise and dangerous update-wise. (Especially
if you don't use transactions.)
Is the "owner" a real independent entity? Is it rational to speculate there
might be an owner with no poles? Or an owner who has poles but hasn't told
you about them yet? What about an owner who just sold his last pole to pay
for the pole he's going to buy next week?
So, yes, here, "owner" is an actual entity, not just some field about a
pole. The name of the owner isn't associated with the pole at all.
Indeed, the *best* way to structure this would be
table Pole: GPS coords, height. (PK: GPS)
table Owner: ID, contact name, contact phone, etc etc. (PK: ID)
table Ownership: PoleGPS, OwnerID, StartingFromTime. (PK: GPS+StartingFromTime).
table Cables: PoleGPS, cable height, fiber height, power height,
StartingFromTime. (PK: GPS + StartingFromTime)
Later, you can go back and ask questions like "Did AT&T ever change the
height of the power lines on the pole at Fifth and Main, or was that
Cingular, after they bought that line from AT&T?"
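In SQL it would look roughly like this - the types are just guesses for
illustration, pick whatever fits your data:

  create table Pole (
    gps        varchar(32)  primary key,   -- lat,lon rounded to whatever precision you standardize on
    height_ft  decimal(5,1) not null
  );

  create table Owner (
    owner_id       int primary key,
    contact_name   varchar(64),
    contact_phone  varchar(20)
  );

  create table Ownership (
    pole_gps    varchar(32) not null,
    owner_id    int         not null,
    valid_from  date        not null,       -- StartingFromTime
    primary key (pole_gps, valid_from),
    foreign key (pole_gps) references Pole (gps),
    foreign key (owner_id) references Owner (owner_id)
  );

  create table Cables (
    pole_gps         varchar(32) not null,
    cable_height_ft  decimal(5,1),
    fiber_height_ft  decimal(5,1),
    power_height_ft  decimal(5,1),
    valid_from       date not null,
    primary key (pole_gps, valid_from),
    foreign key (pole_gps) references Pole (gps)
  );

The "how high was it on such-and-such a date" question is then just the
latest row that isn't after that date:

  select * from Cables
  where pole_gps = '32.7157,-117.1611' and valid_from <= '2010-06-01'
  order by valid_from desc
  limit 1;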
*That* might be a bit too normalized for your purposes, but you'll never get
in a situation where someone sells a bunch of poles, then comes to you and
asks you to recreate a report from before they sold the poles or something.
See if you can look at the columns you have and come up with a breakdown
like this into "things really happening out there". You might find yourself
a natural at it. :-)
Now, the "groups" kind of thing I was talking about, the "secondary" set of
tables, might be tables that take a group of poles owned by the same
company, that constitute a single "cable run". Large stretches of cable run
might have the cable at the same height, so your calculations could be
faster if you had a table that put each pole in a group, and then had the
height of the cable on the group specified, or something like that.
It's hard coming up with that sort of thing when I don't know how your
business works, tho, but that's the approximate sort of thing I was talking
about.
>>> We are developing a new database system to take care of some severe
>>> shortcomings of the old system.
>>
>> What are the shortcomings?
>>
>
> Let's just say is was a single table database with about 250 columns -
> limited only by the fact that Excel at one time only allowed about 250
> columns.
Oh, so the shortcoming is that it was rushed and grew instead of being
designed. Makes sense. :-)
> Putting data in and getting data out was very difficult and required the
> sacrifice of too many brain cells.
That's exactly where SQL queries between multiple tables will help.
Thinking in SQL instead of in terms of iterating over things is tough, but
it's almost always possible to come up with a statement in SQL that will
pull out all and only the data you need. If you ever get to the point where
you're saying "for each result from *this* query, submit *that* query",
you're almost 100% sure you're doing something wrong. In my entire career, I
only ever did that once, and that was after I'd written it the right way and
figured out MySql was not optimizing things in the obvious way and I didn't
know how to make it do so.
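Using your tablePole/tableOwners example from earlier: instead of fetching
all the poles and then firing one query per pole to look up the owner's
name, you let the join do the iterating:

  -- one round trip instead of one query per pole
  select p.Pole, o.Name
  from tablePole p
  join tableOwners o on o.ID = p.OwnerID;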
>>> Our data has a lot of common 'choose from' items - such as the owner
>>> of the pole.
>>> The data consists of pole specific information that doesn't change
>>> (owner, height, etc) and information that will likely change (the
>>> height of cables on the pole).
>>
>> These last two should probably be in separate tables that you join at
>> runtime. (Especially if you're using MySql, where different "engines"
>> give you different capabilities.)
>>
>
> That's what we are trying to figure out - how to make it make sense to
> have them in separate tables.
Let me know if the description of tables above does or doesn't make sense.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
Orchid XP v8 wrote:
> The assertion being that "stuff that doesn't change"
> is rarer than you think it is.
I'll grant that it's rarer than most people think it is. The trick is to
consider whether if the PK changes, you're willing to treat the object as a
new entity, or whether it's rare enough that you're willing to expunge the
old ID entirely.
Is it the same telephone pole if I dig it up and plant it somewhere else?
Sure, maybe. But am I ever going to need to know that if my job involves
routing cables? Probably not. Am I going to need to know that if my job
involves replacing them when they wear out? Most definitely.
How often do you replace the little tags on the side of the pole that give
the pole number? Not very often. If that ID tag gets replaced, is there
any reason you can't just update every place it appears in the database?
Probably not.
> But sure, especially if you're recording historical data (which is of
> course especially unlikely to change), there's no real need for a
> synthetic key.
I contend that most primary data in the database should be historical.
> Until the GPS coordinates turn out to have been inaccurate.
Then you fix it. There never *was* an intersection with those GPS
coordinates. So you fix it. The point is to avoid collisions, not to
necessarily have a PK that never changes.
Given that the BIGGEST TRANSACTIONAL DATABASE IN THE WORLD (*) identifies
street intersections with GPS coordinates, I'm gonna have to go with GPS
coordinates on this one. ;-)
> GPS is only
> accurate to a few hundred yards of course,
Uh, no. What are you smoking?
> so different people will
> probably get different measurements depending on where they're standing.
Sure. But you're not asking someone to identify a real intersection based on
their GPS coordinates. You're using it as a primary key.
None of your customers identifies themself with your internal ID number, either.
> Then again, most intersections probably already *have* a synthetic key.
> I don't know about your country, but around here major roads tend to
> have uniquely numbered markers on the roadside...
Assuming those don't change, they're probably based on geometry also. The
point of using GPS coordinates is that they are guaranteed not to change. If
you enter them in the database wrong, then you have bad data, but that's not
because the intersection moved.
>> The drawback of this is if you get a reference to the wrong value,
>> you'll never notice.
>
> Well, perhaps. But if somebody gives you the wrong phone number, you'll
> "never notice" in the same sense...
Except I'm not using a phone number as a primary key. :-)
(*) The 300 terabyte TURKS database. The size is probably bigger 15 years
later, too.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
On 3/8/2011 4:43 PM, Darren New wrote:
> Tom Austin wrote:
>> Unoptimized code is what makes it slow. Programs written as quickly as
>> possible don't tend to be efficient.
>
> Modern database systems are designed to take normalized tables and make
> accessing them very fast. That's the fundamental operation a relational
> database management system performs.
>
> How big is your total collection of active databases right now? If it
> fits in (say) a gigabyte, chances are you can't possibly make it too slow.
>
> If it *is* too slow when done "right", *then* you get out the query
> analyzer and look to see what it's spending its time doing, and you add
> the one index that cuts that time from minutes to milliseconds.
>
I think most of our slowdowns are currently in the applications that
access it - basically inefficient routines and use of objects. A
required approach when coding time was of the essence.
>> Thanks for the acronym - more info to go on :-)
>>
>> I don't know if we will try or need to set up transactions, but they
>> could be helpful if something is not a simple insert or update.
>
> You will need transactions. Trust me on this one. :-) If nothing else,
> you'll need transactions when you're archiving out an old job.
>
I've worked with transactions some in the past and they make a lot of
sense. I'll have to try to incorporate them as we move forward.
Does it make sense to set up a transaction wrapper on even simple single
updates?
> That said, it's surprisingly common that if you structure your queries
> right, you can make every operation essentially an "append to this
> table", followed by an eventual "update that table in a way that if it
> fails, we can reconstruct that table entirely from scratch". Think
> Journal -> General Ledger.
>
OK, I've been dealing with databases for 15 years now - mostly at a low
level. That just went right over my head. Makes sense, but I have no
concept of how it is implemented.
>
> That said, btw, it's often the case that you *can* find text that won't
> change often enough to make a difference. Stock ticker, street name,
> radio station call letters. If it changes rarely enough that you're
> willing to treat it as a "delete all the old, insert all the new" sort
> of transaction, using text is OK. One generally doesn't do this with
> people, but saying "if you change your stock ticker, you have to treat
> it as selling all the old shares and getting the same number of new
> shares from the new ticker name" isn't too unreasonable.
>
for some reason I am hesitant to present the key as something that the
end user interacts with. Maybe I'm a control freak :-&
>> OK, you want a job surveying utility poles :-) You have good insight.
>
> I've just done this dozens of times in the last couple decades.
>
what - survey telephone poles?
>
> Look at it this way: This is the data that is vital to your company, in
> the strictest sense. Your company's job is to manage and evolve this
> data. You don't have time to get it wrong.
that is very correct. Our first stab was fast and mostly dirty. We
spent a lot of time to try to get the data right, but we missed some
pieces of data.
>
> What about your current system is so broken that you can't afford the
> time to fix it?
>
It has layers upon layers of fixes. The end users had to fight with it
to get it to put out data correctly. It was an ad-hoc-rich system to
the point where I refused to make any more changes because of the risk
of data corruption/loss and our client getting crap for a deliverable.
There were some cool features that I implemented - like a GIS type
interface - before GIS systems were popular.
> The other possibility is to hire someone who actually does this sort of
> stuff for a living for long enough to show you how to set up the basics
> of the database.
>
It's a balance. Depending on our requirements it could take just as
much of our time working with the guy and doing it ourselves.
>>>> Exporting the data for archival purposes could be a good idea. Even
>>>> exporting a fully functional database with only the particular job
>>>> data required could be good.
>>>
>>> All those things are not-too-hard if all the data is in the database.
>>>
>>
>> Yes, and if the database schema is identical, the same programs can
>> use any database - from mySQL to .mdb files.
>
> That's the goal. As you get more sophisticated, you find that breaks
> down. When you use views in your "real" database and you archive it to
> something that has no views, you find yourself with a problem.
>
yes, one just has to pay attention and not set up things that break.
It could be that applications do not rely on the views (or other
non-common features), but then one could be crippling themselves.
> That said, even a cheap machine nowadays has more than enough processing
> power to do whole boatloads of database processing.
>
> About how much data do you have, in total and in active jobs? Many
> people seriously underestimate just how much processing even a $1000
> machine can do on a SQL database.
>
yes, I don't think it will take much of a machine. We have an old
server box (PIII 600 I think) that would likely do just that. Or I
might incorporate MySQL on our current file server (Linux) to simplify
things and use the old box for development.
>> Well, almost free - just takes time to set up. By the time all is said
>> and done I figure that mySQL will take about $1k - $2k in my time to
>> set up reliably. Need to get it installed on our server and set up
>> connection methods and backup methods. Not too difficult, but I don't
>> do it every day.
>
> Be sure to factor in some ongoing maintenance time. Occasionally your
> replication will break, or something like that, and you'll have to fix
> stuff in a panic. :-)
>
yes, just like last week with our servers.
In the past 3 years I have had 1 or 2 incidents that required immediate
attention.
The past week brought no less than 5 of them.
From full OS drives to failed RAID mirrors.
Luckily none of them were damaging. Just annoying and cost a couple of
hours of everyone's time.
Tom Austin wrote:
> I think most of our slowdowns are currently in the applications that
> access it - basically inefficient routines and use of objects. A
> required approach when coding time was of the essence.
Yeah, but now you have some luxury to do it right, I'd wager, since you
already have a working system.
> Does it make sense to set up a transaction wrapper on even simple single
> updates?
If you're modifying more than one row at a time, yes. In most database
engines, each statement you submit becomes its own transaction. This isn't
true on all types of MySql tables. For example, if you have a MyISAM table
and you say "update blah set x = 50 where y = 3" and it gets interrupted
half way, you might update only half the rows where y = 3.
A single append of a single row is atomic. You're not going to corrupt the
database per se. Just the consistency of the data inside the database.
So, yes, there's really so little overhead in using a transaction around
anything that affects more than one row that it makes no sense not to do so.
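On an InnoDB table it's literally three extra words (the second statement
here is just a stand-in for "some other change that has to happen together
with the first"):

  start transaction;
  update blah set x = 50 where y = 3;
  update blah_summary set row_count = row_count - 1 where y = 3;
  commit;    -- or rollback, and neither table is half-changed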
>> That said, it's surprisingly common that if you structure your queries
>> right, you can make every operation essentially an "append to this
>> table", followed by an eventual "update that table in a way that if it
>> fails, we can reconstruct that table entirely from scratch". Think
>> Journal -> General Ledger.
>>
>
> OK, I've been dealing with databases for 15 years now - mostly at a low
> level. That just went right over my head. Makes sense, but I have no
> concept of how it is implemented.
I've usually done financial databases. So, basically, all the history of
everything in and out of the database is recorded in one set of tables,
while the current state is in a second set of tables.
Like your checkbook: Every check you write gets a row appended, and you
never change that row except to set a flag that you never unset (the "this
check cleared" flag). But there's another table that lists each account and
says how much money is in it. In theory, you can construct the latter from
the former, if you keep all your old checkbook registers around.
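Table-wise, roughly (invented names):

  create table register (         -- the history: append-only
    account_id    int       not null,
    check_no      int       not null,
    amount_cents  int       not null,
    written_at    datetime  not null,
    cleared       tinyint   not null default 0,  -- the one flag you ever set
    primary key (account_id, check_no)
  );

  create table account_balance (  -- the current state: derived, rebuildable
    account_id     int primary key,
    balance_cents  int not null
  );

If account_balance ever looks fishy, sum up register and compare.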
> for some reason I am hesitant to present the key as something that the
> end user interacts with. Maybe I'm a control freak :-&
No, there's almost never a reason to present the key that way. That doesn't
make it less useful as a key. :-)
>>> OK, you want a job surveying utility poles :-) You have good insight.
>> I've just done this dozens of times in the last couple decades.
> what - survey telephone poles?
No, I only did that once. :-) Design databases from scratch.
> that is very correct. Our first stab was fast and mostly dirty. We
> spent a lot of time to try to get the data right, but we missed some
> pieces of data.
That's not unusual. The trick is to have the database structured in a way
you can add the missed data later without *changing* what's already there.
The best way to do that is to primarily record into the database exactly
what's outside the database, reflecting reality as closely as possible.
>> What about your current system is so broken that you can't afford the
>> time to fix it?
>>
>
> It has layers upon layers of fixes. The end users had to fight with it
> to get it to put out data correctly. It was an ad-hoc-rich system to
> the point where I refused to make any more changes because of the risk
> of data corruption/loss and our client getting crap for a deliverable.
Been there done that. Along with "before I start, has this ever been backed
up?" Answered by "No, it has never been reliable enough we could get a
backup of the data." Followed by "Thanks, but I like my current job enough."
>> The other possibility is to hire someone who actually does this sort of
>> stuff for a living for long enough to show you how to set up the basics
>> of the database.
>>
>
> It's a balance. Depending on our requirements it could take just as
> much of our time working with the guy and doing it ourselves.
This is true. I don't think it would take any less time, but you'd probably
have a more robust system when you were done.
> yes, one just has to pay attention and not set up things that break.
> It could be that applications do not rely on the views (or other
> non-common features), but then one could be crippling themselves.
Yes. It's better to just say "if we need multiple archives of this database,
we'll use the same engine."
> yes, I don't think it will take much of a machine. We have an old
> server box (PIII 600 I think) that would likely do just that. Or I
> might incorporate MySQL on our current file server (Linux) to simplify
> things and use the old box for development.
Yeah. And you'll likely want a replicated server as well.
> Luckily none of them were damaging. Just annoying and cost a couple of
> hours of everyone's time.
Yeah, that's much better than the place catching on fire. :-)
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
On 08/03/2011 10:34 PM, Darren New wrote:
> Orchid XP v8 wrote:
>> The assertion being that "stuff that doesn't change" is rarer than you
>> think it is.
>
> I'll grant that it's rarer than most people think it is. The trick is to
> consider whether if the PK changes, you're willing to treat the object
> as a new entity, or whether it's rare enough that you're willing to
> expunge the old ID entirely.
Seems reasonable. And I'm guessing that Mr Kyte was talking about huge
multi-TB data warehouses where updating or removing all references to
something would be a major PITA.
> I contend that most primary data in the database should be historical.
This also seems like a sensible way to go.
>> Until the GPS coordinates turn out to have been inaccurate.
>
> Then you fix it. There never *was* an intersection with those GPS
> coordinates. So you fix it. The point is to avoid collisions, not to
> necessarily have a PK that never changes.
The point is *totally* to have a PK that never changes, since changing a
PK is a tricky business which you want to avoid. (Duplicate PKs would of
course be much, much worse, but that's fairly easy to avoid.)
> Given that the BIGGEST TRANSACTIONAL DATABASE IN THE WORLD (*)
> identifies street intersections with GPS coordinates, I'm gonna have to
> go with GPS coordinates on this one. ;-)
OK, well I'm gonna have to say they must be doing something more precise
than "stand in the general vicinity of the intersection, take down the
GPS coordinates, and we'll just use those".
>> GPS is only accurate to a few hundred yards of course,
>
> Uh, no. What are you smoking?
Hey, the satellites are 20,000 km up in the sky. It's astonishing that
GPS can even figure out which city you're in, never mind fixing your
location to a few hundred yards.
>> so different people will probably get different measurements depending
>> on where they're standing.
>
> Sure. But you're not asking someone to identify a real intersection
> based on their GPS coordinates. You're using it as a primary key.
So, what, you pick a GPS coordinate at random that's approximately in
the right area? (So long as it's unambiguously not near any *other*
intersection.)
>> Then again, most intersections probably already *have* a synthetic
>> key. I don't know about your country, but around here major roads tend
>> to have uniquely numbered markers on the roadside...
>
> Assuming those don't change, they're probably based on geometry also.
They seem to just be numbered sequentially along the length of the road.
> The point of using GPS coordinates is that they are guaranteed not to
> change.
Sure. Unless the GPS coordinate system moves, which can't possibly
happen. Oh, wait...