POV-Ray : Newsgroups : povray.off-topic : Database Questions
  Database Questions (Message 29 to 38 of 68)
From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 12:38:57
Message: <4d77bb31$1@news.povray.org>
On 3/8/2011 6:22 PM, Darren New wrote:
> Tom Austin wrote:
>> I think most of our slowdowns are currently in the applications that
>> access it. Basically inefficient routines and use of objects. A
>> required approach when coding time was of the essence.
>
> Yeah, but now you have some luxury to do it right, I'd wager, since you
> already have a working system.
>

Exactly - but just how much time we can allocate to it is yet to be 
seen.  But I think I can push for more time now because it will cost 
much more time later.  But this only works to a point.

>> Does it make sense to set up a transaction wrapper on even simple
>> single updates?
>
> If you're modifying more than one row at a time, yes. In most database
> engines, each statement you submit becomes its own transaction. This
> isn't true on all types of MySql tables. For example, if you have a
> MyISAM table and you say "update blah set x = 50 where y = 3" and it
> gets interrupted half way, you might update only half the rows where y = 3.
>
> A single append of a single row is atomic. You're not going to corrupt
> the database per se. Just the consistency of the data inside the database.
>
> So, yes, there's really so little overhead in using a transaction around
> anything that affects more than one row that it makes no sense not to do
> so.
>

Makes sense - on anything that affects more than 1 row in 1 table is OK. 
  Transact wrapper everything else :-)
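Darren's point about statement-level atomicity can be sketched with Python's stdlib sqlite3 standing in for MySQL; the `poles` table and its columns are invented for illustration:

```python
import sqlite3

# In-memory database standing in for the production engine (MySQL in
# this thread); sqlite3 is used only because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poles (id INTEGER PRIMARY KEY, height REAL)")
conn.executemany("INSERT INTO poles VALUES (?, ?)",
                 [(1, 30.0), (2, 35.0), (3, 40.0)])
conn.commit()

# Multi-row update wrapped in an explicit transaction: either every
# matching row changes or none of them do.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE poles SET height = height + 5 WHERE height >= 35")
except sqlite3.Error:
    pass  # the rollback already restored a consistent state

print(list(conn.execute("SELECT id, height FROM poles ORDER BY id")))
# → [(1, 30.0), (2, 40.0), (3, 45.0)]
```

On a MyISAM-style engine without this wrapper, an interruption mid-statement could leave only some of the matching rows updated, which is exactly the half-done state described above.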

>>> That said, it's surprisingly common that if you structure your queries
>>> right, you can make every operation essentially an "append to this
>>> table", followed by an eventual "update that table in a way that if it
>>> fails, we can reconstruct that table entirely from scratch". Think
>>> Journal -> General Ledger.
>>>
>>
>> OK, I've been dealing with databases for 15 years now - on mostly a
>> low level. that just went right over my head. Makes sense, but I have
>> no concept of how it is implemented.
>
> I've usually done financial databases. So, basically, all the history of
> everything in and out of the database is recorded in one set of tables,
> while the current state is in a second set of tables.
>
> Like your checkbook: Every check you write gets a row appended, and you
> never change that row except to set a flag that you never unset (the
> "this check cleared" flag). But there's another table that lists each
> account and says how much money is in it. In theory, you can construct
> the latter from the former, if you keep all your old checkbook registers
> around.
>

Ok, makes sense - if you keep a 'transaction' history.  In the financial 
world - highly recommended.
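Darren's checkbook pattern (append-only journal, rebuildable current-state table) can be sketched like this; the table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Journal: append-only history.  Rows are never updated except to set
# the never-unset "cleared" flag.
conn.execute("""CREATE TABLE checks (
    check_no INTEGER PRIMARY KEY, account TEXT, amount REAL,
    cleared INTEGER DEFAULT 0)""")
# Ledger: current state, reconstructible from the journal.
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, balance REAL)")

conn.executemany(
    "INSERT INTO checks (check_no, account, amount) VALUES (?, ?, ?)",
    [(101, "ops", -250.0), (102, "ops", -75.0), (103, "payroll", -900.0)])

# Rebuild the ledger entirely from the journal -- the property described
# above: if the derived table is lost, recompute it from history.
conn.execute("DELETE FROM balances")
conn.execute("""INSERT INTO balances
                SELECT account, SUM(amount) FROM checks GROUP BY account""")

print(dict(conn.execute("SELECT account, balance FROM balances")))
# → {'ops': -325.0, 'payroll': -900.0}
```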

I don't think we will have to go that far.

Our bigger problem is getting the data that is entered to be correct. 
Not just structurally correct, but actually correct.  This has to do 
with the interfaces that were assembled with too much haste.

>> for some reason I am hesitant to present the key as something that the
>> end user interacts with. Maybe I'm a control freak :-&
>
> No, there's almost never a reason to present the key that way. That
> doesn't make it less useful as a key. :-)
>

True.  I believe that if the system is built right the user will NEVER 
have to look at the raw data.  The interfaces should give the user all 
of the information that they need.  And if they need to muck with raw 
data for some kind of reporting, the system can output some form of csv 
that the user can manipulate at will.

The keys can be almost anything without changing the functionality of 
the database or how the user interacts with it.  It does make it 
somewhat more readable for the person trying to figure out a problem 
that somehow got into the data.

>>>> OK, you want a job surveying utility poles :-) You have good insight.
>>> I've just done this dozens of times in the last couple decades.
>> what - survey telephone poles?
>
> No, I only did that once. :-) Design databases from scratch.
>
>> that is very correct. Our first stab was fast and mostly dirty. We
>> spent a lot of time to try to get the data right, but we missed some
>> pieces of data.
>
> That's not unusual. The trick is to have the database structured in a
> way you can add the missed data later without *changing* what's already
> there. The best way to do that is to primarily record into the database
> exactly what's outside the database, reflecting reality as closely as
> possible.
>

Yes, agreed.  Depending on how quickly the thing was built, it can be 
very difficult if a complete aspect was missed.  Also, one has to define 
the base rules for *what* and *how* before adding the data or you get a 
mess - which we are approaching.

>
>>> What about your current system is so broken that you can't afford the
>>> time to fix it?
>>>
>>
>> It has layers upon layers of fixes. The end users had to fight with it
>> to get it to put out data correctly. it was an ad-hoc rich system to
>> the point where I refused to make any more changes because of the risk
>> of data corruption/loss and our client getting crap for a deliverable.
>
> Been there done that. Along with "before I start, has this ever been
> backed up?" Answered by "No, it has never been reliable enough we could
> get a backup of the data." Followed by "Thanks, but I like my current
> job enough."
>

We are not quite that far :-)  But it is nice to be starting with trying 
to make a working system instead of forcing a broken one to continue to 
work.


>>> The other possibility is to hire someone who actually does this sort of
>>> stuff for a living for long enough to show you how to set up the basics
>>> of the database.
>>>
>>
>> It's a balance. Depending on our requirements it could take just as
>> much of our time working with the guy and doing it ourselves.
>
> This is true. I don't think it would take any less time, but you'd
> probably have a more robust system when you were done.
>

very true - worth considering

>> yes, one just has to pay attention and not set up things that break.
>> It could be that applications do not rely on the views (or other
>> non-common features) But then one could be crippling themselves.
>
> Yes. It's better to just say "if we need multiple archives of this
> database, we'll use the same engine."
>

I just have to get this through to the rest of the guys.

>> yes, I don't think it will take much of a machine. We have an old
>> server box (PIII 600 I think) that would likely do just that. Or I
>> might incorporate MySQL on our current file server (Linux) to simplify
>> things and use the old box for development.
>
> Yeah. And you'll likely want a replicated server as well.
>

Not a bad idea - but that can get a bit more complicated.  But I will 
take it as a strong suggestion.

>> Luckily none of them were damaging. Just annoying and cost a couple of
>> hours of everyone's time.
>
> Yeah, that's much better than the place catching on fire. :-)
>

We do have off-site backups :-)   Just 500GB in one big tarball tho :-(



From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 12:42:07
Message: <4d77bbef$1@news.povray.org>
On 3/8/2011 4:31 PM, Darren New wrote:
> The absolute best way to make this work out, then, is to record
> "reality" in the database. Ignore how you're going to use the data while
> you're designing what's in the database (to the extent possible).
> Instead, simply record facts.
>
> What I'm trying to say is don't say "our calculations would be easier if
> our data was arranged like *this*." If you stick with the reality of
> what's out there, you will never run into the situation where you have
> to *change* data you already have in the database.
>
> I know this sounds kind of vague, and it can be hard to get away from it
> if you already have a model of something up and running and you know how
> that works.
>

I am realizing this - I believe that our current model is not yet too 
bad in this regard.  It basically reflects reality.  We just need to 
lock it down a bit more.

Your tips here will help us move forward more quickly.  We do get mired 
down in how to make the data usable.

I do need to consider usability somewhat to make sure that the data that 
is needed makes it into the system.



From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 13:30:50
Message: <4d77c75a@news.povray.org>
Tom Austin wrote:
> I will stay out of this one....  I think I will stick with auto number 
> fields for now - just to keep my head on straight.

There's nothing *wrong* with it. It's just that many libraries assume that 
every single table has an auto-increment ID, which I think is the wrong way 
to go, simplifying the library at the expense of accurate data.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 13:32:19
Message: <4d77c7b3$1@news.povray.org>
Tom Austin wrote:
> I'm on this page fully.  I prefer to stay away from MS database 
> solutions.  Not that they are not good, but they are just too ... well 
> just 'too'.

I bring it up only because you're running Windows there, based on your 
comments about using Access. I don't know how well MySql and PostgreSQL 
have been ported to Windows.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 13:43:31
Message: <4d77ca53$1@news.povray.org>
Tom Austin wrote:
> Exactly - but just how much time we can allocate to it is yet to be 
> seen.  But I think I can push for more time now because it will cost 
> much more time later.  But this only works to a point.

Naturally. I'm just recommending that you make it clear that any benefit you 
spend time on now will reap rewards over and over into the future.

> Makes sense - on anything that affects more than 1 row in 1 table is OK. 
>  Transact wrapper everything else :-)

Well, other way around, yes. Anything that affects more than one row goes 
into a transaction.

> Our bigger problem is getting the data that is entered to be correct. 
> Not just structurally correct, but actually correct.  This has to do 
> with the interfaces that were assembled with too much haste.

That's always a bugger, yes. :-)

> True.  I believe that if the system is built right the user will NEVER 
> have to look at the raw data.  The interfaces should give the user all 
> of the information that they need.

Yep. And if you really get into it, you wind up making views for each 
function of your application, so the applications themselves don't need to 
know how the raw data is organized. But that's kind of overkill for a small 
database.

> The keys can be almost anything without changing the functionality of 
> the database or how the user interacts with it.  It does make it 
> somewhat more readable for the person trying to figure out a problem 
> that somehow got into the data.

Right. For one thing, for example, it keeps you from having (say) the same 
pole entered multiple times in the database by mistake. If a pole is 
identified by the GPS coords (assuming these are precise enough) it keeps 
the database from having three records for a pole all at the same location.
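The duplicate-pole point can be sketched with a natural (composite) key on the coordinates, assuming they are precise and stable enough to identify a pole; the columns are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A natural key on the surveyed coordinates stops the same pole being
# entered twice by mistake -- the engine rejects the second record.
conn.execute("""CREATE TABLE poles (
    lat REAL, lon REAL, height REAL,
    PRIMARY KEY (lat, lon))""")

conn.execute("INSERT INTO poles VALUES (37.7749, -122.4194, 40.0)")
try:
    conn.execute("INSERT INTO poles VALUES (37.7749, -122.4194, 42.0)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # → False: one record per location
```

With an auto-increment surrogate key instead, both inserts would succeed and the table would silently hold two "different" poles at the same spot.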

> Yes, agreed.  Depending on how quickly the thing was built, it can be 
> very difficult if a complete aspect was missed. 

No, it's not really that hard to add a complete new aspect to the database 
after the fact, if what exists there is well thought out. You just have to 
avoid putting multiple things in the same table that aren't really the same 
thing in reality; like, avoid putting owner contact info into a table of 
poles, or even ownership information into a table of poles.

> Also, one has to define 
> *what* and *how* base rules before adding the data or you get a mess - 
> which we are approaching.

Yep. That's the "C" in ACID. Way too many people who never actually studied 
the topic think that the "C" means an error updating a row will corrupt the 
database files. No, it means an error updating a row will corrupt the 
*data*, not the container.

>>> yes, one just has to pay attention and not set up things that break.
>>> It could be that applications do not rely on the views (or other
>>> non-common features) But then one could be crippling themselves.
>>
>> Yes. It's better to just say "if we need multiple archives of this
>> database, we'll use the same engine."
>>
> 
> I just have to get this through to the rest of the guys.

Or, alternately, accept that you'll use the same engine, and then make plans 
for how to replace its features. Like, "we'll use views, but if we wind up 
exporting to CSV, we'll export both the raw data and the views, which is OK 
since we won't be updating historical data that we've already exported."
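A minimal sketch of the view-per-function idea (sqlite3 as the stand-in engine; the tables, columns, and the `pole_summary` view are invented here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poles (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
conn.execute("""CREATE TABLE attachments (
    id INTEGER PRIMARY KEY, pole_id INTEGER, height REAL)""")
conn.executemany("INSERT INTO poles VALUES (?, ?, ?)",
                 [(1, 37.0, -122.0), (2, 37.1, -122.1)])
conn.executemany("INSERT INTO attachments VALUES (?, ?, ?)",
                 [(1, 1, 18.5), (2, 1, 22.0), (3, 2, 19.0)])

# One view per application function: the app reads the view and never
# needs to know how the raw tables are laid out underneath.
conn.execute("""CREATE VIEW pole_summary AS
    SELECT p.id, COUNT(a.id) AS n_attachments, MAX(a.height) AS top_height
    FROM poles p LEFT JOIN attachments a ON a.pole_id = p.id
    GROUP BY p.id""")

print(list(conn.execute("SELECT * FROM pole_summary ORDER BY id")))
# → [(1, 2, 22.0), (2, 1, 19.0)]
```

When archiving, dumping both the raw tables and the view output (as suggested above) keeps the export useful even without the original engine.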

> Not a bad idea - but that can get a bit more complicated.  But I will 
> take it as a strong suggestion.

MySql makes it pretty easy, actually. Just follow the instructions.

Basically, you set up two machines with identical configurations of the SQL 
server. Then you tell one "Hey, you're a master", using the tool that takes 
a snapshot of the database and turns on the master flag at the same time. 
Then you restore that snapshot to the slave, then tell the slave "Hey, your 
master is over there!"

As long as you don't update the slave in a way that makes a transaction 
applied to the master fail on the slave, you're good to go. If you 
accidentally insert a record into the slave in a table that has an 
auto-increment integer, the slave will crash out when the master tries to 
insert the record with the same key, and you'll have to re-do that 
backup/restore process.
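The steps described map roughly onto the MySQL replication commands of that era; the host name, account, and binlog coordinates below are placeholders, not values from this thread:

```sql
-- On the master (after setting server-id and log-bin in my.cnf):
SHOW MASTER STATUS;   -- note the binlog file name and position

-- On the slave, after restoring the master's snapshot:
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',   -- placeholder host
    MASTER_USER = 'repl',                 -- placeholder account
    MASTER_PASSWORD = 'secret',           -- placeholder credential
    MASTER_LOG_FILE = 'mysql-bin.000001', -- from SHOW MASTER STATUS
    MASTER_LOG_POS = 4;
START SLAVE;
```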

> We do have off-site backups :-)   Just 500GB in one big tarball tho :-(

Yah. Your actual data is 500G?  That's actually a pretty large database.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."



From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:10:01
Message: <4d77d089$1@news.povray.org>
On 3/8/2011 5:11 PM, Darren New wrote:
> Trust me on this: It's the only way you ever get a schema you don't have
> to change with every software update. Record what is "out there", and to
>
> I am looking at a financial system right now where I never overwrite
> data. When someone swipes a card, I put into a table exactly what the
>
> Why? Because the merchant might change merchant accounts before telling
> us. The cardholder might lose the card. Heck, we might even accidentally
> create two cards with the same number.
>
> I'm blathering. Nevermind.
>

you can blather - it is useful info

> Anyway, yes, record everything from *outside* your company in one set of
> tables, as *raw* as you can. Changes to these tables should be
> idempotent. (I.e., you *should* be able to read the CSV file the telco
> gave you listing the pole information multiple times into the same table
> without corruption. This implies no automatically-generated primary key
> on those tables. That said, again, this is just a guideline and not
> always possible.)
>

I see - the primary key thing should prevent the 'error'.
When you state 'multiple times', you are implying that multiple records 
are NOT being created - later reads just 'update' the records created by 
the first read.
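The idempotent-import guideline can be sketched like so: keyed on the natural identifier from the source file (no auto-increment column), re-reading the same telco CSV is harmless. The table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Natural key taken from the source data, no auto-increment column:
# re-importing the same file cannot mint new identities for old poles.
conn.execute("""CREATE TABLE telco_poles (
    telco_pole_no TEXT PRIMARY KEY, lat REAL, lon REAL)""")

csv_rows = [("P-1001", 37.0, -122.0), ("P-1002", 37.1, -122.1)]

def load(rows):
    # INSERT OR REPLACE keyed on the natural PK: a repeated load
    # overwrites rows with identical content instead of appending.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO telco_poles VALUES (?, ?, ?)",
                         rows)

load(csv_rows)
load(csv_rows)  # idempotent: importing twice leaves the table unchanged

print(conn.execute("SELECT COUNT(*) FROM telco_poles").fetchone()[0])  # → 2
```

With an auto-increment key instead, the second `load` would append two more rows, which is exactly the duplication the guideline avoids.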


>
>> this might be worth considering, but I don't know how easy it will be
>> to implement and keep straight. Seems like the a user created key like
>> this could lead to duplicates - but if it is a key it will prevent
>> duplicates anyway - just prompt for another key.
>
> Oh, yes, this isn't for user keys. I meant where in a programming
> language you'd use an enumerated type.
>
> So if you have a "status" row for jobs that says "in negotiation", "paid
> for but waiting for someone to be assigned", "that person is working on
> it", and "ready to be delivered", you could use this sort of coding to
> make your programs more readable. "NEGO", "PAID", "WORK", "DONE" rather
> than 0, 1, 2, 3.
>

yes, enumerations are very friendly - I avoid using plain ol' integers 
for this stuff - they make the data hard to read
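The NEGO/PAID/WORK/DONE idea can also be enforced at the database level with a CHECK constraint, so a typo can't sneak in (sqlite3 as the stand-in engine; the `jobs` columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Mnemonic status codes instead of bare 0/1/2/3, constrained by the
# engine itself so only the four valid codes can ever be stored.
conn.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY, client TEXT,
    status TEXT NOT NULL
        CHECK (status IN ('NEGO', 'PAID', 'WORK', 'DONE')))""")

conn.execute("INSERT INTO jobs VALUES (1, 'Acme Telco', 'PAID')")
try:
    conn.execute("INSERT INTO jobs VALUES (2, 'Other Co', 'PIAD')")  # typo
    typo_accepted = True
except sqlite3.IntegrityError:
    typo_accepted = False

print(typo_accepted)  # → False: the misspelled code is rejected
```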



>
> It's hard coming up with that sort of thing when I don't know how your
> business works, tho, but that's the approximate sort of thing I was
> talking about.
>

Yes, there is more to it, but you do a good job of visualizing it.

>
>
> Oh, so the shortcoming is that it was rushed and grew instead of being
> designed. Makes sense. :-)
>

pretty much - the owner didn't want to invest the time to make it better 
and gain the rewards.  new owner - new ideas

>> Putting data in and getting data out was very difficult and required
>> the sacrifice of too many brain cells.
>
> That's exactly where SQL queries between multiple tables will help.
> Thinking in SQL instead of in terms of iterating over things is tough,
> but it's almost always possible to come up with a statement in SQL that
> will pull out all and only the data you need. If you ever get to the
> point where you're saying "for each result from *this* query, submit
> *that* query", you're almost 100% sure you're doing something wrong. In
> my entire career, I only ever did that once, and that was after I'd
> written it the right way and figured out MySql was not optimizing things
> in the obvious way and I didn't know how to make it do so.
>

I think I have run into this already:
select * from B where B.ID in (select ID from C)

we got pretty complicated with poles, connections between poles, cables 
on poles and cables between poles....  a little intertwined

in its simplest form leaving out fluff like owners and such:

table poles:  (each pole)
ID, GPS Location
1 entry for each pole

table connections:  (how poles are connected)
ID, PoleID1, PoleID2, Distance
1 entry for each pole-pole joining (1 for each 'set' of cables)

table attachments:  (cable attached to poles & height)
ID, PoleID, Height
multiple entries per pole
1 for each attachment

table midspans: (lowest height between poles for each cable)
ID, AttachmentID1, AttachmentID2, Height, ConnectionID(redundant)
multiple midspans for each 'connection'
1 midspan per cable over the connection
essentially 1 midspan for each connection on a pair of poles


The rest is pretty much straight forward - objects and properties 
associated with each of the above.

As you can see, there are some circular references going on (attachments 
and midspans connect poles).
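These four tables can be queried with one join instead of "for each connection, query its midspans"; the sketch below assumes column names matching the layout above (sqlite3 standing in for the engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE poles (id INTEGER PRIMARY KEY, gps TEXT);
CREATE TABLE connections (id INTEGER PRIMARY KEY,
    pole_id1 INTEGER, pole_id2 INTEGER, distance REAL);
CREATE TABLE attachments (id INTEGER PRIMARY KEY,
    pole_id INTEGER, height REAL);
CREATE TABLE midspans (id INTEGER PRIMARY KEY,
    attachment_id1 INTEGER, attachment_id2 INTEGER,
    height REAL, connection_id INTEGER);

INSERT INTO poles VALUES (1, '37.00,-122.00'), (2, '37.01,-122.01');
INSERT INTO connections VALUES (1, 1, 2, 150.0);
INSERT INTO attachments VALUES (1, 1, 18.5), (2, 2, 18.0);
INSERT INTO midspans VALUES (1, 1, 2, 15.2, 1);
""")

# One query, not a per-row loop: walk each midspan back through its two
# attachments to the poles it spans.
rows = list(conn.execute("""
    SELECT p1.id, p2.id, m.height
    FROM midspans m
    JOIN attachments a1 ON a1.id = m.attachment_id1
    JOIN attachments a2 ON a2.id = m.attachment_id2
    JOIN poles p1 ON p1.id = a1.pole_id
    JOIN poles p2 ON p2.id = a2.pole_id"""))
print(rows)  # → [(1, 2, 15.2)]
```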


>
> Let me know if the description of tables above does or doesn't make sense.
>

Makes perfect sense.

To clarify my situation - I tried to present some questions that we have 
in our office without bias to get a true feel of what constitutes good 
and bad practice.

What you have presented falls in line with what I was thinking - but I 
didn't want to present it as "is it stupid to flatten" so that I would 
not bias the question.

You have presented some new ideas that 'extend' my thinking and stretch 
it.  I do not know how much will make it in to the system, but it is 
good food for thought.


We have some views in our group that state that the use of the data 
should drive how it is stored.  I agree with it to a point - mainly for 
ease of working with the data, but not at the sacrifice of usability of 
the data.  The drive is so hard that there is talk about flattening the 
database to make it 'simpler' and that adding more tables makes it more 
complex.

Additionally some of our issue is that when we finish a job for a client 
we do not need to reference the data any more.  Rarely have we found 
ourselves going back to old data.  How much effort should be put into a 
system that supports saving everything and making it accessible when one 
does not refer back to it?


Thanks for your feedback - it is very helpful.



From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:14:08
Message: <4d77d180$1@news.povray.org>
On 3/9/2011 12:11 PM, Darren New wrote:
> Le_Forgeron wrote:
>> Civil usage of GPS is accurate to 30 yards/meters.
>
> I think they turned off the encryption of the low bits back in Clinton's
> timeframe, when other countries started threatening to launch their own
> satellites. I think civilian GPS is quite accurate now.
>
> Especially if you're measuring something that isn't moving, and you can
> let the device sit for a day, you can get down in the milimeter range.
>

Atmospheric drift still plays a factor.  You can reference a known 
nearby location that is receiving GPS signals to help nullify the 
effect, as that 'location' will drift as well.

take a look at Wide Area Augmentation System



From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:23:58
Message: <4d77d3ce$1@news.povray.org>
On 3/9/2011 12:10 PM, Darren New wrote:
> Invisible wrote:
>> The point is *totally* to have a PK that never changes,
>
> Well, yes, that's the desire. But you can only achieve that by having
> meaningless PKs on everything, which I find to be undesirable if a
> meaningful PK can be found.
>
> If your database is likely to be big enough that an incorrect initial PK
> is going to be catastrophic, then you can go for meaningless PKs.
>
> However, in Tom's case, the GPS coordinates being wrong means
> *everything* is wrong for that pole, so changing the GPS coordinates
> might as well be treated as deleting the pole in the wrong place and
> creating the pole in the right place.
>

Not exactly - we at times have to 'adjust' the coords to achieve better 
accuracy once they are in the system.  It sucks that after you get a lot 
of data your client gives you the 'revised' requirements that cause such 
changes.

>> OK, well I'm gonna have to say they must be doing something more
>> precise than "stand in the general vicinity of the intersection, take
>> down the GPS coordinates, and we'll just use those".
>
> I don't know.
>

look at aerial imagery - see pole - that's the GPS.  Better than 
anything else we have.  GPS units are too slow to get good numbers that 
close.


>
> There's a certain level of error of a couple of meters. You pick a
> standard place to measure from (such as the center of the manhole that
> you're interested in) and call that your location.
>
> When you go back to find the intersection, are the GPS coordinates given
> good enough that someone else can tell what intersection you're talking
> about?
>
> It should be noted, btw, that they're not really interested in the
> intersections, but in the wires running under the roads. The same PK can
> be used to track the wires running under or over not-roads as well.
>

We have had clients require specs that someone in an office without 
experience must have come up with.  Must be within xx accuracy - must 
add up to +- xx difference.  Possible, but very expensive to actually 
meet the requirements.



From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:25:58
Message: <4d77d446$1@news.povray.org>
On 3/9/2011 1:30 PM, Darren New wrote:
> Tom Austin wrote:
>> I will stay out of this one.... I think I will stick with auto number
>> fields for now - just to keep my head on straight.
>
> There's nothing *wrong* with it. It's just that many libraries assume
> that every single table has an auto-increment ID, which I think is the
> wrong way to go, simplifying the library at the expense of accurate data.
>

Yep, nothing wrong with it, but not required.  I think it might be that 
it is a 'simple' way of thinking about keys and making them happen.  To 
step out of that box requires more creative thinking and understanding 
of the data than most people enjoy.



From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 14:27:21
Message: <4d77d499$1@news.povray.org>
Tom Austin wrote:
> Not exactly - we at times have to 'adjust' the coords to achieve better 
> accuracy once they are in the system.  It sucks that after you get a lot 
> of data your client gives you the 'revised' requirements that cause such 
> changes.

So the GPS is *not* something that uniquely identifies the pole, or the 
customer wouldn't be able to tell you it was wrong. :-) So that shouldn't be 
the PK in this case.

> look at aerial imagery - see pole - that's the GPS.  

That's another way to do it. :-)

> We have had clients require specs that someone in an office without 
> experience must have come up with.  Must be within xx accuracy - must 
> add up to +- xx difference.  Possible, but very expensive to actually 
> meet the requirements.

Yeah. When it comes time to mark the roads, they still send someone out with 
a metal detector and a can of spray paint.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."




Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.