POV-Ray: Newsgroups: povray.off-topic: Database Questions

POV-Ray : Newsgroups : povray.off-topic : Database Questions		Server Time 12 Jul 2025 22:40:52 EDT (-0400)

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 10 Messages >>>

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 13:30:50
Message: <4d77c75a@news.povray.org>

Tom Austin wrote:
> I will stay out of this one....  I think I will stick with auto number 
> fields for now - just to keep my head on straight.

There's nothing *wrong* with it. It's just that many libraries assume that 
every single table has an auto-increment ID, which I think is the wrong way 
to go, simplifying the library at the expense of accurate data.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 13:32:19
Message: <4d77c7b3$1@news.povray.org>

Tom Austin wrote:
> I'm on this page fully.  I prefer to stay away from MS database 
> solutions.  Not that they are not good, but they are just too ... well 
> just 'too'.

I bring it up only because you're running Windows there, based on your 
comments about using Access. I don't know how well MySql and Postgressql 
have been ported to Windows.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 13:43:31
Message: <4d77ca53$1@news.povray.org>

Tom Austin wrote:
> Exactly - but just how much time we can allocate to it is yet to be 
> seen.  But I think I can push for more time now because it will cost 
> much more time later.  But this only works to a point.

Naturally. I'm just recommending that you make it clear that any benefit you 
spend time on now will reap rewards over and over into the future.

> Makes sense - on anything that affects more than 1 row in 1 table is OK. 
>  Transact wrapper everything else :-)

Well, other way around, yes. Anything that affects more than one row goes 
into a transaction.

> Our bigger problem is getting the data that is entered to be correct. 
> Not just structurally correct, but actually correct.  This has to do 
> with the interfaces that were assembled with too much haste.

That's always a bugger, yes. :-)

> True.  I believe that if the system is built right the user will NEVER 
> have to look at the raw data.  The interfaces should give the user all 
> of the information that they need.

Yep. And if you really get into it, you wind up making views for each 
function of your application, so the applications themselves don't need to 
know how the raw data is organized. But that's kind of overkill for a small 
database.

> The keys can be almost anything and it not change the functionality of 
> the database or how the user interacts with.  it does make it somewhat 
> more readable for the person trying to figure out a problems that 
> somehow got into the data.

Right. For one thing, for example, it keeps you from having (say) the same 
pole entered multiple times in the database by mistake. If a pole is 
identified by the GPS coords (assuming these are precise enough) it keeps 
the database from having three records for a pole all at the same location.

> Yes, agreed.  Depending on how quickly the thing was built, it can be 
> very difficult if a complete aspect was missed. 

No, it's not really that hard to add a complete new aspect to the database 
after the fact, if what exists there is well thought out. You just have to 
avoid putting multiple things in the same table that aren't really the same 
thing in reality; like, avoid putting owner contact info into a table of 
poles, or even ownership information into a table of poles.

> Also, one has to define 
> *what* and *how* base rules before adding the data or you get a mess - 
> which we are approaching.

Yep. That's the "C" in ACID. Way too many people who never actually studied 
the topic think that the "C" means an error updating a row will corrupt the 
database files. No, it means an error updating a row will corrupt the 
*data*, not the container.

>>> yes, one just has to pay attention and not set up things that break.
>>> It could be that applications do not rely on the views (or other
>>> non-common features) But then one could be crippling themselves.
>>
>> Yes. It's better to just say "if we need multiple archives of this
>> database, we'll use the same engine."
>>
> 
> I just have to get this through to the rest of the guys.

Or, alternately, accept that you'll use the same engine, and then make plans 
for how to replace its features. Like, "we'll use views, but if we wind up 
exporting to CVS, we'll export both the raw data and the views, which is OK 
since we won't be updating historical data that we've already exported."

> Not a bad idea - but that can get a bit more complicated.  But I will 
> take it as a strong suggestion.

MySql makes it pretty easy, actually. Just follow the instructions.

Basically, you set up two machines with identical configurations of the SQL 
server. Then you tell one "Hey, you're a master", using the tool that takes 
a snapshot of the database and turns on the master flag at the same time. 
Then you restore that snapshot to the slave, then tell the slave "Hey, your 
master is over there!"

As long as you don't update the slave in a way that makes a transaction 
applied to the master fail on the slave, you're good to go. If you 
accidentally insert a record into the slave in a table that has an 
auto-increment integer, the slave will crash out when the master tries to 
insert the record with the same key, and you'll have to re-do that 
backup/restore process.

> We do have off-site backups :-)   Just 500GB in one big tarball tho :-(

Yah. Your actual data is 500G?  That's actually a pretty large database.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:10:01
Message: <4d77d089$1@news.povray.org>

On 3/8/2011 5:11 PM, Darren New wrote:
> Trust me on this: It's the only way you ever get a schema you don't have
> to change with every software update. Record what is "out there", and to
>
> I am looking at a financial system right now where I never overwrite
> data. When someone swipes a card, I put into a table exactly what the
>
> Why? Because the merchant might change merchant accounts before telling
> us. The cardholder might lose the card. Heck, we might even accidentally
> create two cards with the same number.
>
> I'm blathering. Nevermind.
>

you can blather - it is useful info

> Anyway, yes, record everything from *outside* your company in one set of
> tables, as *raw* as you can. Changes to these tables should be
> idempotent. (I.e., you *should* be able to read the CSV file the telco
> gave you listing the pole information multiple times into the same table
> without corruption. This implies no automatically-generated primary key
> on those tables. That said, again, this is just a guideline and not
> always possible.)
>

I see - the primary key thing should prevent the 'error'.
When you state multiple times, you are implying that multiple records 
are NOT being created - just 'updating' existing records after the first 
read.

>
>> this might be worth considering, but I don't know how easy it will be
>> to implement and keep straight. Seems like the a user created key like
>> this could lead to duplicates - but if it is a key it will prevent
>> duplicates anyway - just prompt for another key.
>
> Oh, yes, this isn't for user keys. I meant where in a programming
> language you'd use an enumerated type.
>
> So if you have a "status" row for jobs that says "in negotiation", "paid
> for but waiting for someone to be assigned", "that person is working on
> it", and "ready to be delivered", you could use this sort of coding to
> make your programs more readable. "NEGO", "PAID", "WORK", "DONE" rather
> than 0, 1, 2, 3.
>

yes, enumerations are very friendly - I avoid using plain ol integers 
for this stuff - makes it hard to read

>
> It's hard coming up with that sort of thing when I don't know how your
> business works, tho, but that's the approximate sort of thing I was
> talking about.
>

Yes, there is more to it, but you do a good job of visualizing it.

>
>
> Oh, so the shortcoming is that it was rushed and grew instead of being
> designed. Makes sense. :-)
>

pretty much - the owner didn't want to invest the time to make it better 
and gain the rewards.  new owner - new ideas

>> Putting data in and getting data out was very difficult and required
>> the sacrifice of too many brain cells.
>
> That's exactly where SQL queries between multiple tables will help.
> Thinking in SQL instead of in terms of iterating over things is tough,
> but it's almost always possible to come up with a statement in SQL that
> will pull out all and only the data you need. If you ever get to the
> point where you're saying "for each result from *this* query, submit
> *that* query", you're almost 100% sure you're doing something wrong. In
> my entire career, I only ever did that once, and that was after I'd
> written it the right way and figured out MySql was not optimizing things
> in the obvious way and I didn't know how to make it do so.
>

I think I have run into this already
select A from b where A.ID in ( select ID from C)

we got pretty complicated with poles, connections between poles, cables 
on poles and cables between poles....  a little intertwined

in its simplest form leaving out fluff like owners and such:

table poles:  (each pole)
ID, GPS Location
1 entry for each pole

table connections:  (how poles are connected)
ID, PoleID1, PoleID2, Distance
1 entry for each pole-pole joining (1 for each 'set' of cables)

table attachments:  (cable attached to poles & height)
ID, PoleID, Height
multiple entries per pole
1 for each attachment

table midspans: (lowest height between poles for each cable)
ID, AttachmentID1, AttachmentID2, Height, ConnectionID(redundant)
multiple midspans for each 'connection'
1 midspan per cable over the connection
essentially 1 midspan for each connection on a pair of poles

The rest is pretty much straight forward - objects and properties 
associated with each of the above.

As you can see, there are some circular references going on (attachments 
and midspans connect poles

>
> Let me know if the description of tables above does or doesn't make sense.
>

Makes perfect sense.

To clarify my situation - I tried to present some questions that we have 
in our office without bias to get a true feel of what constitutes good 
and bad practice.

What you have presented falls in line with what I was thinking - but I 
didn't want to present it as "is it stupid to flatten" so that I would 
not bias the question.

You have presented some new ideas that 'extend' my thinking and stretch 
it.  I do not know how much will make it in to the system, but it is 
good food for thought.

We have some views in our group that state that the use of the data 
should drive how it is stored.  I agree with it to a point - mainly for 
ease of working with the data, but not at the sacrifice of usability of 
the data.  The drive is so hard that there is talk about flattening the 
database to make it 'simpler' and that adding more tables makes it more 
complex.

Additionally some of our issue is that when we finish a job for a client 
we do not need to reference the data any more.  Rarely have we found 
ourselves going back to old data.  How much effort should be put into a 
system that supports saving everything and making it accessible when one 
does not refer back to it?

Thanks for your feedback - it is very helpful.

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:14:08
Message: <4d77d180$1@news.povray.org>

On 3/9/2011 12:11 PM, Darren New wrote:
> Le_Forgeron wrote:
>> Civil usage of GPS is accurate to 30 yards/meters.
>
> I think they turned off the encryption of the low bits back in Clinton's
> timeframe, when other countries started threatening to launch their own
> satellites. I think civilian GPS is quite accurate now.
>
> Especially if you're measuring something that isn't moving, and you can
> let the device sit for a day, you can get down in the milimeter range.
>

Atmospheric drift still plays a factor.  You can reference to a known 
nearby location that is receiving GPS signals to help nullify the effect 
as the 'location' will drift as well.

take a look at Wide Area Augmentation System

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:23:58
Message: <4d77d3ce$1@news.povray.org>

On 3/9/2011 12:10 PM, Darren New wrote:
> Invisible wrote:
>> The point is *totally* to have a PK that never changes,
>
> Well, yes, that's the desire. But you can only achieve that by having
> meaningless PKs on everything, which I find to be undesirable if a
> meaningful PK can be found.
>
> If your database is likely to be big enough that an incorrect initial PK
> is going to be catastrophic, then you can go for meaningless PKs.
>
> However, in Tom's case, the GPS coordinates being wrong means
> *everything* is wrong for that pole, so changing the GPS coordinates
> might as well be treated as deleting the pole in the wrong place and
> creating the pole in the right place.
>

Not exactly - we at times have to 'adjust' the coords to achieve better 
accuracy once they are int he system.  It sucks that after you get a lot 
of data your client gives you the 'revised' requirements that cause such 
changes.

>> OK, well I'm gonna have to say they must be doing something more
>> precise than "stand in the general vicinity of the intersection, take
>> down the GPS coordinates, and we'll just use those".
>
> I don't know.
>

look at aerial imagery - see pole - that's the GPS.  Better than 
anything else we have.  GPS units are too slow to get good numbers that 
close.


>
> There's a certain level of error of a couple of meters. You pick a
> standard place to measure from (such as the center of the manhole that
> you're interested in) and call that your location.
>
> When you go back to find the intersection, are the GPS coordinates given
> good enough that someone else can tell what intersection you're talking
> about?
>
> It should be noted, btw, that they're not really interested in the
> intersections, but in the wires running under the roads. The same PK can
> be used to track the wires running under or over not-roads as well.
>

We have had clients require specs that someone in an office without 
experience must have come up with.  Must be withing xx accuracy - must 
add up to +- xx difference.  Possible, but very expensive to actually 
meet the requirements.

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:25:58
Message: <4d77d446$1@news.povray.org>

On 3/9/2011 1:30 PM, Darren New wrote:
> Tom Austin wrote:
>> I will stay out of this one.... I think I will stick with auto number
>> fields for now - just to keep my head on straight.
>
> There's nothing *wrong* with it. It's just that many libraries assume
> that every single table has an auto-increment ID, which I think is the
> wrong way to go, simplifying the library at the expense of accurate data.
>

Yep, nothing wrong with it, but not required.  I think it might be that 
it is a 'simple' way of thinking about keys and making them happen.  To 
step out of that box requires more creative thinking and understanding 
of the data than most people enjoy.

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 14:27:21
Message: <4d77d499$1@news.povray.org>

Tom Austin wrote:
> Not exactly - we at times have to 'adjust' the coords to achieve better 
> accuracy once they are int he system.  It sucks that after you get a lot 
> of data your client gives you the 'revised' requirements that cause such 
> changes.

So the GPS is *not* something that uniquely identifies the pole, or the 
customer wouldn't be able to tell you it was wrong. :-) So that shouldn't be 
the PK in this case.

> look at aerial imagery - see pole - that's the GPS.  

That's another way to do it. :-)

> We have had clients require specs that someone in an office without 
> experience must have come up with.  Must be withing xx accuracy - must 
> add up to +- xx difference.  Possible, but very expensive to actually 
> meet the requirements.

Yeah. When it comes time to mark the roads, they still send someone out with 
a metal detector and a can of spray paint.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:28:34
Message: <4d77d4e2$1@news.povray.org>

On 3/9/2011 1:32 PM, Darren New wrote:
> Tom Austin wrote:
>> I'm on this page fully. I prefer to stay away from MS database
>> solutions. Not that they are not good, but they are just too ... well
>> just 'too'.
>
> I bring it up only because you're running Windows there, based on your
> comments about using Access. I don't know how well MySql and Postgressql
> have been ported to Windows.
>


Actually we run a mixed bag.

Windows desktops (Autocad and other crap)

Windows domain server (grandfathered in)

Linux from scratch file server (very cool and flexible)

A database server will virtually be 100% linux based.

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:41:53
Message: <4d77d801$1@news.povray.org>

On 3/9/2011 1:43 PM, Darren New wrote:
>
> Naturally. I'm just recommending that you make it clear that any benefit
> you spend time on now will reap rewards over and over into the future.
>

The problem is that we have to get jobs out for the client now and might 
not physically have the time to do it right.

>> Makes sense - on anything that affects more than 1 row in 1 table is
>> OK. Transact wrapper everything else :-)
>
> Well, other way around, yes. Anything that affects more than one row
> goes into a transaction.
>

yes - great application of 'not' on a sentence :-)


>> True. I believe that if the system is built right the user will NEVER
>> have to look at the raw data. The interfaces should give the user all
>> of the information that they need.
>
> Yep. And if you really get into it, you wind up making views for each
> function of your application, so the applications themselves don't need
> to know how the raw data is organized. But that's kind of overkill for a
> small database.
>

Yes, but makes the application have to perform less and changes where 
the work is done.

> Right. For one thing, for example, it keeps you from having (say) the
> same pole entered multiple times in the database by mistake. If a pole
> is identified by the GPS coords (assuming these are precise enough) it
> keeps the database from having three records for a pole all at the same
> location.
>

That works if the GPS location used in each instance is exactly the same 
- get off by a bit and you have a duplicate pole 1' away from the 
existing pole.  But a good interface could check for that and prevent it 
from happening.

>> Yes, agreed. Depending on how quickly the thing was built, it can be
>> very difficult if a complete aspect was missed.
>
> No, it's not really that hard to add a complete new aspect to the
> database after the fact, if what exists there is well thought out. You
> just have to avoid putting multiple things in the same table that aren't
> really the same thing in reality; like, avoid putting owner contact info
> into a table of poles, or even ownership information into a table of poles.
>

Yes, you just can add another 'property'.  But not if you miss a 
fundamental requirement that alters the data you already have organized.

>> Also, one has to define *what* and *how* base rules before adding the
>> data or you get a mess - which we are approaching.
>
> Yep. That's the "C" in ACID. Way too many people who never actually
> studied the topic think that the "C" means an error updating a row will
> corrupt the database files. No, it means an error updating a row will
> corrupt the *data*, not the container.
>

Yes, but wrong direction.

I was referring to the situation where a piece of data was not accounted 
for in the database.  You must define ground rules for the database so 
you can define the piece of data and know where it actually should get 
stored.  We have been lacking in such definition.


>
> Or, alternately, accept that you'll use the same engine, and then make
> plans for how to replace its features. Like, "we'll use views, but if we
> wind up exporting to CVS, we'll export both the raw data and the views,
> which is OK since we won't be updating historical data that we've
> already exported."
>

yes, several alternatives can achieve the same outcome.  Our group 
sometimes has trouble seeing that.  And the solution provided can cause 
many many problems.

> MySql makes it pretty easy, actually. Just follow the instructions.
>
> Basically, you set up two machines with identical configurations of the
> SQL server. Then you tell one "Hey, you're a master", using the tool
> that takes a snapshot of the database and turns on the master flag at
> the same time. Then you restore that snapshot to the slave, then tell
> the slave "Hey, your master is over there!"
>
> As long as you don't update the slave in a way that makes a transaction
> applied to the master fail on the slave, you're good to go. If you
> accidentally insert a record into the slave in a table that has an
> auto-increment integer, the slave will crash out when the master tries
> to insert the record with the same key, and you'll have to re-do that
> backup/restore process.
>
>> We do have off-site backups :-) Just 500GB in one big tarball tho :-(
>
> Yah. Your actual data is 500G? That's actually a pretty large database.
>

No, not a single database.  We take many pictures and produce some 
pretty large 3D pointclouds for some of the work.  Each job has a .mdb 
database file, but that is pretty small.


Here's a question:

We have pictures (sometimes more than one) of each pole.  We do have 
some applications that require that the image be in a file format and 
not be fed from a stream.
What is the practicality of storing the images in the DB as a field 
(blob or something else)?

Post a reply to this message

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 10 Messages >>>