POV-Ray: Newsgroups: povray.off-topic: Database Questions

POV-Ray : Newsgroups : povray.off-topic : Database Questions		Server Time 2 Jul 2025 14:39:54 EDT (-0400)

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 10 Messages >>>

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:28:34
Message: <4d77d4e2$1@news.povray.org>

On 3/9/2011 1:32 PM, Darren New wrote:
> Tom Austin wrote:
>> I'm on this page fully. I prefer to stay away from MS database
>> solutions. Not that they are not good, but they are just too ... well
>> just 'too'.
>
> I bring it up only because you're running Windows there, based on your
> comments about using Access. I don't know how well MySql and Postgressql
> have been ported to Windows.
>


Actually we run a mixed bag.

Windows desktops (Autocad and other crap)

Windows domain server (grandfathered in)

Linux from scratch file server (very cool and flexible)

A database server will virtually be 100% linux based.

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:41:53
Message: <4d77d801$1@news.povray.org>

On 3/9/2011 1:43 PM, Darren New wrote:
>
> Naturally. I'm just recommending that you make it clear that any benefit
> you spend time on now will reap rewards over and over into the future.
>

The problem is that we have to get jobs out for the client now and might 
not physically have the time to do it right.

>> Makes sense - on anything that affects more than 1 row in 1 table is
>> OK. Transact wrapper everything else :-)
>
> Well, other way around, yes. Anything that affects more than one row
> goes into a transaction.
>

yes - great application of 'not' on a sentence :-)


>> True. I believe that if the system is built right the user will NEVER
>> have to look at the raw data. The interfaces should give the user all
>> of the information that they need.
>
> Yep. And if you really get into it, you wind up making views for each
> function of your application, so the applications themselves don't need
> to know how the raw data is organized. But that's kind of overkill for a
> small database.
>

Yes, but makes the application have to perform less and changes where 
the work is done.

> Right. For one thing, for example, it keeps you from having (say) the
> same pole entered multiple times in the database by mistake. If a pole
> is identified by the GPS coords (assuming these are precise enough) it
> keeps the database from having three records for a pole all at the same
> location.
>

That works if the GPS location used in each instance is exactly the same 
- get off by a bit and you have a duplicate pole 1' away from the 
existing pole.  But a good interface could check for that and prevent it 
from happening.

>> Yes, agreed. Depending on how quickly the thing was built, it can be
>> very difficult if a complete aspect was missed.
>
> No, it's not really that hard to add a complete new aspect to the
> database after the fact, if what exists there is well thought out. You
> just have to avoid putting multiple things in the same table that aren't
> really the same thing in reality; like, avoid putting owner contact info
> into a table of poles, or even ownership information into a table of poles.
>

Yes, you just can add another 'property'.  But not if you miss a 
fundamental requirement that alters the data you already have organized.

>> Also, one has to define *what* and *how* base rules before adding the
>> data or you get a mess - which we are approaching.
>
> Yep. That's the "C" in ACID. Way too many people who never actually
> studied the topic think that the "C" means an error updating a row will
> corrupt the database files. No, it means an error updating a row will
> corrupt the *data*, not the container.
>

Yes, but wrong direction.

I was referring to the situation where a piece of data was not accounted 
for in the database.  You must define ground rules for the database so 
you can define the piece of data and know where it actually should get 
stored.  We have been lacking in such definition.


>
> Or, alternately, accept that you'll use the same engine, and then make
> plans for how to replace its features. Like, "we'll use views, but if we
> wind up exporting to CVS, we'll export both the raw data and the views,
> which is OK since we won't be updating historical data that we've
> already exported."
>

yes, several alternatives can achieve the same outcome.  Our group 
sometimes has trouble seeing that.  And the solution provided can cause 
many many problems.

> MySql makes it pretty easy, actually. Just follow the instructions.
>
> Basically, you set up two machines with identical configurations of the
> SQL server. Then you tell one "Hey, you're a master", using the tool
> that takes a snapshot of the database and turns on the master flag at
> the same time. Then you restore that snapshot to the slave, then tell
> the slave "Hey, your master is over there!"
>
> As long as you don't update the slave in a way that makes a transaction
> applied to the master fail on the slave, you're good to go. If you
> accidentally insert a record into the slave in a table that has an
> auto-increment integer, the slave will crash out when the master tries
> to insert the record with the same key, and you'll have to re-do that
> backup/restore process.
>
>> We do have off-site backups :-) Just 500GB in one big tarball tho :-(
>
> Yah. Your actual data is 500G? That's actually a pretty large database.
>

No, not a single database.  We take many pictures and produce some 
pretty large 3D pointclouds for some of the work.  Each job has a .mdb 
database file, but that is pretty small.


Here's a question:

We have pictures (sometimes more than one) of each pole.  We do have 
some applications that require that the image be in a file format and 
not be fed from a stream.
What is the practicality of storing the images in the DB as a field 
(blob or something else)?

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 14:43:10
Message: <4d77d84e$1@news.povray.org>

On 3/9/2011 2:27 PM, Darren New wrote:
> Tom Austin wrote:
>> Not exactly - we at times have to 'adjust' the coords to achieve
>> better accuracy once they are int he system. It sucks that after you
>> get a lot of data your client gives you the 'revised' requirements
>> that cause such changes.
>
> So the GPS is *not* something that uniquely identifies the pole, or the
> customer wouldn't be able to tell you it was wrong. :-) So that
> shouldn't be the PK in this case.
>
>> look at aerial imagery - see pole - that's the GPS.
>
> That's another way to do it. :-)
>
>> We have had clients require specs that someone in an office without
>> experience must have come up with. Must be withing xx accuracy - must
>> add up to +- xx difference. Possible, but very expensive to actually
>> meet the requirements.
>
> Yeah. When it comes time to mark the roads, they still send someone out
> with a metal detector and a can of spray paint.
>

I've used dowsing rods for the task before :-)

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 14:57:42
Message: <4d77dbb6@news.povray.org>

Tom Austin wrote:
> I see - the primary key thing should prevent the 'error'.

Yeah. OK, imagine a database of oh receipts, say.

If the primary key is some arbitrary integer, it's possible to get the same 
receipt into the table more than once, which will screw up your accounting.

If the primary key is the cash register number + timestamp to the second, 
the only way you get a collision is to make two sales on the same register 
within one second of each other, which probably *should* throw some sort of 
error. It also makes it easy to print on the paper receipt all you need to 
know to get the exact record.

Whether you can do this sort of thing with any particular table of yours is 
another question.

> When you state multiple times, you are implying that multiple records 
> are NOT being created - just 'updating' existing records after the first 
> read.

Or not updating them at all, if you pass in the same data twice by mistake.

Look at a CSV file full of receipts like above. Reading each row is going to 
clobber the record already there. If you use an auto-increment ID, you'll 
wind up with 200 reciepts if you read a 100-row CSV in twice.

IME, a large part of database fuckage is either operator error (like loading 
stuff twice) or simple programming errors (like debiting everyone's account 
instead of just the account of the person you should have debited). The idea 
of making idempotent updates and keeping write-once historical records came 
about because that makes these things easy to fix.

> pretty much - the owner didn't want to invest the time to make it better 
> and gain the rewards.  new owner - new ideas

Cool. Let's hope it stays that way. :-)

> I think I have run into this already
> select A from b where A.ID in ( select ID from C)

No, that's cool. Well, not that exact query, but that sort of thing. The 
trick is it's all in SQL. You aren't looping inside the PHP code or 
Javascript code or whatever you're using to submit that code to the database 
engine.

Since it's all in SQL, the SQL engine can optimize that.

> we got pretty complicated with poles, connections between poles, cables 
> on poles and cables between poles....  a little intertwined

Oh, I can imagine. :-) But there are actually formal mathematical processes 
and rules for designing a database like that in a way that makes it easier.

> in its simplest form leaving out fluff like owners and such:
> 
> table poles:  (each pole)
> ID, GPS Location
> 1 entry for each pole
> 
> table connections:  (how poles are connected)
> ID, PoleID1, PoleID2, Distance
> 1 entry for each pole-pole joining (1 for each 'set' of cables)

See, this is what I mean. There's no need for an ID on this table. You don't 
want two rows, one where it's

(27, Pole 3, Pole 4, 500 feet)
(29, Pole 3, Pole 4, 800 feet)

You get that, your data is screwed, and you know it, but you can't fix it.

A connection between two poles is defined by the two poles it's connecting. 
The distance and what types of cables go through are dependent data.

table connections:
PoleID1, PoleID2, Distance (pk=PoleID1+PokeID2)

table cables:
PoleID1, PoleID2, height, cable type. (pk=PoleID1+PoleID2+cable type)

> table attachments:  (cable attached to poles & height)
> ID, PoleID, Height
> multiple entries per pole
> 1 for each attachment

Is there a row in this table for a pole that has no cables attached? I.e., 
does this represent "Cables attached to this pole" or "places where it's 
posslbe to attach a cable"?

> table midspans: (lowest height between poles for each cable)
> ID, AttachmentID1, AttachmentID2, Height, ConnectionID(redundant)
> multiple midspans for each 'connection'
> 1 midspan per cable over the connection
> essentially 1 midspan for each connection on a pair of poles

Consider a slightly different layout: a span.

Cable type, poleID1, attachmenthieght1, poleID2, attachmentheight2, 
midspanheight.

Then poleid1+poleid2+cabletype becomes your primary key.

That gives you poles, distances between poles, and spans of cable. If you 
need to know connectivity, you could have a table that lists the first pole 
and last pole along with a collection of spans.

I'm not saying you should do it this way. I'm just pointing out there are 
lots of ways to organize the data, and you might want to think of a drastic 
simplification in the structure that will still give you everything you need 
with a bit of SQL processing time.

What you want is a structure where it becomes impossible to have only some 
of the information you need. Just like you couldn't have a GPS location 
without a pole, you don't want to have (say) a connection height on a pole 
without a mispan, or a connection height on one pole but not on the other pole.

> The rest is pretty much straight forward - objects and properties 
> associated with each of the above.
> 
> As you can see, there are some circular references going on (attachments 
> and midspans connect poles

Yeah. That means you're doing it wrong, if it's actually *circular*.

It's often easier to draw stuff as pictures on paper when you're designing 
things.

Draw a box for each table, and an arrow each time there's a foreign key.

Any box with no outgoing arrows is a fundamental real-world entity. In this 
case, for example, the poles better be in the real world, because they're 
not attached to anything else.

> You have presented some new ideas that 'extend' my thinking and stretch 
> it.  I do not know how much will make it in to the system, but it is 
> good food for thought.

It takes practice. I'm just trying to explain how I initially approach 
things, and why. Of course each situation varies.

> We have some views in our group that state that the use of the data 
> should drive how it is stored. 

This is exactly the *opposite* of what an RDBMS is all about. People already 
had this kind of database, and the problem was that the data became unusable 
when new requirements came up. That's why SQL has no pointers.

> I agree with it to a point - mainly for 
> ease of working with the data, but not at the sacrifice of usability of 
> the data.  The drive is so hard that there is talk about flattening the 
> database to make it 'simpler' and that adding more tables makes it more 
> complex.

Add lots of tables. Create "simpler" tables by creating views. When you find 
in six months that you need new functionality, create other views that 
expose the data organized in the new way.

> Additionally some of our issue is that when we finish a job for a client 
> we do not need to reference the data any more.  Rarely have we found 
> ourselves going back to old data.  How much effort should be put into a 
> system that supports saving everything and making it accessible when one 
> does not refer back to it?

Another good reason to avoid using auto-increment IDs. :-) You can have a 
"job" table that gives you top-level pointers for a job to each thing 
involved in a job. (I.e., if a Job consisted of a bunch of connections on a 
bunch of poles, you'd have a Job ID and a collection of poles. The 
collection of poles would give you the collection of cables. Etc.) You could 
then write out (as SQL even) all the rows associated with a given job, then 
delete those rows. If you ever needed them back, run that SQL back into the 
database.  (Altho, honestly, most database engines with autoincrement IDs 
can handle actually inserting records with a given ID and not generating a 
new one.)

But honestly, I don't know how big your database is, but a database can 
store millions of rows before you even start to see anything getting slow. 
It's not a case of "do we need to keep it around" as it is "do we need to 
throw it away?"  But if you manage to build the data in a way that each 
chunk you might want to archive (i.e., a job) has a record that lets you get 
the whole collection of everything associated with that record, then you 
don't have to worry about the actual archive-and-delete part of the code 
until there's a reason to. Then you can write that code, confident that you 
know what you need to store out.

> Thanks for your feedback - it is very helpful.

I like helping,b ecause writing the thoughts out clearly enough to explain 
to someone else always clarifies things to me too.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 15:00:16
Message: <4d77dc50$1@news.povray.org>

Tom Austin wrote:
> A database server will virtually be 100% linux based.

Ah. MySql or Postrgresql might be the way to go, then, obviously. You'd have 
to do the research on how to connect the Linux server to the Windows desktop 
software, then. I'm not sure I've done that recently enough for any advice 
I'd give to be valid.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 15:14:26
Message: <4d77dfa2$1@news.povray.org>

Tom Austin wrote:
> The problem is that we have to get jobs out for the client now and might 
> not physically have the time to do it right.

I grok. Why can't you use the old system, tho.

> Yes, but makes the application have to perform less and changes where 
> the work is done.

The real advantage is you can change what's stored underneath (like, say, 
changing the primary keys, or changing whether you talk about "attachments" 
or "spans") without updating the application. You rearrange the database, 
rewrite the views, but all your calculation code etc stays the same.

> That works if the GPS location used in each instance is exactly the same 
> - get off by a bit and you have a duplicate pole 1' away from the 
> existing pole.  But a good interface could check for that and prevent it 
> from happening.

Right. That was the "precise enough" bit. :-)

> Yes, you just can add another 'property'.  But not if you miss a 
> fundamental requirement that alters the data you already have organized.

The advantage of basing the schema on the "real world" is that if you miss a 
fundamental property, it's because the data isn't available.

> I was referring to the situation where a piece of data was not accounted 
> for in the database.  You must define ground rules for the database so 
> you can define the piece of data and know where it actually should get 
> stored.  We have been lacking in such definition.

Well, yes, it's hard to store data you don't know about. :-)

I was simply talking about the addition of new types of data after you 
already have the existing data defined.

> yes, several alternatives can achieve the same outcome.  Our group 
> sometimes has trouble seeing that.  

Ah, welcome to the club. ;-)  That's why I like being boss.

> What is the practicality of storing the images in the DB as a field 
> (blob or something else)?

Well, it'll certainly make things slow. If you want to store them in a 
database rather than the file system, *this* is a perfect use case for 
having a second database. :-) Make them blobs in the second database, and 
refer to them from the first database.

The benefits of doing this are that you get easy access to the pictures from 
the same place you have any of the other data (i.e., you don't have to 
figure out how to set up network shares, your login information can control 
access to the pictures, etc.). But you also have the advantage that you 
don't have to back up the database as often, you can put it on a different 
partition or a different server, etc.  (For example, say the customer FTPs 
the pictures to you. You could pull the pictures out of the incoming 
directory and put them into the PolePicture blob database when you created 
the pole record in the primary database that references the picture. Then 
you could leave the pictures on the FTP server until you back up the blob 
database at the end of the week/month/whatever, whereas you'd probably want 
to be replicating the actual transactional database live.)

So generally that's what I'd recommend. Be sure to write the code to fetch 
the pictures modularly, so you can move it to an entirely different server 
with an entirely different connection string if it gets too big or something.

And read up on the technical details of how blobs are handled in storage, so 
you don't wind up configuring yourself into a corner or something. :-)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 16:48:57
Message: <4d77f5c9@news.povray.org>

Invisible wrote:
> The point is *totally* to have a PK that never changes, since changing a 
> PK is a tricky business which you want to avoid. (Duplicate PKs would of 
> course be much, much worse, but that's fairly easy to avoid.)

Actually, I think this is backwards. The trick would be to pick a PK that 
never changes because it actually uniquely identifies the entity the record 
represents (which, as I have said, is not always possible). The problem is 
not to avoid duplicate PKs, but to avoid two records with different PKs 
representing the same thing. If you already have a "unique index" on a table 
for fields you don't anticipate changing, that can easily serve as the PK 
for that table without having yet another arbitrary identifier.

By giving each record a PK of some random number, you have eliminated the 
ability to distinguish duplicates. Actually updating a PK on a record, while 
possibly time-consuming, shouldn't be problematic from a theoretical point 
of view. That's precisely what foreign keys, cascading updates, and 
cascading deletes are for.

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 16:59:57
Message: <4d77f85d$1@news.povray.org>

On 3/9/2011 3:00 PM, Darren New wrote:
> Tom Austin wrote:
>> A database server will virtually be 100% linux based.
>
> Ah. MySql or Postrgresql might be the way to go, then, obviously. You'd
> have to do the research on how to connect the Linux server to the
> Windows desktop software, then. I'm not sure I've done that recently
> enough for any advice I'd give to be valid.
>

I'm actually not too afraid on how to do that.  Shouldn't be much 
different than for a SQL Server or anything else.  Just have to get it 
set up and then you can run.

Post a reply to this message

From: Tom Austin
Subject: Re: Database Questions
Date: 9 Mar 2011 17:06:55
Message: <4d77f9ff$1@news.povray.org>

On 3/9/2011 3:14 PM, Darren New wrote:
> Tom Austin wrote:
>> The problem is that we have to get jobs out for the client now and
>> might not physically have the time to do it right.
>
> I grok. Why can't you use the old system, tho.
>

A desire to just get rid of it and not try to morph it.
It is one of those things - until you grasp how broken it was it doesn't 
seem logical.

>> Yes, but makes the application have to perform less and changes where
>> the work is done.
>
> The real advantage is you can change what's stored underneath (like,
> say, changing the primary keys, or changing whether you talk about
> "attachments" or "spans") without updating the application. You
> rearrange the database, rewrite the views, but all your calculation code
> etc stays the same.
>

that sounds like fun :-)

I like the idea - now to try to sell it....

>> I was referring to the situation where a piece of data was not
>> accounted for in the database. You must define ground rules for the
>> database so you can define the piece of data and know where it
>> actually should get stored. We have been lacking in such definition.
>
> Well, yes, it's hard to store data you don't know about. :-)
>
> I was simply talking about the addition of new types of data after you
> already have the existing data defined.
>

yes, that can work well.  But it good to have rules as to how to define 
the new data so that the db is not peppered with ad-hocs.

Like - does this belong with make ready or the current state of the pole....


>
> Ah, welcome to the club. ;-) That's why I like being boss.
>

can be difficult if you someone in the group has a strong personality.

>> What is the practicality of storing the images in the DB as a field
>> (blob or something else)?
>
> Well, it'll certainly make things slow. If you want to store them in a
> database rather than the file system, *this* is a perfect use case for
> having a second database. :-) Make them blobs in the second database,
> and refer to them from the first database.
>
> The benefits of doing this are that you get easy access to the pictures
> from the same place you have any of the other data (i.e., you don't have
> to figure out how to set up network shares, your login information can
> control access to the pictures, etc.). But you also have the advantage
> that you don't have to back up the database as often, you can put it on
> a different partition or a different server, etc. (For example, say the
> customer FTPs the pictures to you. You could pull the pictures out of
> the incoming directory and put them into the PolePicture blob database
> when you created the pole record in the primary database that references
> the picture. Then you could leave the pictures on the FTP server until
> you back up the blob database at the end of the week/month/whatever,
> whereas you'd probably want to be replicating the actual transactional
> database live.)
>
> So generally that's what I'd recommend. Be sure to write the code to
> fetch the pictures modularly, so you can move it to an entirely
> different server with an entirely different connection string if it gets
> too big or something.
>
> And read up on the technical details of how blobs are handled in
> storage, so you don't wind up configuring yourself into a corner or
> something. :-)
>

Thanks for the tips - for the image files I don't know if it will go to 
blobs or stay files simply because of making working with the files more 
complex.  But leaving the files as is adds complexity as well.  <sigh>

Post a reply to this message

From: Darren New
Subject: Re: Database Questions
Date: 9 Mar 2011 18:34:59
Message: <4d780ea3$1@news.povray.org>

Tom Austin wrote:
> On 3/9/2011 3:14 PM, Darren New wrote:
>> Tom Austin wrote:
>>> The problem is that we have to get jobs out for the client now and
>>> might not physically have the time to do it right.
>>
>> I grok. Why can't you use the old system, tho.
>>
> 
> A desire to just get rid of it and not try to morph it.
> It is one of those things - until you grasp how broken it was it doesn't 
> seem logical.

No, I meant your business is going to be making money running the old system 
while you're implementing the new system. Why is there a hurry beyond "we're 
spending money on peoples time to build the new system"?  If you turned the 
old system off before the new system was ready, I can see the hurry.

Doing it right shouldn't take a whole lot longer than doing it wrong.

> Like - does this belong with make ready or the current state of the 
> pole....

Right. You do have to watch out for that sort of thing, yes. But again, fall 
back to the concept that each row (in a "primary" table) represents a real 
entity.  If it's something artificial that is related to but not *really* a 
part of the thing the row represents, it should go in a separate table keyed 
to the main table.

Like, if you have a bunch of poles, and you want to know where to put a date 
at which the utility thinks it'll finish installing the cabling, that's not 
part of the utility *or* part of the pole.  So you can make a new table 
"ExpectedInstall" that has nothing but the pole and the date. Or whatever 
the key is you wind up hooking it to. Even if you have only one row in that 
table for each pole, that keeps the database clean, as well as making it 
obvious everywhere you actually use that data later (in that everywhere you 
actually use it, you'll be joining against the table).

>> Ah, welcome to the club. ;-) That's why I like being boss.
>>
> 
> can be difficult if you someone in the group has a strong personality.

As long as they're not stubborn in the face of reason. :-)

> Thanks for the tips - for the image files I don't know if it will go to 
> blobs or stay files simply because of making working with the files more 
> complex.  But leaving the files as is adds complexity as well.  <sigh>

Yeah, the main reason would be to unify managing the pictures with managing 
the rest of the data. If unifying the pictures makes it more complex to 
manage rather than simpler, it doesn't make sense to do that.  Make sure you 
make that bit modular in your code, and you shouldn't have a problem. :-)

-- 
Darren New, San Diego CA, USA (PST)
  "How did he die?"   "He got shot in the hand."
     "That was fatal?"
          "He was holding a live grenade at the time."

Post a reply to this message

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 10 Messages >>>