On 3/9/2011 2:27 PM, Darren New wrote:
> Tom Austin wrote:
>> Not exactly - we at times have to 'adjust' the coords to achieve
>> better accuracy once they are in the system. It sucks that after you
>> get a lot of data your client gives you the 'revised' requirements
>> that cause such changes.
>
> So the GPS is *not* something that uniquely identifies the pole, or the
> customer wouldn't be able to tell you it was wrong. :-) So that
> shouldn't be the PK in this case.
>
>> look at aerial imagery - see pole - that's the GPS.
>
> That's another way to do it. :-)
>
>> We have had clients require specs that someone in an office without
>> experience must have come up with. Must be within xx accuracy - must
>> add up to +- xx difference. Possible, but very expensive to actually
>> meet the requirements.
>
> Yeah. When it comes time to mark the roads, they still send someone out
> with a metal detector and a can of spray paint.
>
I've used dowsing rods for the task before :-)
Tom Austin wrote:
> I see - the primary key thing should prevent the 'error'.
Yeah. OK, imagine a database of, oh, receipts, say.
If the primary key is some arbitrary integer, it's possible to get the same
receipt into the table more than once, which will screw up your accounting.
If the primary key is the cash register number + timestamp to the second,
the only way you get a collision is to make two sales on the same register
within one second of each other, which probably *should* throw some sort of
error. It also makes it easy to print on the paper receipt all you need to
know to get the exact record.
Whether you can do this sort of thing with any particular table of yours is
another question.
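For the receipts case, though, a rough sketch in plain SQL would be something
like this (the table and column names are just made up for illustration):

CREATE TABLE receipts (
    register_no  INTEGER   NOT NULL,
    sold_at      TIMESTAMP NOT NULL,  -- to the second
    total_cents  INTEGER   NOT NULL,
    PRIMARY KEY (register_no, sold_at)  -- same register, same second = same receipt
);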
> When you state multiple times, you are implying that multiple records
> are NOT being created - just 'updating' existing records after the first
> read.
Or not updating them at all, if you pass in the same data twice by mistake.
Look at a CSV file full of receipts like the above. With the natural key,
re-reading a row just clobbers the record that's already there. If you use an
auto-increment ID instead, you'll wind up with 200 receipts if you read a
100-row CSV in twice.
IME, a large part of database fuckage is either operator error (like loading
stuff twice) or simple programming errors (like debiting everyone's account
instead of just the account of the person you should have debited). The idea
of making idempotent updates and keeping write-once historical records came
about because that makes these things easy to fix.
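For instance, with a natural key like the receipts one above, the load script
becomes idempotent with a single clause. This is PostgreSQL 9.5+ syntax; MySQL
spells roughly the same thing INSERT ... ON DUPLICATE KEY UPDATE:

INSERT INTO receipts (register_no, sold_at, total_cents)
VALUES (:register_no, :sold_at, :total_cents)   -- bind parameters, one CSV row at a time
ON CONFLICT (register_no, sold_at) DO NOTHING;  -- loading the same file twice changes nothing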
> pretty much - the owner didn't want to invest the time to make it better
> and gain the rewards. new owner - new ideas
Cool. Let's hope it stays that way. :-)
> I think I have run into this already
> select A from b where A.ID in ( select ID from C)
No, that's cool. Well, not that exact query, but that sort of thing. The
trick is it's all in SQL. You aren't looping inside the PHP code or
Javascript code or whatever you're using to submit that code to the database
engine.
Since it's all in SQL, the SQL engine can optimize that.
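For example, these two forms should normally end up with the same query plan,
because the engine is free to rewrite the subquery as a join (same result as
long as C.ID is unique):

SELECT b.A FROM b WHERE b.ID IN (SELECT ID FROM C);
SELECT b.A FROM b JOIN C ON C.ID = b.ID;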
> we got pretty complicated with poles, connections between poles, cables
> on poles and cables between poles.... a little intertwined
Oh, I can imagine. :-) But there are actually formal mathematical processes
and rules for designing a database like that in a way that makes it easier.
> in its simplest form leaving out fluff like owners and such:
>
> table poles: (each pole)
> ID, GPS Location
> 1 entry for each pole
>
> table connections: (how poles are connected)
> ID, PoleID1, PoleID2, Distance
> 1 entry for each pole-pole joining (1 for each 'set' of cables)
See, this is what I mean. There's no need for an ID on this table. You don't
want two rows like
(27, Pole 3, Pole 4, 500 feet)
(29, Pole 3, Pole 4, 800 feet)
You get that, your data is screwed, and you know it, but you can't fix it.
A connection between two poles is defined by the two poles it's connecting.
The distance and what types of cables go through are dependent data.
table connections:
PoleID1, PoleID2, Distance (pk=PoleID1+PoleID2)
table cables:
PoleID1, PoleID2, height, cable type. (pk=PoleID1+PoleID2+cable type)
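Spelled out as DDL it might look something like this - the names and types are
just illustrative, not a recommendation:

CREATE TABLE poles (
    pole_id  INTEGER PRIMARY KEY,
    gps_lat  NUMERIC(9,6) NOT NULL,
    gps_lon  NUMERIC(9,6) NOT NULL
);

CREATE TABLE connections (
    pole_id_1   INTEGER NOT NULL REFERENCES poles(pole_id),
    pole_id_2   INTEGER NOT NULL REFERENCES poles(pole_id),
    distance_ft NUMERIC NOT NULL,
    PRIMARY KEY (pole_id_1, pole_id_2)   -- the two poles *are* the identity
);

CREATE TABLE cables (
    pole_id_1   INTEGER NOT NULL,
    pole_id_2   INTEGER NOT NULL,
    cable_type  TEXT    NOT NULL,
    height_ft   NUMERIC NOT NULL,
    PRIMARY KEY (pole_id_1, pole_id_2, cable_type),
    FOREIGN KEY (pole_id_1, pole_id_2)
        REFERENCES connections (pole_id_1, pole_id_2)
);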
> table attachments: (cable attached to poles & height)
> ID, PoleID, Height
> multiple entries per pole
> 1 for each attachment
Is there a row in this table for a pole that has no cables attached? I.e.,
does this represent "Cables attached to this pole" or "places where it's
possible to attach a cable"?
> table midspans: (lowest height between poles for each cable)
> ID, AttachmentID1, AttachmentID2, Height, ConnectionID(redundant)
> multiple midspans for each 'connection'
> 1 midspan per cable over the connection
> essentially 1 midspan for each connection on a pair of poles
Consider a slightly different layout: a span.
Cable type, poleID1, attachmentheight1, poleID2, attachmentheight2,
midspanheight.
Then poleid1+poleid2+cabletype becomes your primary key.
That gives you poles, distances between poles, and spans of cable. If you
need to know connectivity, you could have a table that lists the first pole
and last pole along with a collection of spans.
I'm not saying you should do it this way. I'm just pointing out there are
lots of ways to organize the data, and you might want to think of a drastic
simplification in the structure that will still give you everything you need
with a bit of SQL processing time.
What you want is a structure where it becomes impossible to have only some
of the information you need. Just like you couldn't have a GPS location
without a pole, you don't want to have (say) a connection height on a pole
without a midspan, or a connection height on one pole but not on the other.
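As a rough sketch, reusing the poles table from above (again, the names are
just for illustration), the NOT NULL constraints are what make "half a span"
impossible to store:

CREATE TABLE spans (
    pole_id_1        INTEGER NOT NULL REFERENCES poles(pole_id),
    pole_id_2        INTEGER NOT NULL REFERENCES poles(pole_id),
    cable_type       TEXT    NOT NULL,
    attach_height_1  NUMERIC NOT NULL,  -- attachment height on pole 1
    attach_height_2  NUMERIC NOT NULL,  -- attachment height on pole 2
    midspan_height   NUMERIC NOT NULL,  -- lowest point between the poles
    PRIMARY KEY (pole_id_1, pole_id_2, cable_type)
);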
> The rest is pretty much straightforward - objects and properties
> associated with each of the above.
>
> As you can see, there are some circular references going on (attachments
> and midspans connect poles
Yeah. That means you're doing it wrong, if it's actually *circular*.
It's often easier to draw stuff as pictures on paper when you're designing
things.
Draw a box for each table, and an arrow each time there's a foreign key.
Any box with no outgoing arrows is a fundamental real-world entity. In this
case, for example, the poles better be in the real world, because they're
not attached to anything else.
> You have presented some new ideas that 'extend' my thinking and stretch
> it. I do not know how much will make it in to the system, but it is
> good food for thought.
It takes practice. I'm just trying to explain how I initially approach
things, and why. Of course each situation varies.
> We have some views in our group that state that the use of the data
> should drive how it is stored.
This is exactly the *opposite* of what an RDBMS is all about. People already
had this kind of database, and the problem was that the data became unusable
when new requirements came up. That's why SQL has no pointers.
> I agree with it to a point - mainly for
> ease of working with the data, but not at the sacrifice of usability of
> the data. The drive is so hard that there is talk about flattening the
> database to make it 'simpler' and that adding more tables makes it more
> complex.
Add lots of tables. Create "simpler" tables by creating views. When you find
in six months that you need new functionality, create other views that
expose the data organized in the new way.
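For example, a "flat" view built on the span layout I sketched earlier gives
the flatten-the-database crowd their simple table without actually flattening
anything (view and column names invented):

CREATE VIEW flat_spans AS
SELECT p1.gps_lat AS lat_1, p1.gps_lon AS lon_1,
       p2.gps_lat AS lat_2, p2.gps_lon AS lon_2,
       s.cable_type, s.attach_height_1, s.attach_height_2, s.midspan_height
FROM spans s
JOIN poles p1 ON p1.pole_id = s.pole_id_1
JOIN poles p2 ON p2.pole_id = s.pole_id_2;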
> Additionally some of our issue is that when we finish a job for a client
> we do not need to reference the data any more. Rarely have we found
> ourselves going back to old data. How much effort should be put into a
> system that supports saving everything and making it accessible when one
> does not refer back to it?
Another good reason to avoid using auto-increment IDs. :-) You can have a
"job" table that gives you top-level pointers for a job to each thing
involved in a job. (I.e., if a Job consisted of a bunch of connections on a
bunch of poles, you'd have a Job ID and a collection of poles. The
collection of poles would give you the collection of cables. Etc.) You could
then write out (as SQL even) all the rows associated with a given job, then
delete those rows. If you ever needed them back, run that SQL back into the
database. (Altho, honestly, most database engines with autoincrement IDs
can handle actually inserting records with a given ID and not generating a
new one.)
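A minimal sketch of that top-level job record, assuming the pole tables from
earlier (the names here are invented):

CREATE TABLE jobs (
    job_id    INTEGER PRIMARY KEY,
    client    TEXT NOT NULL,
    finished  DATE                 -- NULL until the job ships
);

CREATE TABLE job_poles (
    job_id   INTEGER NOT NULL REFERENCES jobs(job_id),
    pole_id  INTEGER NOT NULL REFERENCES poles(pole_id),
    PRIMARY KEY (job_id, pole_id)
);

-- Archiving job 123 later is "everything reachable from here":
-- start with SELECT * FROM job_poles WHERE job_id = 123;
-- then pull the poles, connections, spans, etc. those rows point at.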
But honestly, I don't know how big your database is, but a database can
store millions of rows before you even start to see anything getting slow.
It's not so much a case of "do we need to keep it around?" as "do we need to
throw it away?" But if you manage to build the data in a way that each
chunk you might want to archive (i.e., a job) has a record that lets you get
the whole collection of everything associated with that record, then you
don't have to worry about the actual archive-and-delete part of the code
until there's a reason to. Then you can write that code, confident that you
know what you need to store out.
> Thanks for your feedback - it is very helpful.
I like helping, because writing the thoughts out clearly enough to explain
to someone else always clarifies things to me too.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
Tom Austin wrote:
> A database server will virtually be 100% linux based.
Ah. MySQL or PostgreSQL might be the way to go, then, obviously. You'd have
to do the research on how to connect the Linux server to the Windows desktop
software, then. I'm not sure I've done that recently enough for any advice
I'd give to be valid.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
Tom Austin wrote:
> The problem is that we have to get jobs out for the client now and might
> not physically have the time to do it right.
I grok. Why can't you use the old system, though?
> Yes, but makes the application have to perform less and changes where
> the work is done.
The real advantage is you can change what's stored underneath (like, say,
changing the primary keys, or changing whether you talk about "attachments"
or "spans") without updating the application. You rearrange the database,
rewrite the views, but all your calculation code etc stays the same.
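For instance, if the storage moves from "attachments" to the span layout I
sketched earlier, a view with the old name can keep serving the old shape so
the application never notices (column names assumed from that sketch):

CREATE VIEW attachments AS
SELECT pole_id_1 AS pole_id, cable_type, attach_height_1 AS height_ft
FROM spans
UNION ALL
SELECT pole_id_2 AS pole_id, cable_type, attach_height_2 AS height_ft
FROM spans;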
> That works if the GPS location used in each instance is exactly the same
> - get off by a bit and you have a duplicate pole 1' away from the
> existing pole. But a good interface could check for that and prevent it
> from happening.
Right. That was the "precise enough" bit. :-)
> Yes, you can just add another 'property'. But not if you miss a
> fundamental requirement that alters the data you already have organized.
The advantage of basing the schema on the "real world" is that if you miss a
fundamental property, it's because the data isn't available.
> I was referring to the situation where a piece of data was not accounted
> for in the database. You must define ground rules for the database so
> you can define the piece of data and know where it actually should get
> stored. We have been lacking in such definition.
Well, yes, it's hard to store data you don't know about. :-)
I was simply talking about the addition of new types of data after you
already have the existing data defined.
> yes, several alternatives can achieve the same outcome. Our group
> sometimes has trouble seeing that.
Ah, welcome to the club. ;-) That's why I like being boss.
> What is the practicality of storing the images in the DB as a field
> (blob or something else)?
Well, it'll certainly make things slow. If you want to store them in a
database rather than the file system, *this* is a perfect use case for
having a second database. :-) Make them blobs in the second database, and
refer to them from the first database.
The benefits of doing this are that you get easy access to the pictures from
the same place you have any of the other data (i.e., you don't have to
figure out how to set up network shares, your login information can control
access to the pictures, etc.). But you also have the advantage that you
don't have to back up the database as often, you can put it on a different
partition or a different server, etc. (For example, say the customer FTPs
the pictures to you. You could pull the pictures out of the incoming
directory and put them into the PolePicture blob database when you created
the pole record in the primary database that references the picture. Then
you could leave the pictures on the FTP server until you back up the blob
database at the end of the week/month/whatever, whereas you'd probably want
to be replicating the actual transactional database live.)
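In the picture database itself the table can be dead simple; something like
this, assuming PostgreSQL (in MySQL the column type would be a BLOB or
LONGBLOB, and the names are invented):

CREATE TABLE pole_pictures (
    pole_id   INTEGER   NOT NULL,   -- matches poles.pole_id in the main database
    taken_at  TIMESTAMP NOT NULL,
    picture   BYTEA     NOT NULL,   -- the image bytes themselves
    PRIMARY KEY (pole_id, taken_at)
);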
So generally that's what I'd recommend. Be sure to write the code to fetch
the pictures modularly, so you can move it to an entirely different server
with an entirely different connection string if it gets too big or something.
And read up on the technical details of how blobs are handled in storage, so
you don't wind up configuring yourself into a corner or something. :-)
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
Invisible wrote:
> The point is *totally* to have a PK that never changes, since changing a
> PK is a tricky business which you want to avoid. (Duplicate PKs would of
> course be much, much worse, but that's fairly easy to avoid.)
Actually, I think this is backwards. The trick would be to pick a PK that
never changes because it actually uniquely identifies the entity the record
represents (which, as I have said, is not always possible). The problem is
not to avoid duplicate PKs, but to avoid two records with different PKs
representing the same thing. If you already have a "unique index" on a table
for fields you don't anticipate changing, that can easily serve as the PK
for that table without having yet another arbitrary identifier.
By giving each record a PK of some random number, you have eliminated the
ability to detect duplicates. Actually updating a PK on a record, while
possibly time-consuming, shouldn't be problematic from a theoretical point
of view. That's precisely what foreign keys, cascading updates, and
cascading deletes are for.
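A tiny sketch of what I mean (the tag numbers are invented; most engines
support ON UPDATE CASCADE): you correct the key in one place and the foreign
keys do the rest.

CREATE TABLE poles (
    pole_tag  TEXT PRIMARY KEY           -- say, the utility's own tag number
);

CREATE TABLE attachments (
    pole_tag   TEXT    NOT NULL REFERENCES poles(pole_tag)
               ON UPDATE CASCADE ON DELETE CASCADE,
    height_ft  NUMERIC NOT NULL,
    PRIMARY KEY (pole_tag, height_ft)
);

-- The tag turns out to have been misread in the field:
UPDATE poles SET pole_tag = 'B-1042' WHERE pole_tag = 'B-1024';
-- Every attachments row is renamed along with it.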
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
On 3/9/2011 3:00 PM, Darren New wrote:
> Tom Austin wrote:
>> A database server will virtually be 100% linux based.
>
> Ah. MySQL or PostgreSQL might be the way to go, then, obviously. You'd
> have to do the research on how to connect the Linux server to the
> Windows desktop software, then. I'm not sure I've done that recently
> enough for any advice I'd give to be valid.
>
I'm actually not too afraid of how to do that. Shouldn't be much different
than for SQL Server or anything else. Just have to get it set up and then you
can run.
On 3/9/2011 3:14 PM, Darren New wrote:
> Tom Austin wrote:
>> The problem is that we have to get jobs out for the client now and
>> might not physically have the time to do it right.
>
> I grok. Why can't you use the old system, though?
>
A desire to just get rid of it and not try to morph it.
It is one of those things - until you grasp how broken it was, it doesn't seem
logical.
>> Yes, but makes the application have to perform less and changes where
>> the work is done.
>
> The real advantage is you can change what's stored underneath (like,
> say, changing the primary keys, or changing whether you talk about
> "attachments" or "spans") without updating the application. You
> rearrange the database, rewrite the views, but all your calculation code
> etc stays the same.
>
that sounds like fun :-)
I like the idea - now to try to sell it....
>> I was referring to the situation where a piece of data was not
>> accounted for in the database. You must define ground rules for the
>> database so you can define the piece of data and know where it
>> actually should get stored. We have been lacking in such definition.
>
> Well, yes, it's hard to store data you don't know about. :-)
>
> I was simply talking about the addition of new types of data after you
> already have the existing data defined.
>
yes, that can work well. But it's good to have rules as to how to define the
new data so that the db is not peppered with ad-hoc additions.
Like - does this belong with make-ready or with the current state of the pole....
>
> Ah, welcome to the club. ;-) That's why I like being boss.
>
can be difficult if someone in the group has a strong personality.
>> What is the practicality of storing the images in the DB as a field
>> (blob or something else)?
>
> Well, it'll certainly make things slow. If you want to store them in a
> database rather than the file system, *this* is a perfect use case for
> having a second database. :-) Make them blobs in the second database,
> and refer to them from the first database.
>
> The benefits of doing this are that you get easy access to the pictures
> from the same place you have any of the other data (i.e., you don't have
> to figure out how to set up network shares, your login information can
> control access to the pictures, etc.). But you also have the advantage
> that you don't have to back up the database as often, you can put it on
> a different partition or a different server, etc. (For example, say the
> customer FTPs the pictures to you. You could pull the pictures out of
> the incoming directory and put them into the PolePicture blob database
> when you created the pole record in the primary database that references
> the picture. Then you could leave the pictures on the FTP server until
> you back up the blob database at the end of the week/month/whatever,
> whereas you'd probably want to be replicating the actual transactional
> database live.)
>
> So generally that's what I'd recommend. Be sure to write the code to
> fetch the pictures modularly, so you can move it to an entirely
> different server with an entirely different connection string if it gets
> too big or something.
>
> And read up on the technical details of how blobs are handled in
> storage, so you don't wind up configuring yourself into a corner or
> something. :-)
>
Thanks for the tips - for the image files I don't know if they will go into
blobs or stay as files, simply because blobs might make working with them more
complex. But leaving the files as they are adds complexity as well. <sigh>
Tom Austin wrote:
> On 3/9/2011 3:14 PM, Darren New wrote:
>> Tom Austin wrote:
>>> The problem is that we have to get jobs out for the client now and
>>> might not physically have the time to do it right.
>>
>> I grok. Why can't you use the old system, though?
>>
>
> A desire to just get rid of it and not try to morph it.
> It is one of those things - until you grasp how broken it was, it doesn't
> seem logical.
No, I meant your business is going to be making money running the old system
while you're implementing the new system. Why is there a hurry beyond "we're
spending money on people's time to build the new system"? If you turned the
old system off before the new system was ready, I can see the hurry.
Doing it right shouldn't take a whole lot longer than doing it wrong.
> Like - does this belong with make ready or the current state of the
> pole....
Right. You do have to watch out for that sort of thing, yes. But again, fall
back to the concept that each row (in a "primary" table) represents a real
entity. If it's something artificial that is related to but not *really* a
part of the thing the row represents, it should go in a separate table keyed
to the main table.
Like, if you have a bunch of poles, and you want to know where to put a date
at which the utility thinks it'll finish installing the cabling, that's not
part of the utility *or* part of the pole. So you can make a new table
"ExpectedInstall" that has nothing but the pole and the date. Or whatever
the key is you wind up hooking it to. Even if you have only one row in that
table for each pole, that keeps the database clean, as well as making it
obvious everywhere you actually use that data later (in that everywhere you
actually use it, you'll be joining against the table).
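Something like this, just to make it concrete (the names are placeholders, and
it assumes a poles table keyed by pole_id):

CREATE TABLE expected_install (
    pole_id      INTEGER PRIMARY KEY REFERENCES poles(pole_id),
    expected_on  DATE NOT NULL
);

-- Wherever the date actually matters, the join makes the dependency visible:
-- SELECT p.pole_id, e.expected_on
-- FROM poles p JOIN expected_install e ON e.pole_id = p.pole_id;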
>> Ah, welcome to the club. ;-) That's why I like being boss.
>>
>
> can be difficult if someone in the group has a strong personality.
As long as they're not stubborn in the face of reason. :-)
> Thanks for the tips - for the image files I don't know if they will go into
> blobs or stay as files, simply because blobs might make working with them
> more complex. But leaving the files as they are adds complexity as well. <sigh>
Yeah, the main reason would be to unify managing the pictures with managing
the rest of the data. If unifying the pictures makes it more complex to
manage rather than simpler, it doesn't make sense to do that. Make sure you
make that bit modular in your code, and you shouldn't have a problem. :-)
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."
On 09/03/2011 05:11 PM, Darren New wrote:
> I think they turned off the encryption of the low bits back in Clinton's
> timeframe, when other countries started threatening to launch their own
> satellites. I think civilian GPS is quite accurate now.
My satnav device cannot accurately determine which motorway I'm on.
Given that a six-lane motorway isn't exactly small, this implies a very
large inaccuracy in the GPS location.
(Of course, I suppose they may have turned off the encryption, but that
doesn't mean that a specific device knows about this fact...)
Invisible wrote:
> On 09/03/2011 05:11 PM, Darren New wrote:
>
>> I think they turned off the encryption of the low bits back in Clinton's
>> timeframe, when other countries started threatening to launch their own
>> satellites. I think civilian GPS is quite accurate now.
>
> My satnav device cannot accurately determine which motorway I'm on.
Yeah, and your calculator can't run DOOM. We already know you make bad
technology purchase choices. ;-)
> Given that a six-lane motorway isn't exactly small, this implies a very
> large inaccuracy in the GPS location.
No, it implies a crappy GPS receiver.
Plus, GPS while you're moving is somewhat harder to get accurate than GPS
while you're stationary. The way you get millimeter precision even when the
encryption is turned on is to leave the receiver in the same place for a
couple of days, integrating the position samples.
--
Darren New, San Diego CA, USA (PST)
"How did he die?" "He got shot in the hand."
"That was fatal?"
"He was holding a live grenade at the time."