Weee... check this out:
A full scan of a table containing ten million items requires up to ten
million operations (on average, half that). Using binary chop, it takes
only the binary logarithm of that: log2(10^7) is about 23.3, so at most
24 operations.
If a computer can process 1,000 operations per second (bearing in mind
that comparing a record to the lookup target takes a fair few machine
instructions), we get the following times:
Full scan: 2.78 hours
Binary chop: 0.024 seconds
So one way takes almost three hours to do a single lookup, while the
other can do over 40 lookups per second. O(log N) FTW!
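To make that concrete, here's a toy sketch in Python (my own code, just
for illustration) that counts the comparisons each approach makes:

def linear_scan(items, target):
    # Worst case: examines every single item.
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            break
    return comparisons

def binary_chop(items, target):
    # Requires items to be sorted; halves the search space each step.
    comparisons = 0
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        comparisons += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return comparisons

items = list(range(10_000_000))
print(linear_scan(items, 9_999_999))   # 10,000,000 comparisons
print(binary_chop(items, 9_999_999))   # about two dozen comparisons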
But wait, check this out. Perhaps you are somehow unimpressed with
taking a 3-hour operation and completing it in 1/40th of a second.
Well, try this for size:
Assume there are 10^80 atoms in the observable universe. If you
sequentially full scan them all, one by one, at 1,000 atoms per second,
it'll take you 10^77 seconds.
This is longer than the universe has existed (10^11 seconds).
Hypothetically, it's longer than it will take for all the energy in the
universe to be exhausted. By the time 10^77 seconds is up, the universe
will long since have died a cold, dark, silent death of total
equilibrium.
In short, this procedure is so slow that there isn't enough time or
energy in the whole of the known universe to actually complete it. (But
then, who's going to pin a unique record to every atom in the universe
anyway?)
On the other hand, according to my sums 10^80 is roughly 2^266. Thus,
the binary chop algorithm requires only 266 operations, and can easily
complete the entire process in well under a second. [If we assume
completely random access to any atom in the universe, which completely
defies several known physical laws...]
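You can check those sums easily enough; in Python, for instance:

import math
print(math.log2(10**80))             # 265.754...
print(math.ceil(math.log2(10**80)))  # 266 chops suffice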
Like, wow man!
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Invisible <voi### [at] devnull> wrote:
> On the other hand, according to my sums 10^80 is roughly 2^266. Thus,
> the binary chop algorithm requires only 266 operations, and can easily
> complete the entire process in well under a second. [If we assume
> completely random access to any atom in the universe, which completely
> defies several known physical laws...]
It would require the atoms to be sorted in a specific order.
How would you sort them?
--
- Warp
Warp wrote:
> Invisible <voi### [at] devnull> wrote:
>> On the other hand, according to my sums 10^80 is roughly 2^266. Thus,
>> the binary chop algorithm requires only 266 operations, and can easily
>> complete the entire process in well under a second. [If we assume
>> completely random access to any atom in the universe, which completely
>> defies several known physical laws...]
>
> It would require for the atoms to be sorted in a specific order.
> How would you sort them?
Clearly by the value of the time component of the space-time event where
the atom currently exists. ;-)
--
Darren New / San Diego, CA, USA (PST)
"Invisible" <voi### [at] devnull> wrote in message
news:48908346@news.povray.org...
> So one way takes almost three hours to do a single lookup, while the
> other can do 40 lookups per second. O(log N) FTW!
Which is why we really, really like indexes on database tables.
>> So one way takes almost three hours to do a single lookup, while the
>> other can do 40 lookups per second. O(log N) FTW!
>
> Which is why we really, really like indexes on database tables.
To quote Team America, "**** YEAH!"
Although - and not many people seem to understand this part - indexes
make looking up *one* element much faster (if you index the right
thing). However, if you're going to end up needing to process a large
percentage of the table anyway, it's actually faster to *not use* any
indexes there might be.
This is why we have complex database engines with sophisticated
algorithms which attempt to pick the best method for performing each
query. Yet still it seems that certain people think that the task of
"tuning" an SQL query means working out which combination of optimiser
hints causes the database to always use indexes for that query. *sigh*
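For the curious, you can actually watch a planner make this choice by
asking it for its query plan. A minimal sketch using SQLite from Python
(schema and numbers invented for the demo; real break points vary by
engine):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
con.executemany("INSERT INTO t (val) VALUES (?)",
                [(i % 100,) for i in range(100_000)])
con.execute("CREATE INDEX idx_val ON t (val)")
con.execute("ANALYZE")  # gather stats so the planner can estimate selectivity

# Narrow predicate (~1% of rows): expect a SEARCH using the index.
for row in con.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE val = 7"):
    print(row)

# Wide predicate (~95% of rows): the planner may well prefer a full SCAN.
for row in con.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE val < 95"):
    print(row)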
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
> This is longer than the universe has existed (10^11 seconds).
Some of the guys mentioned here might have disagreed with you on that one:
http://en.wikipedia.org/wiki/12th_century_BC
"Invisible" <voi### [at] devnull> wrote in message
news:48917361$1@news.povray.org...
> Although - and not many people seem to understand this part - indexes
> make looking up *one* element much faster (if you index the right
> thing). However, if you're going to end up needing to process a large
> percentage of the table anyway, it's actually faster to *not use* any
> indexes there might be.
Really? I'll go and discard everything I know about using indexes right
away....
If you're processing a portion of a table and you have an index that covers
the required columns and finds you the portion of the table you want, it's
much, much better to use the index than to scan the table. Think about it
this way. If you need all the entries in the phone book where the first
letter of the surname is between N and T, would you prefer the phone book
ordered by surname or would you prefer an unorganised heap of phone book
entries?
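Or, as a toy sketch of the same idea in Python (names are mine): sorted
order lets you seek straight to the start of the range and stop at the
end, instead of testing every entry:

import bisect

phone_book = sorted(["Adams", "Baker", "Nash", "Ng", "Owens",
                     "Patel", "Smith", "Turner", "Young"])

# Sorted: one "seek" to the first N, then read until we pass the Ts.
start = bisect.bisect_left(phone_book, "N")
end = bisect.bisect_left(phone_book, "U")
print(phone_book[start:end])  # ['Nash', 'Ng', 'Owens', 'Patel', 'Smith', 'Turner']

# Unorganised heap: no option but to examine every single entry.
print([name for name in phone_book if "N" <= name[0] <= "T"])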
>> Although - and not many people seem to understand this part - indexes
>> make looking up *one* element much faster (if you index the right
>> thing). However, if you're going to end up needing to process a large
>> percentage of the table anyway, it's actually faster to *not use* any
>> indexes there might be.
>
> Really? I'll go and discard everything I know about using indexes right
> away....
>
> If you're processing a portion of a table and you have an index that covers
> the required columns and finds you the portion of the table you want, it's
> much, much better to use the index than to scan the table. Think about it
> this way. If you need all the entries in the phone book where the first
> letter of the surname is between N and T, would you prefer the phone book
> ordered by surname or would you prefer an unorganised heap of phone book
> entries?
My point is that if you need to process 75% of the rows of the table,
it's probably faster to do a sequential full scan than to use an index
and do lots of random I/O. Obviously if you only need to process, say,
5% of the table, you'd be mad to not use an index...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
"Gail Shaw" <initialsurname@sentech sa dot com> wrote in message
news:48919ec6@news.povray.org...
>
> "Invisible" <voi### [at] devnull> wrote in message
> news:48917361$1@news.povray.org...
>
> > Although - and not many people seem to understand this part - indexes
> > make looking up *one* element much faster (if you index the right
> > thing). However, if you're going to end up needing to process a large
> > percentage of the table anyway, it's actually faster to *not use* any
> > indexes there might be.
>
> Really? I'll go and discard everything I know about using indexes right
> away....
Sorry, that came out a little snarkier than I intended.
Taking SQL Server as an example (so I can give specific numbers), if an
index does not contain all the columns required, the break point between
using the index and scanning the table is around 1% of the table. If the
index does contain all the columns, then the index will be used whether 1
row is required or the entire table.
It's not hard to demonstrate.
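For instance (not SQL Server, but SQLite driven from Python shows the
covering-index effect too; the schema here is invented for the demo):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
            " customer INTEGER, amount REAL, note TEXT)")
con.executemany("INSERT INTO orders (customer, amount, note) VALUES (?, ?, ?)",
                [(i % 1000, float(i), "x") for i in range(50_000)])
con.execute("CREATE INDEX idx_cust ON orders (customer, amount)")
con.execute("ANALYZE")

# All required columns are in the index: expect 'USING COVERING INDEX'.
for row in con.execute("EXPLAIN QUERY PLAN "
                       "SELECT customer, amount FROM orders WHERE customer = 42"):
    print(row)

# 'note' is not in the index, so the engine must visit the table as well.
for row in con.execute("EXPLAIN QUERY PLAN "
                       "SELECT note FROM orders WHERE customer = 42"):
    print(row)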
"Invisible" <voi### [at] devnull> wrote in message
news:4891a10d$1@news.povray.org...
> My point is that if you need to process 75% of the rows of the table,
> it's probably faster to do a sequential full scan than to use an index
> and do lots of random I/O. Obviously if you only need to process, say,
> 5% of the table, you'd be mad to not use an index...
I'm talking about range queries (all values between 1 and 100, all surnames
with first letters between N and Z). In that case you're not going to be
doing random IOs. One seek to find the start of the range (3-5 random IOs),
then scan along the index leaf pages (sequential IOs). Provided all the data
required is included in the leaf pages, that is going to be faster than a
scan of the entire table.
Firstly because, in general, indexes are smaller than the entire table so
fewer IOs are required; secondly because the inapplicable 25% can be ignored
and doesn't have to be read and examined.
Of course, if you're looking for a random 75% of the table, then it's a
table scan.
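As a back-of-envelope version of that comparison (every page count below
is a made-up, purely illustrative number):

# Toy IO cost model: index range scan vs full table scan.
table_pages = 100_000   # hypothetical size of the whole table
index_fraction = 0.2    # hypothetical: index leaf level is 20% of the table
selectivity = 0.75      # one contiguous range covering 75% of the rows

seek_ios = 4            # root-to-leaf descent: the "3-5 random IOs"
range_scan_ios = seek_ios + table_pages * index_fraction * selectivity
full_scan_ios = table_pages

print(f"index range scan: {range_scan_ios:,.0f} IOs")  # 15,004
print(f"full table scan:  {full_scan_ios:,} IOs")      # 100,000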