  Tape-sort (Message 1 to 10 of 10)  
From: Invisible
Subject: Tape-sort
Date: 12 Oct 2010 11:03:49
Message: <4cb478d5$1@news.povray.org>
I've been thinking about how you'd go about efficiently sorting data 
stored on magnetic tape.

The key thing about tape is that it's sequential. We'll also assume that 
it holds more data than you can actually fit into main memory, as was 
the case in the days of old.

You can of course wind tape forward and backwards, and presumably with a 
sufficiently expensive tape transport system you can do this quite 
accurately. You can't do it particularly fast, however. Fundamental 
limits such as friction prevent that. (If you wind the tape fast enough, 
it'll overheat and melt.)

Presumably the tape transport system isn't accurate enough to do an 
in-place sort on tape. (Even if it is, the seek times are probably going 
to be prohibitively high if we want to read a record, rewind one record, 
re-record that record, read the next record, rewind one record, 
re-record this record, etc.) This limits us to tape-to-tape operations.

So, we have this set of constraints:
- We have so much data that only tape is large enough to hold it.
- Tape provides efficient sequential access, but has very long seek times.
- Moving data around on a single tape is infeasible.
- Tape is presumably much slower than main memory for read/write access 
(in addition to the huge seek latency).
- Only a finite number of records will fit into main memory at once.
- We want to sort our data in the smallest amount of wall-clock time.

So what are our options here? Well, since in-place sorting is basically 
out of the question, we're going to need at least two tapes, an input 
and an output. We *could* do something like read one tape, bubble-sort 
it, and write the results onto the other tape. Then rewind both tapes, 
swap input and output, and start again. Repeat until the data is sorted.

Trouble is, this requires both tapes to be rewound on the order of n 
times. If we're sorting a few million records, we're going to have to 
read the tape end-to-end and rewind to the beginning a few million 
times. Since presumably even just rewinding a big reel of tape is quite 
a time-consuming operation (never mind winding it past the pick-up head 
slowly enough to read it), this is going to take a rather drastic amount 
of time.

At the opposite end of the spectrum, we could read as much data as 
possible into main memory, sort that chunk completely (which is just a 
normal in-memory sort, using any algorithm we like), and write the 
sorted block out to tape #1. We then read in the next chunk, sort, and 
write to tape #2. We continue in this fashion until there is no more 
data to read. Finally, we rewind all the tapes and do a multi-way merge 
operation.
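
Concretely, the scheme might look like this in Python, with each "tape" 
faked as a list. (A sketch only - CHUNK_SIZE and the list-as-tape 
representation are just stand-ins for illustration; heapq.merge does the 
final multi-way merge while holding only one record per tape in memory.)

import heapq

CHUNK_SIZE = 1000  # records that fit in main memory (made-up figure)

def chunk_sort(input_tape):
    # Read the input a chunk at a time, sort each chunk in memory
    # (any in-memory algorithm will do), and write each sorted
    # chunk to its own output "tape".
    tapes = []
    for i in range(0, len(input_tape), CHUNK_SIZE):
        tapes.append(sorted(input_tape[i:i + CHUNK_SIZE]))
    return tapes

def multiway_merge(tapes):
    # Rewind all the tapes and merge; heapq.merge reads each tape
    # sequentially, holding only one record per tape in memory.
    return list(heapq.merge(*tapes))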

Trouble is, if we have 1 million records to sort and only 1 thousand 
records fit in memory, we're going to need 1 thousand tapes and tape 
drives. (!) This, presumably, is prohibitively expensive.

So far, we have a 2-tape bubble-sort, which is absurdly slow, and an 
unlimited-tape merge-sort, which requires a ridiculous number of tape 
drives.

Let's back up for a moment here. What fundamental operations can we do?

- We can read a chunk of unsorted data into memory and completely sort 
it. Since main memory is fast and has random access, we can use any 
algorithm we like for this. And presumably this part of the task is so 
much faster than tape access that we can regard it as almost 
instantaneous by comparison.

- We can read data from one or more input tapes and write it out to one 
or more output tapes.

Looking at the second item, this basically amounts to allowing splits 
and merges:

- We can take unsorted input data and split it into several output 
ranges, as per one iteration of a quick-sort.

- We can take several tapes of sorted data and merge it together onto a 
single output tape, as per a single iteration of merge-sort.

Trouble is, both quick-sort and merge-sort are recursive algorithms, and 
to implement them on tape would require an unbounded number of tapes, 
which isn't really practical. Even if you can somehow arrange the data 
onto tapes so that you can fit lots of different tapes through only a 
limited number of tape drives, you're still going to spend more time 
rewinding tapes than actually doing useful processing.

This is where my musings on the min-heap come in. I was thinking, you 
could take an input tape, read data off of it into a min-heap, keep 
reading until main memory is full, and then remove the top item (the 
minimum element) and put it onto the output tape. Something like this:

   REPEAT
     READ NEXT RECORD FROM INPUT TAPE
     INSERT IT INTO MIN-HEAP
     IF HEAP SIZE EXCEEDS AVAILABLE MEMORY
       REMOVE MINIMUM RECORD
       WRITE IT ONTO OUTPUT TAPE
   UNTIL END-OF-INPUT-TAPE
   REPEAT
     REMOVE MINIMUM RECORD
     WRITE IT ONTO OUTPUT TAPE
   UNTIL HEAP-IS-EMPTY
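
Here's the same pass as runnable Python, as a sanity check (HEAP_SIZE 
and the list-as-tape representation are stand-ins; heapq.heappushpop 
inserts the new record and extracts the minimum in one step):

import heapq

HEAP_SIZE = 1000  # records that fit in main memory (made-up figure)

def heap_bubble_pass(input_tape):
    # Stream records through a bounded min-heap, always writing out
    # the smallest record currently held in memory.
    heap, output_tape = [], []
    for record in input_tape:
        if len(heap) < HEAP_SIZE:
            heapq.heappush(heap, record)
        else:
            output_tape.append(heapq.heappushpop(heap, record))
    while heap:  # end of input: drain the heap in sorted order
        output_tape.append(heapq.heappop(heap))
    return output_tape

Repeating the pass, swapping input and output tapes each time, 
eventually yields a fully sorted tape.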

If you think about it, it's sort of like a bubble-sort, but instead of 
comparing just two items, you compare as many as you can hold in memory 
at once. Like a bubble-sort, it has the property that the "stones" fall 
to the bottom very quickly, but the "bubbles" rise to the top quite 
slowly. (If the heap holds H items, each bubble can only rise a maximum 
of H slots further up the tape on each pass.)

So it's *like* a bubble-sort, but better. H times better, actually 
(where H is the maximum size of the heap). Indeed, for a 2-element 
min-heap, it degenerates into a real bubble-sort, of the kind we all 
know and hate.

Even so, a bubble-sort over N elements requires up to N passes, whereas 
this new hybrid sort requires on the order of N/H passes, since each 
pass a bubble can rise up to H slots rather than just one. For a million 
items and a thousand-element heap, that's about a thousand passes 
instead of a million. Much better - but that's still a thousand 
end-to-end reads and rewinds of the whole tape.

What you can do is split the input data into many smaller datasets, and 
thus reduce the work required to sort them. You could read the input 
tape and partition it onto, say, 10 output tapes, and then repeatedly 
heap-bubble the 10 tapes in parallel until they're all sorted, and 
finally scan them all to merge them back together. That would result in 
roughly a 100-fold speed-up in the sorting process.

That got me thinking about how you determine the optimum pivot values. 
Ideally you'd want to look at the histogram of the data and move the 
pivot values until there's roughly the same amount of data between each 
pair. So first you'd need to scan the input tape to find the min and max 
values, and then maybe split the range between those into, say, 1000 
buckets, rewind the input tape and count how many items fall into each 
range.

Trouble is, the data might be clumped very unevenly. So maybe what we 
need to do is

1. Scan for the min and max values.
2. Linearly divide up the space and scan for a 1000-point histogram.
3. Partition the resulting histogram into 10 equal-looking ranges, split 
each new super-range into 100 sub-ranges, and scan the tape a third time 
for another 1000-point histogram (but now with unequal bucket sizes to 
hopefully better sample the data).

Maybe 2 histogram passes aren't enough? Maybe the process could be made 
iterative, such that it repeats until the bucket edges stop changing? 
Perhaps there should be special handling for particularly sparse 
buckets? Come to think of it, does each of the final computed buckets 
even need to be a contiguous range? The sorting algorithm doesn't 
require that - but maybe the storage requirements of keeping track of 
all the buckets make it necessary...
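
Something like this, say (a rough sketch - the function name and the 
1000-bin / 10-partition figures are arbitrary, and a list again stands 
in for the tape):

def pivot_estimates(tape, bins=1000, partitions=10):
    # Pass 1: scan the tape for the min and max values.
    lo, hi = min(tape), max(tape)
    width = (hi - lo) / bins or 1.0
    # Pass 2: rewind and scan again, counting how many records land
    # in each of the equally-spaced bins.
    counts = [0] * bins
    for x in tape:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    # Walk the histogram, emitting a pivot each time another
    # 1/partitions of the data has been accounted for.
    pivots, seen = [], 0
    for i, c in enumerate(counts):
        seen += c
        if (len(pivots) < partitions - 1
                and seen * partitions >= len(tape) * (len(pivots) + 1)):
            pivots.append(lo + (i + 1) * width)  # right edge of bin i
    return pivots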

Then again, all this assumes that we're going to carefully split the 
data and then just concat-merge it. You could of course do it the other 
way around - and *that* requires no such special processing. Just take 
the input data, copy it onto multiple tapes in any old fashion that you 
fancy, and do the careful bit when you come to merge the sorted results 
back together.

Ideally we want to do either a quick-sort or a merge-sort. But these 
both require recursive splitting or merging. We don't have enough tapes 
for that. We could put all the fragments onto the same tape, but now 
we're in danger of requiring a lot of seeking to find the data we want, 
and that'll be slow.

I did think to myself that you could begin by reading unsorted chunks, 
sorting them in-memory and writing them to tape. You now have a tape of 
N-item sorted chunks. You could then read N/2 records from the first 
chunk, seek forward N/2 records, and read N/2 records from the second 
chunk, and write the N merged results to the output. Repeat until you 
get to the end of the input tape, and then rewind it to the beginning 
and go about merging the second halves of all the chunks.

Trouble is, you're going to have to end up seeking in bigger and bigger 
steps with each subsequent pass. Eventually it's going to get just too slow.

Then I thought to myself, why not have the chunks on two input tapes? 
That way, if the chunks are already sorted, you don't even need to hold 
the whole chunk in memory at once. You can just merge the records as 
they come off the two input tapes, and write the results onto the single 
output tape.

Single output tape. Yeah, that means that after the first merge cycle, 
you now have all your data on one tape, and you're going to have to 
start seeking around again. Except that, wait, you could have two output 
tapes, and just alternate between them as you're writing. So all the odd 
chunks go on one tape, and all the even chunks go on the other. And then 
you can rewind everything and start again in the opposite direction, 
with bigger chunks. Repeat until you're done.

This approach requires you to have 4 tapes / tape drives. It also 
requires the tapes to be read / rewound / written / rewound 
approximately log2(N/C) times (where N is the total number of records 
and C is the initial chunk size - presumably the largest chunk you can 
sort in-memory). That doesn't sound too bad. Better still, there's now 
no seeking at all. Just sequential reads and writes.
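
Simulated with lists standing in for the four tapes, the whole scheme 
might look like this (a sketch only - each "tape" here is just a list 
of runs, and heapq.merge plays the part of the sequential two-way 
merge):

import heapq

def balanced_merge_sort(records, chunk=1000):
    if not records:
        return []
    # Initial pass: sort chunk-sized runs in memory and alternate
    # them between the two input "tapes".
    tapes = [[], []]
    for i in range(0, len(records), chunk):
        tapes[(i // chunk) % 2].append(sorted(records[i:i + chunk]))
    # Merge passes: merge the j-th run of each input tape and write
    # the merged runs alternately onto the two output tapes.
    while len(tapes[0]) + len(tapes[1]) > 1:
        out = [[], []]
        for j in range(max(len(tapes[0]), len(tapes[1]))):
            a = tapes[0][j] if j < len(tapes[0]) else []
            b = tapes[1][j] if j < len(tapes[1]) else []
            out[j % 2].append(list(heapq.merge(a, b)))
        tapes = out  # rewind everything and swap inputs/outputs
    return tapes[0][0]

Each iteration of the while loop is one read/rewind cycle of all four 
tapes, and the runs double in length every time - hence the log2(N/C) 
passes.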

Still, when a pass finishes, all four tapes are going to have to rewind 
to the beginning before the next pass can start. The computer itself is 
going to be doing nothing while we wait for the tapes to rewind. Is 
there some way that we can add a modest number of additional tape drives 
and keep the computer busy?

You could have two sets of 4 tapes, but they're all going to tend to end 
up rewinding at the same time. That doesn't seem to buy much.

You could have 4 "big" tapes and 4 "small" tapes, where you basically 
put half as much data onto the small tapes. At any time you're reading 
from two tapes and writing to two other tapes. When you're reading from 
two big tapes, you write half the data to two small tapes, rewind them 
and write to the other two small tapes while the first two small tapes 
are rewinding. By the time you've finished reading the two big tapes, 
the first two small tapes should have finished rewinding, so while the 
other tapes are rewinding, the two small tapes act as input and the two 
hitherto unused big tapes act as output. By the time you get to the end 
of the small tapes, the other two small tapes should have finished 
rewinding.

By this arrangement, I think you can keep the machine running 
constantly. There's always some tape rewound and ready for reading, and 
some other tapes rewound and ready for writing.

It seems that by adding more tape drives, it ought to be possible to get 
the job done even faster. I need to go meditate on this...



From: Warp
Subject: Re: Tape-sort
Date: 12 Oct 2010 12:06:59
Message: <4cb487a3@news.povray.org>
Invisible <voi### [at] devnull> wrote:
> Trouble is, both quick-sort and merge-sort are recursive algorithms, and 
> to implement them on tape would require an unbounded number of tapes, 
> which isn't really practical.

  I don't think you understand how merge sort works. Merge sort is exactly
what you are looking for in this case (ie. you want to sort a large amount
of data in a slow storage medium, and you have a secondary empty storage
medium of the same size to use for the sorting).

  Each pass of the merge sort will copy all the data from one tape to the
other, alternately. The total number of such passes will be exactly
ceil(log2(n)) (where n is the number of elements to sort). For example,
if there are 10 billion elements to sort, merge sort will require copying
all the data from one tape to the other exactly 34 times. Only two
elements need to be in RAM at a time.

  You can reduce the number of passes by pre-sorting larger partitions in
RAM (in other words, instead of sorting partitions of 2 elements in the
first merge sort pass and writing the result to the second tape, you
sort partitions of as many elements as will fit in RAM before writing
the result to the second tape). For example, if you can sort 1 million
elements in RAM, you start from runs of 10^6 elements instead of 2, and
the total number of times you need to copy all the data from one tape
to the other drops to ceil(log2(10^10 / 10^6)) = 14.
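
  (A quick check of those figures in Python:)

import math

n = 10**10  # records to sort
c = 10**6   # records that fit in RAM

passes_plain    = math.ceil(math.log2(n))       # 34 passes from 1-record runs
passes_buffered = math.ceil(math.log2(n // c))  # 14 passes from RAM-sorted runs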

  Merge sort requires a lot of seeking when comparing partitions. However,
you can use the RAM to reduce the needed amount of seeking by reading as
many elements of each partition being compared as possible into RAM, and
then doing the comparisons in RAM and writing the result to the other
tape. There will still be a fair amount of seeking (in the tape being
read), but this RAM buffering will reduce it quite significantly.

-- 
                                                          - Warp



From: scott
Subject: Re: Tape-sort
Date: 13 Oct 2010 03:40:17
Message: <4cb56261@news.povray.org>
> I've been thinking about how you'd go about efficiently sorting data 
> stored on magnetic tape.
>
> The key thing about tape is that it's sequential. We'll also assume that 
> it holds more data than you can actually fit into main memory, as was the 
> case in the days of old.

If you haven't already, you could code a simple "tape drive simulator" to 
test all your ideas...
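
Something as simple as this might do - a toy model where reads and 
writes are sequential and the distance the head travels stands in for 
elapsed time (the Tape class and its cost model are made up, of course):

class Tape:
    # A toy tape drive: reads and writes advance the head one
    # record at a time; rewinds cost time proportional to the
    # current position.
    def __init__(self, data=None):
        self.data = list(data or [])
        self.pos = 0
        self.head_travel = 0  # proxy for elapsed time

    def read(self):
        self.head_travel += 1
        record = self.data[self.pos]
        self.pos += 1
        return record

    def write(self, record):
        self.head_travel += 1
        if self.pos < len(self.data):
            self.data[self.pos] = record
        else:
            self.data.append(record)
        self.pos += 1

    def at_end(self):
        return self.pos >= len(self.data)

    def rewind(self):
        self.head_travel += self.pos  # rewinding costs time too
        self.pos = 0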

Depending on what type of data you want to sort and the seeking 
characteristics of a tape drive, a radix sort might work quite well.  Just 
an idea.



From: Invisible
Subject: Re: Tape-sort
Date: 13 Oct 2010 04:08:40
Message: <4cb56908@news.povray.org>
On 13/10/2010 08:40 AM, scott wrote:

> If you haven't already, you could code a simple "tape drive simulator"
> to test all your ideas...

Yes, this is part of my immediate plan.

> Depending on what type of data you want to sort and the seeking
> characteristics of a tape drive, a radix sort might work quite well.
> Just an idea.

I don't actually know how that works. I'll have to go look it up.



From: Invisible
Subject: Re: Tape-sort
Date: 13 Oct 2010 04:14:51
Message: <4cb56a7b$1@news.povray.org>
On 12/10/2010 05:06 PM, Warp wrote:

>    Merge sort requires a lot of seeking when comparing partitions. However,
> you can use the RAM to reduce the needed amount of seeking by reading as
> many elements of each partition being compared as possible into RAM, and
> then doing the comparisons in RAM and writing the result to the other
> tape. There will still be a fair amount of seeking (in the tape being
> read), but this RAM buffering will reduce it quite significantly.

And if you put half of the data onto one tape and half of the data onto 
another tape, you can avoid *all* of the tape seeks. Each 
partially-sorted "chunk" is the same as all the others; you don't need 
to keep them arranged with respect to each other. So on each pass, just 
write half of the chunks onto one output tape and half onto the other. 
Then rewind the tapes and merge from both input tapes (again onto two 
output tapes). In this manner, with 4 tapes, you need no tape seeks at 
all. (The number of passes is of course still log N.)



From: scott
Subject: Re: Tape-sort
Date: 13 Oct 2010 05:08:12
Message: <4cb576fc$1@news.povray.org>
>> Depending on what type of data you want to sort and the seeking
>> characteristics of a tape drive, a radix sort might work quite well.
>> Just an idea.
>
> I don't actually know how that works. I'll have to go look it up.

I used it a long time ago for sorting triangles based on depth (before depth 
buffers).  For a tape drive system I would sort 1 bit at a time to minimise 
the number of passes, and using 4 tapes instead of 2 you could probably 
avoid reading the input twice for each bit.  This algorithm Might Work (it 
can probably be adapted for non-integer data somehow):

4 tapes:
input1 = input data
input2 = empty or some more input data
output1 = empty
output2 = empty

for b = 0 to <max bits in data>

 while not eof(input1)
  n = read next from input1
  if bit b of n is not set,
    write n to output1
  else
    write n to output2
 endwhile

 rewind input1
 // next loop can run whilst input1 is rewinding

 while not eof(input2)
  n = read next from input2
  if bit b of n is not set,
    write n to output1
  else
    write n to output2
 endwhile

 rewind input2
 rewind output1
 rewind output2

 wait until all tapes rewound

 swap input1,output1
 swap input2,output2

next
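
In runnable Python the same idea might look like this (lists stand in 
for the four tapes, and unsigned integer keys are assumed):

def radix_sort_tapes(data, key_bits=32):
    input1, input2 = list(data), []
    for b in range(key_bits):
        output1, output2 = [], []
        # Distribute by bit b: 0-bits to output1, 1-bits to output2.
        # Reading input1 fully before input2 keeps the sort stable.
        for tape in (input1, input2):
            for n in tape:
                (output2 if (n >> b) & 1 else output1).append(n)
        # Rewind all tapes and swap inputs with outputs.
        input1, input2 = output1, output2
    # After the final pass, input1 holds the keys with the top bit
    # clear and input2 those with it set, so this is the sorted data.
    return input1 + input2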



From: Invisible
Subject: Re: Tape-sort
Date: 13 Oct 2010 05:55:13
Message: <4cb58201@news.povray.org>
>> Depending on what type of data you want to sort and the seeking
>> characteristics of a tape drive, a radix sort might work quite well.
>> Just an idea.
>
> I don't actually know how that works. I'll have to go look it up.

Having read the article on radix sort, it appears that it's similar to a 
quick-sort or bucket sort, except that you split the input into several 
output buckets based on the "digits" of the key, rather than on comparisons.

This results in the number of passes being proportional to the number of 
digits in the key and independent of the actual number of keys to 
process (though each pass still has to read every record, of course).

One immediate and obvious problem is that the buckets might fill very 
unequally, depending on the distribution of the data to be sorted. (This 
is essentially the same problem that quick-sort has.)

In contrast, a merge-sort takes exactly the same amount of time 
regardless of what the data *is*. The sorting time depends only on *how 
much* data there is.

I'm still leaning towards merge-sort (especially since the data I'm 
thinking about sorting is floating-point numbers). But I guess what I 
need to do is run some simulations...



From: scott
Subject: Re: Tape-sort
Date: 13 Oct 2010 07:49:03
Message: <4cb59caf@news.povray.org>
> One immediate and obvious problem is that the buckets might fill very 
> unequally, depending on the distribution of the data to be sorted.

Is this really a problem?  It shouldn't affect the run-time of the radix 
sort algorithm at all.



From: Invisible
Subject: Re: Tape-sort
Date: 13 Oct 2010 08:05:24
Message: <4cb5a084@news.povray.org>
On 13/10/2010 12:49 PM, scott wrote:
>> One immediate and obvious problem is that the buckets might fill very
>> unequally, depending on the distribution of the data to be sorted.
>
> Is this really a problem? It shouldn't affect the run-time of the radix
> sort algorithm at all.

Hmm, yes. The number of passes a radix sort makes is proportional to the 
length of the keys, regardless of the volume of data. Which means that 
even sorting a small volume of data could be quite slow if the keys are 
large.

It also doesn't help that the keys I want to sort by are floating-point 
numbers... although I suppose you could truncate them to integers and do 
it that way.



From: scott
Subject: Re: Tape-sort
Date: 13 Oct 2010 08:49:55
Message: <4cb5aaf3@news.povray.org>
> Hmm, yes. The number of passes a radix sort makes is proportional to the 
> length of the keys, regardless of the volume of data. Which means that 
> even sorting a small volume of data could be quite slow if the keys are 
> large.

Indeed, I think it's more suited to large volumes of data with relatively 
small keys.  I just thought the lack of random access meant it might be 
quite well suited to tape drives.

> It also doesn't help that the keys I want to sort by are floating-point 
> numbers... although I suppose you could truncate them to integers and do 
> it that way.

Or do a radix sort on the mantissa part (which I guess is maximum 8 digits 
or so), then on the exponent (3 digits?).  That should be more efficient 
than simply converting the float to an integer, especially if you have a 
very wide range of values.
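
Another standard trick: an IEEE 754 double can be mapped to a 64-bit 
unsigned integer that compares in the same order as the float - flip all 
the bits of negative values, and just the sign bit of non-negative ones 
(NaNs aside). Something like:

import struct

def float_key(x):
    # Reinterpret the double's bits as an unsigned 64-bit integer.
    (bits,) = struct.unpack('<Q', struct.pack('<d', x))
    if bits >> 63:                        # negative: flip every bit
        return bits ^ 0xFFFFFFFFFFFFFFFF  # (reverses their ordering)
    return bits | (1 << 63)               # non-negative: set the sign bit

Sorting by float_key gives the same order as sorting the floats 
themselves, so the 1-bit-at-a-time tape sort above would work unchanged 
on the transformed keys (with 64 bits per key).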

