The basic pre-compression transform in bzip2:
http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
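A minimal, deliberately naive sketch of the forward transform (Python purely as
an illustration; the function name and the end-of-string sentinel are my own
convention, and real implementations use suffix sorting rather than building
every rotation):

def bwt(s, sentinel="\x03"):
    # Append a sentinel so the transform is reversible, then sort all
    # rotations of the string and keep the last column.
    assert sentinel not in s
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

print(repr(bwt("banana")))   # -> 'annb\x03aa' - the a's and n's get grouped

Sorting the rotations puts similar contexts next to each other, which is why
the last column ends up full of runs that later stages can squeeze.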
--
Darren New, San Diego CA, USA (PST)
Why is there a chainsaw in DOOM?
There aren't any trees on Mars.
Darren New wrote:
> The basic pre-compression transform in bzip2:
>
> http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
>
>
I'm too lazy and tired to read it through, but even a quick look at
it tells me why bzip2 ain't "streaming" compression (i.e. you'll need the
whole file to uncompress it, unlike with gzip).
-Aero
Eero Ahonen <aer### [at] removethiszbxtnetinvalid> wrote:
> I'm too lazy and tired to read it through, but even a quick look at
> it tells me why bzip2 ain't "streaming" compression (i.e. you'll need the
> whole file to uncompress it, unlike with gzip).
Incorrect. It compresses one block at a time, independently of the others.
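You can see that from Python's standard bz2 module, which exposes an
incremental decompressor (the file names below are only placeholders): feed it
the compressed data in small chunks and output comes back as each block is
completed, without ever having the whole file around.

import bz2

# Decompress a .bz2 file in 64 KB chunks; decompressed data comes out
# as each internal block is completed, so memory use stays small.
decompressor = bz2.BZ2Decompressor()
with open("archive.bz2", "rb") as src, open("archive.out", "wb") as dst:
    while not decompressor.eof:
        chunk = src.read(64 * 1024)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))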
--
- Warp
Darren New <dne### [at] sanrrcom> wrote:
> The basic pre-compression transform in bzip2:
> http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
Not freaky. Ingenious. And fun to implement. (Done that.)
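A correspondingly naive sketch of the inverse (again Python as an illustration,
with an assumed end-of-string sentinel; it's quadratic, but it shows why the
transform is reversible at all):

def inverse_bwt(last_column, sentinel="\x03"):
    # Repeatedly prepend the last column and re-sort; after len(last_column)
    # rounds the table holds every rotation of the original, in sorted order.
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]   # drop the sentinel

print(inverse_bwt("annb\x03aa"))   # -> "banana"

The real algorithm does this in linear time with a rank/next-pointer table
instead of rebuilding the whole table, but the idea is the same.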
--
- Warp
>> I'm too lazy and tired to read it through, but even a quick look at
>> it tells me why bzip2 ain't "streaming" compression (i.e. you'll need the
>> whole file to uncompress it, unlike with gzip).
>
> Incorrect. It compresses one block at a time, independently of the others.
Indeed, *most* compression algorithms do this today.
So no, you have to receive at least X bytes of input before you can emit
any output, but on the other hand you don't need to hold several GB of
data in memory all at once just to compress it.
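For example, with Python's bz2 module (the file names here are only
placeholders) you can push a big file through the compressor one chunk at a
time and just write out whatever it hands back:

import bz2

# Compress a large file without ever holding it all in memory: the
# compressor buffers input internally and emits output whenever it
# finishes a block.
compressor = bz2.BZ2Compressor()   # default level 9, i.e. 900 KB blocks
with open("bigfile.dat", "rb") as src, open("bigfile.dat.bz2", "wb") as dst:
    while True:
        chunk = src.read(64 * 1024)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    dst.write(compressor.flush())   # emit whatever is left in the final block

Only the current chunk plus the block being built sits in memory at once.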
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Warp wrote:
> Eero Ahonen <aer### [at] removethiszbxtnetinvalid> wrote:
>> I'm too lazy and tired to read it through, but even a quick look at
>> it tells me why bzip2 ain't "streaming" compression (i.e. you'll need the
>> whole file to uncompress it, unlike with gzip).
>
> Incorrect. It compresses one block at a time, independently of the others.
>
Having read more about bzip2, you're correct. You need the full block to
uncompress, but not the whole file. It's also a freaking complicated format -
at least at first sight. But it makes sense after all - and it works (which is
what counts).
-Aero
Orchid XP v8 wrote:
>
> So no, you have to receive at least X bytes of input before you can emit
> any output, but on the other hand you don't need to hold several GB of
> data in memory all at once just to compress it.
>
I'm still pretty sure you'll need the whole file in case of bzip2. But
while looking at the description(1), it's a multi-layer format and needs
multiple steps to decompress.
1) http://en.wikipedia.org/wiki/Bzip2
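From that page, the layers seem to be roughly: an initial run-length pass, the
Burrows-Wheeler transform, a move-to-front step, another run-length pass over
the zeroes, and finally Huffman coding. The move-to-front step is the simplest
of them; a rough Python sketch (function name and test string are just mine):

def mtf_encode(data):
    # Move-to-front: each byte is replaced by its current position in the
    # alphabet, and that byte is then moved to the front. After a BWT, long
    # runs of one byte turn into long runs of zeros, which compress well.
    alphabet = list(range(256))
    output = []
    for byte in data:
        index = alphabet.index(byte)
        output.append(index)
        alphabet.pop(index)
        alphabet.insert(0, byte)
    return output

print(mtf_encode(b"aaabbbaaa"))   # -> [97, 0, 0, 98, 0, 0, 1, 0, 0]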
-Aero
Darren New <dne### [at] sanrrcom> wrote:
> The basic pre-compression transform in bzip2:
>
> http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
Neat! :D
Eero Ahonen wrote:
>
> I'm still pretty sure you'll need the whole file in case of bzip2. But
Now I'm even more sure - I tested ;). I dumped 10M of random data to a
file, compressed it, got an 11M file :), took the first 10M of it and tried to
uncompress it.
aero@groath ~ $ dd if=/dev/urandom of=temp/testi bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 2.41001 s, 4.4 MB/s
aero@groath ~ $ cd temp
aero@groath ~/temp $ ls -lah|grep testi
-rw-r--r-- 1 aero aero 10M Jan 5 21:16 testi
aero@groath ~/temp $ bzip2 testi
aero@groath ~/temp $ ls -lah|grep testi
-rw-r--r-- 1 aero aero 11M Jan 5 21:16 testi.bz2
aero@groath ~/temp $ dd if=testi.bz2 of=testi2.bz2 bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0284283 s, 369 MB/s
aero@groath ~/temp $ bunzip2 testi2.bz2
bunzip2: Compressed file ends unexpectedly;
perhaps it is corrupted? *Possible* reason follows.
bunzip2: No such file or directory
Input file = testi2.bz2, output file = testi2
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
bunzip2: Deleting output file testi2, if it exists.
-Aero
Warp wrote:
> Not freaky. Ingenious.
Those are not mutually exclusive. :-)
I also liked how the inventors of arithmetic coding managed to arrange the
algorithm so it doesn't need unbounded amounts of memory to hold the
intermediate arithmetic value being built up (i.e., how they handled a string
of .9999999 sort of thing...)
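For anyone who hasn't seen the trick: the usual integer implementations emit a
bit as soon as it is settled and merely *count* the unresolved "straddles the
middle" cases, so nothing ever grows. A rough sketch of the encoder side in
Python (the function name, precision, and fixed fifty-fifty model are just for
illustration, not how any particular codec does it):

def arithmetic_encode(symbols, p_zero=0.5):
    # Integer arithmetic coding, encoder side only. `low`/`high` are the
    # current interval; `pending` counts straddle cases whose bits cannot be
    # decided yet, so the working state stays bounded however long the input.
    PRECISION = 32
    WHOLE = 1 << PRECISION
    HALF = WHOLE >> 1
    QUARTER = WHOLE >> 2

    low, high, pending = 0, WHOLE - 1, 0
    out = []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        out.extend([1 - bit] * pending)   # now we know how the carry resolved
        pending = 0

    for s in symbols:
        span = high - low + 1
        split = low + int(span * p_zero) - 1   # last code value assigned to 0
        if s == 0:
            high = split
        else:
            low = split + 1
        while True:
            if high < HALF:                          # top bit settled as 0
                emit(0)
            elif low >= HALF:                        # top bit settled as 1
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1                         # the ".0111 vs .1000" case
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = low * 2, high * 2 + 1        # rescale the interval

    pending += 1
    emit(0 if low < QUARTER else 1)                  # final disambiguating bits
    return out

print(arithmetic_encode([0, 1, 1, 0, 1, 0, 0, 1]))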
--
Darren New, San Diego CA, USA (PST)
Why is there a chainsaw in DOOM?
There aren't any trees on Mars.