bzip2 is a freely available, patent-free (see below), high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques, whilst being around twice as fast as those techniques at compression and six times faster at decompression.

Why would I want to use it?

  • Because it compresses well. So it packs more stuff into your overfull disk drives, distribution CDs, floppy disks, Zip disks, backup tapes, … whatever. And/or it reduces your phone bills, customer download times, long distance network traffic, … whatever. Pretty obvious really. Who’s arguing? It’s not the world’s fastest compressor, but it’s still fast enough to be plenty useful.
  • Because it’s free (GNU GPL’d), and, as far as I know, patent-free. (To the best of my knowledge. I can’t afford to do a full patent search, so I can’t guarantee this. Caveat emptor). So you can use it for whatever you like. Naturally, the source code is part of the distribution.
  • Because it supports (limited) recovery from media errors. If you are trying to restore compressed data from a backup tape or disk, and that data contains some errors, bzip2 may still be able to decompress those parts of the file which are undamaged.
  • Because you already know how to use it. bzip2’s command line flags are similar to those of GNU Gzip, so if you know how to use gzip, you know how to use bzip2.
  • Because it’s very portable. It should run on any 32- or 64-bit machine with an ANSI C compiler. The distribution contains source code for Unix systems, and precompiled binaries for Win95/NT. bzip2’s predecessor was reportedly ported to a large number of weird and wonderful machines with little effort.
  • Because the documentation tells you how and to what extent I’ve tested it, and you can decide for yourself whether or not to entrust your data to it. Currently (29 Aug) the test volume is about 3,400 megabytes in 27,000 files. I’ll update this figure as I run more stuff through it.
  • Because you liked bzip-0.21, and have been waiting anxiously for the Next Version. Well, here it is.

How do I get hold of it?

Click here. This gives you the C source code and docs, but no executables. If you can be bothered, please email me to say you’ve got a copy. That way I’ll be able to put you on a mailing list, if you want, so I can notify you of enhancements. Plus it allows me to get some feel for how many people have downloaded a copy, which is something I was always curious about with 0.21, but couldn’t find out.

Alternatively, if you just want a binary, here are some:

Thanks to Nelson Beebe (beebe@math.utah.edu) for many of these.

Rename the binary you get to plain “bzip2”, and use it.  If you want to decompress a .bz2 file you’ve got, just do “bzip2 -d my_file.bz2”.
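If you’d rather do the same thing from a script, here is a minimal sketch using Python’s standard bz2 module. That module is an assumption on my part (it is not part of this distribution), and the function name decompress_file is made up for illustration. The sketch leaves the input file in place.

```python
import bz2
import shutil
from pathlib import Path


def decompress_file(src):
    """Decompress src (a path ending in .bz2) into the same directory.

    Hypothetical helper for illustration: it leaves the input file in place
    and returns the path of the file it wrote.
    """
    src = Path(src)
    if src.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src.name}")
    dst = src.with_suffix("")  # my_file.bz2 -> my_file
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copy in chunks, not all at once
    return dst
```

Calling decompress_file("my_file.bz2") should write the decompressed data to my_file alongside the original.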

Here’s the man page, so you can see properly how to use it.

If you’re as paranoid as I am, and want to use bzip2 to compress Extremely Important Data, you might want to build it from the source code. It’s really very easy. That way you get a self-test of the program, which might catch unforeseen nasties on obscure machine/OS combinations.
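In the same paranoid spirit, you can also check a compressed file after the fact, before you throw the original away. The following is only a sketch: it assumes Python’s standard bz2 module (not part of this distribution), and is_intact is a name invented for the example. It decodes the data and discards the output, relying on the library to reject damaged streams.

```python
import bz2


def is_intact(compressed, chunk_size=64 * 1024):
    """Return True if `compressed` (bytes) decodes cleanly as a bzip2 stream.

    The decompressed output is discarded chunk by chunk, so nothing is
    written to disk. Only the first stream in the data is checked.
    """
    decomp = bz2.BZ2Decompressor()
    try:
        for start in range(0, len(compressed), chunk_size):
            decomp.decompress(compressed[start:start + chunk_size])
            if decomp.eof:
                break  # end-of-stream marker reached
    except OSError:
        return False  # the library rejected the data (bad format or checksum)
    return decomp.eof  # still False if the stream was cut short
```

A truncated file does not raise an error here; it simply never reaches the end-of-stream marker, which is why the function returns decomp.eof rather than True.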
 

Contributed stuff

A patch for GNU tar 1.12 so you can make it compress with bzip2. The relevant flags are -I or --bzip2. From Hiroshi Takekawa (g640538@komaba.ecc.u-tokyo.ac.jp). Several other people also sent patches for 1.12; thank you for them.
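This is not the GNU tar patch itself, but as a rough illustration of the same idea (a tar archive compressed with bzip2), here is a sketch using Python’s standard tarfile module. The module and the helper name make_tar_bz2 are assumptions made for the example, not anything shipped with the patch.

```python
import io
import tarfile


def make_tar_bz2(files):
    """Build a bzip2-compressed tar archive in memory and return its bytes.

    `files` maps archive member names to their contents as bytes.
    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar needs the size up front
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

The result can be read back with tarfile.open(..., mode="r:bz2").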

How does it relate to your previous offering (bzip-0.21) ?

bzip2 is a rewritten and re-engineered version of 0.21. It looks superficially fairly similar, but has been almost entirely re-written (several times :-). The important differences are:

  • Patent-free!  (I hope; see statement above). bzip-0.21 used arithmetic coding; bzip2 uses Huffman coding, which is generally regarded as non-problematic from a patent standpoint. Both programs are based on the Burrows-Wheeler transform, but, to the best of my knowledge, that’s not patented either.
  • Faster, particularly at decompression. bzip2 decompresses more than 50% faster than 0.21, mostly because of the use of Huffman coding. I’ve also improved the compression speed, although not that much — perhaps it compresses 30% faster than 0.21.
  • Recovery from media errors. Both programs compress data in blocks which are, by default, 900k long. With bzip2, each block is handled completely independently, carries its own checksum, and is delimited by a 48-bit sequence. So, if you have a damaged compressed file, bzip2 can extract the compressed blocks, detect which ones are undamaged, and decompress those.
  • Test mode. You can test integrity of compressed files without having to decompress them. I should have put this in 0.21, really, but was too lazy (+ burnt-out with hacking by the time I released it).
  • Handles very repetitive files much better. Such files are a worst-case for any block-sorting compressor. bzip2 runs approximately ten times faster than 0.21 for such files.
  • Support for smaller machines. bzip2 can decompress any file it creates in 2300k, which means you can decompress files on 4-meg machines. Peak memory use during compression is also reduced by about 900k compared with 0.21, to around 6400k.
  • Better flag handling. In particular, long flags (--like --this) are supported, which makes it easier to use.
  • The one-line startup message which 0.21 printed, is gone. This was 0.21’s most complained-about feature. It even bugs *me* nowadays.
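To make the media-error recovery point above a bit more concrete, here is a sketch of the first step: finding where each block starts in a compressed file. It is written in Python for brevity and is not the code bzip2 itself uses. The marker value is the one found in current libbzip2 output, and I am assuming it is the 48-bit delimiter meant above; the name find_block_starts is invented for the example.

```python
import bz2

# 48-bit marker at the start of every compressed block in current libbzip2
# output. Assumed here to be the 48-bit delimiter described in the text.
BLOCK_MAGIC = 0x314159265359


def find_block_starts(data):
    """Return the bit offset of every block marker found in `data` (bytes).

    Blocks are packed bit by bit, so a marker need not start on a byte
    boundary; the scan slides a 48-bit window along one bit at a time.
    """
    mask = (1 << 48) - 1
    window = 0
    bits_seen = 0
    starts = []
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_seen += 1
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                starts.append(bits_seen - 48)
    return starts
```

With the block boundaries known, a recovery tool could copy each block out into a small file of its own and test it separately, keeping the ones whose checksums still match. The scan is slow in pure Python, but it shows the idea.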

I’m no longer distributing 0.21, because doing so perpetuates problems with patents, which ensures that the program will never be widely used. That’s a shame, because it’s a useful program, and lots of people seem to like it. If you use 0.21 already, please upgrade to bzip2. I can’t, unfortunately, make bzip2 be able to decompress 0.21’s .bz files, since that would render the patent-avoidance exercise pointless. I know changing file formats is painful; from now on, I’ll try and make any further changes in a backwards compatible way.

How can I decompress old .bz files (created by bzip-0.21) ?

Here’s the source code for a decompress-only version of bzip-0.21. Or you can download a binary for Linux-ELF.

What’s your day job?

I work at Canon Research Europe, at Guildford, Surrey, UK. Before that I worked for five years on parallelising compilers for functional languages at the University of Manchester, about 300km north of here.   I’m a big fan of Haskell, an elegant and useful functional language.  Getting a bit bored with C?  Try doing some lazy functional programming in Haskell.  It’ll change the way you think about programming.  Permanently.

I’m a member of the ACM, which I think is a fine organisation. You can reach me by email through ACM, or via a more direct route.

Last updated Weds 5 March 1998.
