Comparison of compressed archive file sizes

2007-05-12 10:50:11 UTC

While writing an upcoming blog post, I decided to get some hard numbers. So here is a comparison of archive formats and compressors.

The input is Adium 1.0.3. Specifically, I copied the volume of the official disk image to my RAM disk; this created a folder. I deleted the custom “disk image” icon from the folder (using Get Info), but otherwise left it intact.

I created a script that runs each compression process and then a wc -c command to print the resulting archive’s size in bytes. Before running the script, I created one zip file using the Finder; all other files were generated by the script.
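The original script isn't reproduced here, so the following is only a rough sketch of the same idea. It assumes Python's standard tarfile module in place of the command-line tools, and the function names archive_sizes and print_table are made up for illustration. It builds a gzip and a bzip2 tarball of a folder and reports each file's size in bytes, the same figure wc -c would print.

```python
import os
import tarfile


def archive_sizes(folder, out_dir):
    """Archive `folder` as .tgz and .tbz in `out_dir`; return {filename: bytes}."""
    base = os.path.basename(os.path.normpath(folder))
    # tarfile mode strings; the suffixes follow the naming used in the results.
    modes = {".tgz": "w:gz", ".tbz": "w:bz2"}
    sizes = {}
    for suffix, mode in modes.items():
        path = os.path.join(out_dir, base + suffix)
        with tarfile.open(path, mode) as tar:
            tar.add(folder, arcname=base)
        # os.path.getsize gives the same byte count as `wc -c < file`.
        sizes[os.path.basename(path)] = os.path.getsize(path)
    return sizes


def print_table(sizes):
    """Print the results smallest first, one 'bytes filename' pair per line."""
    for name, size in sorted(sizes.items(), key=lambda item: item[1]):
        print(f"{size:>10} {name}")
```

Calling print_table(archive_sizes(some_folder, some_output_dir)) gives a listing in the same bytes-then-filename order as the results. The numbers will not match those produced by tar, pbzip2, or StuffIt, since different engines and settings are involved.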

These are the results:

   Bytes  Filename                 Comment
 8108955  Adium_1.0.3.7z
 8479898  Adium_1.0.3-Binary.sitx  StuffIt 11, “Best Binary Compression” method, compression level 5 (of 5)
11464784  Adium_1.0.3-StuffIt.tbz  StuffIt 11, tar+bzip, compression level 9
11472453  Adium_1.0.3-pbzip2.tbz   tar and pbzip2
11507671  Adium_1.0.3-cjf.tbz      tar cjf
13209615  Adium_1.0.3-StuffIt.tgz  StuffIt 11, tar+gzip, compression level 15 (of 15)
13363283  Adium_1.0.3-UDBZ.dmg     hdiutil create -format UDBZ
13690168  Adium_1.0.3.tgz          tar czf
14723809  Adium_1.0.3.sitx         StuffIt 11, “Choose Method by Analysis”
14757937  Adium_1.0.3-UDZO9.dmg    hdiutil create -imagekey zlib-level=9
15554735                           StuffIt 11, Deflate method, compression level 15 (of 15)
16010091                           Finder’s “Create Archive of” command
30497119                           zip -r9

7-zip and StuffIt X are included for academic purposes only, since the average user doesn’t have a decompressor for them. (The decompressors are The Unarchiver for 7-zip and StuffIt Expander for StuffIt X.)

Those two formats, the oddballs, are the clear winners: both break the 10-MiB barrier. These are followed by bzip2, gzip, and finally zip.
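For reference, here is a small sketch that converts byte counts from the results into MiB (1 MiB = 2^20 bytes). The byte counts are copied from the results; the helper name to_mib is just for illustration.

```python
MIB = 2 ** 20  # bytes per mebibyte

# Byte counts copied from the results above.
sizes = {
    "Adium_1.0.3.7z": 8108955,
    "Adium_1.0.3-Binary.sitx": 8479898,
    "Adium_1.0.3-StuffIt.tbz": 11464784,
}


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


for name, n_bytes in sizes.items():
    print(f"{name}: {to_mib(n_bytes):.2f} MiB")
```

The two leaders land a little under and a little over 8 MiB, while the best bzip2 result sits just under 11 MiB.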

Specific engines do make a difference. StuffIt’s engine won across the board. pbzip2 beat tar cjf, which is strange, since pbzip2 uses libbzip2. Even StuffIt’s tgz beat UDBZ, and the two command-line tbzs beat it by a couple of MiB (probably the size difference in the uncompressed meats). For gzip compression, StuffIt beats GNU zip. (The StuffIt X format, with the “Choose Method by Analysis” method, appears again in between gzip and UDZO; apparently, StuffIt’s analyzer needs some work.) Finally, zip: It seems obvious that the implementation of zip(1) is amazingly bad, considering how badly it got beaten by both StuffIt and Finder.

StuffIt made an impressive showing in this test, and interface-wise, I’m happy to say that it’s a lot better than it was back in versions 6–8. I may start using it again for future compression work. (No, I’m not being paid by Smith Micro.)

Of course, your mileage may vary.

7 Responses to “Comparison of compressed archive file sizes”

  1. Simone Manganelli Says:

    Of course, the obvious problem with distributing StuffIt archives is that StuffIt Expander isn’t pre-installed by default on new Macs and isn’t installed with the operating system anymore. It used to be. So you might as well just take advantage of the fact that 7-zip produces smaller archives than StuffIt.

    Also, disk images still have the enormous benefit in that you can browse through their contents without uncompressing everything beforehand. For StuffIt archives, you need StuffIt Deluxe to do this, and you also can’t use the Finder for this purpose (as far as I know). This isn’t too much of a big deal with small archives, but it’s much nicer to use disk images because of this when you have archives that are greater than about 50 MB.

  2. Mark Grimes Says:

    It’s almost too bad StuffIt does so well in this use case… people might actually walk away from the post suddenly embracing it.

  3. Peter Hosey Says:

    “Also, disk images still have the enormous benefit in that you can browse through their contents without uncompressing everything beforehand.”

    Simone: The post I’m working on addresses that. ☺

    (Evan, if he’s reading this, has probably guessed what I’m hinting at.)

  4. daniel Says:

    Don’t forget that 7-zip can also do a mean compression of plain zip and bzip2, much like StuffIt, only even better and for free.
    Repeating your test of compressing Adium 1.0.3 with p7zip 4.45 on a ppc Mac:
    – creating a .tar.bz2 with -tbzip2 -mx=9: 11,267,731 bytes
    – creating a .zip with -tzip -mx=9: 15,166,337 bytes

    Except the .zip version is a bit useless as 7zip does not support storing Unix permissions, or Mac resource forks and the like (that can be worked around manually). Anyway, not a problem if you tar it first, like with .tar.bz2.

    Of course this is a moot point for Mac software distribution — .dmg all the way! Totally dumb and ignorant users will always find a way to nag you with obvious support calls, there’s no solution for that. The best you could do is to do things the same way everybody else does, which is using disk images. Once a computer-illiterate user finally understands the concept, it’ll be familiar to them from then on for any piece of software. Much like understanding a windowing system or how to click with a mouse.

  5. Devin Coughlin Says:

    Fritz Anderson pointed out on the darwin-dev list today that you should use the -y flag when using command line zip.

    Without -y, zip will follow symbolic links and include the files from them. The problem with this is that embedded frameworks have lots of symbolic links in them to help with versioning. So if you don’t use -y, you will end up with three copies of everything in your embedded framework. Using -y brings command-line zip in line with Finder’s “Make archive of” command.

    Adium is particularly affected by this, of course, because it uses a lot of embedded frameworks.
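A rough way to see the effect described above, sketched in Python rather than with zip itself: add up the file bytes an archiver would have to store when it follows symbolic links versus when it keeps them as links. The function name payload_bytes is hypothetical. The sketch assumes the tree has no broken links and no link cycles, which would need extra handling.

```python
import os


def payload_bytes(root, follow_links):
    """Sum the sizes of regular files under `root`.

    With follow_links=True, every symlinked file or directory is counted as
    a full copy of its target (roughly what zip does without -y). With
    follow_links=False, links are skipped, since a stored link costs only a
    few bytes.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not follow_links:
                continue  # stored as a link, not as a copy of the target
            total += os.path.getsize(path)
    return total
```

On a framework laid out as Versions/A plus a Versions/Current link and top-level links into it, payload_bytes(path, True) comes out at about three times payload_bytes(path, False), which matches the "three copies of everything" described above.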

  6. Peter Hosey Says:

    “Adium is particularly affected by this, of course, because it uses a lot of embedded frameworks.”

    Or it would be, if we used zip archives. ;)

  7. Peter Hosey Says:

    Ah, I see. You meant in the comparison itself. I hadn’t seen your blog post yet. ☺

    I’ll be updating the comparison shortly with this new info. Thanks for passing it on!
