Comparison of compressed archive file sizes
While writing an upcoming blog post, I decided to get some hard numbers. So here is a comparison of archive formats and compressors.
The input is Adium 1.0.3. Specifically, I copied the mounted volume from the official disk image to my RAM disk, which created a folder. I deleted the custom “disk image” icon from the folder (using Get Info), but otherwise left it intact.
I wrote a script that runs each compression process, followed by a wc -c command to print the resulting file’s size in bytes. Before running the script, I created one zip file using the Finder; all other files were generated by the script.
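The script itself isn’t shown, so here is a minimal Python sketch of the same approach (compress, then measure each output), assuming Python 3.9+ and only the standard library. It covers only the formats Python can write natively (tgz, tbz, and zip); 7z, sitx, and dmg would still need their own tools. The function name `archive_sizes` is hypothetical, and `os.path.getsize` stands in for `wc -c`.

```python
import os
import sys
import tarfile
import tempfile
import zipfile


def archive_sizes(src_dir: str, out_dir: str) -> dict[str, int]:
    """Compress src_dir several ways and return {archive filename: size in bytes}."""
    src = os.path.normpath(src_dir)
    base = os.path.basename(src)
    parent = os.path.dirname(src)
    sizes = {}

    # Rough equivalents of `tar czf` and `tar cjf`; symlinks are stored as links.
    for suffix, mode in ((".tgz", "w:gz"), (".tbz", "w:bz2")):
        path = os.path.join(out_dir, base + suffix)
        with tarfile.open(path, mode, compresslevel=9) as tar:
            tar.add(src, arcname=base)
        sizes[os.path.basename(path)] = os.path.getsize(path)  # like `wc -c`

    # Roughly `zip -r9` without -y: symlinks are followed and their targets'
    # contents stored (a symlink loop would make this walk forever).
    path = os.path.join(out_dir, base + ".zip")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for root, dirs, files in os.walk(src, followlinks=True):
            dirs.sort()
            for name in sorted(files):
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, parent))
    sizes[os.path.basename(path)] = os.path.getsize(path)
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    with tempfile.TemporaryDirectory() as out:
        results = archive_sizes(sys.argv[1], out)
        for name, size in sorted(results.items(), key=lambda kv: kv[1]):
            print(f"{size}\t{name}")
```

Saved under any name, it takes the folder to compress as its only argument; the archives go into a temporary directory that is removed afterward, and the sizes are printed smallest first.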
These are the results:
Bytes | Filename | Comment |
---|---|---|
8108955 | Adium_1.0.3.7z | |
8479898 | Adium_1.0.3-Binary.sitx | StuffIt 11, “Best Binary Compression” method, compression level 5 (of 5) |
11464784 | Adium_1.0.3-StuffIt.tbz | StuffIt 11, tar+bzip, compression level 9 |
11472453 | Adium_1.0.3-pbzip2.tbz | tar and pbzip2 |
11507671 | Adium_1.0.3-cjf.tbz | tar cjf |
13209615 | Adium_1.0.3-StuffIt.tgz | StuffIt 11, tar+gzip, compression level 15 (of 15) |
13363283 | Adium_1.0.3-UDBZ.dmg | hdiutil create -format UDBZ |
13690168 | Adium_1.0.3.tgz | tar czf |
14723809 | Adium_1.0.3.sitx | StuffIt 11, “Choose Method by Analysis” |
14757937 | Adium_1.0.3-UDZO9.dmg | hdiutil create -imagekey zlib-level=9 |
15554735 | Adium_1.0.3-StuffIt.zip | StuffIt 11, Deflate method, compression level 15 (of 15) |
16010091 | Adium_1.0.3.zip | Finder’s “Create Archive of” command |
30497119 | Adium_1.0.3-zip.zip | zip -r9 |
7-zip and StuffIt X are included for academic purposes only, since the average user doesn’t have a decompressor for them. (The decompressors are The Unarchiver for 7-zip and StuffIt Expander for StuffIt X.)
Those two formats, the oddballs, are the clear winners: both come in under 10 MiB. They are followed by bzip2, gzip, and finally zip.
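To put the byte counts in MiB (1 MiB = 1,048,576 bytes), here is a quick Python conversion of the smallest archive in each family, with the numbers copied straight from the table. Only the 7z and StuffIt X archives land below 10 MiB.

```python
MIB = 1024 * 1024  # 1 MiB = 1,048,576 bytes

# Byte counts from the table: the smallest archive in each family.
sizes = {
    "Adium_1.0.3.7z": 8108955,
    "Adium_1.0.3-Binary.sitx": 8479898,
    "Adium_1.0.3-StuffIt.tbz": 11464784,
    "Adium_1.0.3-StuffIt.tgz": 13209615,
    "Adium_1.0.3-StuffIt.zip": 15554735,
}

under_10 = [name for name, size in sizes.items() if size < 10 * MIB]

for name, size in sizes.items():
    print(f"{size / MIB:6.2f} MiB  {name}")
print("Under 10 MiB:", ", ".join(under_10))
```

This gives 7.73 and 8.09 MiB for the two winners, versus 10.93 MiB for the best tbz.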
Specific engines do make a difference. StuffIt’s engine won across the board. pbzip2 beat tar cjf, which is strange, since pbzip2 uses the same libbzip2 that bzip2 does. Even StuffIt’s tgz beat UDBZ, and the two command-line tbzs beat UDBZ by nearly 2 MiB (probably reflecting the size difference in the uncompressed contents). For gzip compression, StuffIt beats GNU zip. (The StuffIt X format, with the “Choose Method by Analysis” method, shows up again between gzip and UDZO; apparently, StuffIt’s analyzer needs some work.) Finally, zip: the implementation of zip(1) seems amazingly bad, considering how badly it got beaten by both StuffIt and Finder.
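For the curious, pbzip2 is a parallel front end to libbzip2: as far as I know, it splits the input into blocks, compresses each block independently on its own thread, and writes the resulting bzip2 streams one after another, which ordinary bzip2 decompressors accept. Here is a minimal Python sketch of that idea using the standard bz2 module and a thread pool. The block size and the function names are illustrative assumptions, not pbzip2’s actual code or defaults.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Assumption: 900,000-byte blocks, matching bzip2's -9 block size.
BLOCK_SIZE = 900_000


def compress_block(block: bytes) -> bytes:
    """Compress one block into a complete, self-contained bzip2 stream."""
    return bz2.compress(block, compresslevel=9)


def parallel_bzip2(data: bytes, workers: int = 4) -> bytes:
    """Split data into blocks, compress them concurrently, and concatenate the streams."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams stay in sequence.
        return b"".join(pool.map(compress_block, blocks))


if __name__ == "__main__":
    payload = b"Adium " * 500_000  # 3,000,000 bytes, i.e. four blocks
    packed = parallel_bzip2(payload)
    # bz2.decompress() reads concatenated streams, as bunzip2 does.
    assert bz2.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

Threads are used rather than processes because CPython’s bz2 compressor releases the GIL while compressing, so the blocks can proceed in parallel.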
StuffIt made an impressive showing in this test, and interface-wise, I’m happy to say that it’s a lot better than it was back in versions 6–8. I may start using it again for future compression work. (No, I’m not being paid by Smith Micro.)
Of course, your mileage may vary.
May 12th, 2007 at 12:34:57
Of course, the obvious problem with distributing StuffIt archives is that StuffIt Expander no longer comes pre-installed on new Macs or with the operating system, as it used to. So you might as well take advantage of the fact that 7-zip produces smaller archives than StuffIt.
Also, disk images still have the enormous benefit that you can browse their contents without decompressing everything beforehand. For StuffIt archives, you need StuffIt Deluxe to do this, and you can’t use the Finder for this purpose (as far as I know). This isn’t a big deal with small archives, but for archives larger than about 50 MB, it makes disk images much nicer to use.
May 12th, 2007 at 16:59:33
It’s almost too bad StuffIt does so well in this use case… people might actually walk away from the post suddenly embracing it.
May 12th, 2007 at 19:06:53
Simone: The post I’m working on addresses that. ☺
(Evan, if he’s reading this, has probably guessed what I’m hinting at.)
May 15th, 2007 at 15:01:49
Don’t forget that 7-zip can also do a mean compression of plain zip and bzip2, much like StuffIt, only even better and for free.
Repeating your test of compressing Adium 1.0.3 with p7zip 4.45 on a ppc Mac:
– creating a .tar.bz2 with -tbzip2 -mx=9: 11,267,731 bytes
– creating a .zip with -tzip -mx=9: 15,166,337 bytes
The .zip version is a bit useless, though, as 7-zip does not support storing Unix permissions, Mac resource forks, and the like (which can be worked around manually). Anyway, that’s not a problem if you tar it first, as with the .tar.bz2.
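To illustrate why tarring first helps: a tar archive records each file’s Unix permission bits, so they survive whatever compressor is applied to the tar stream. Here is a minimal Python sketch using the standard tarfile module; the file name `launch.sh` is hypothetical, and resource forks are not covered.

```python
import io
import os
import stat
import tarfile
import tempfile


def mode_after_tar_round_trip(path: str) -> int:
    """Tar one file into an in-memory .tar.bz2, read it back, and return its permission bits."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        tar.add(path, arcname=os.path.basename(path))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:bz2") as tar:
        member = tar.getmember(os.path.basename(path))
        return stat.S_IMODE(member.mode)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "launch.sh")  # hypothetical executable file
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hi\n")
    os.chmod(script, 0o755)
    print(oct(mode_after_tar_round_trip(script)))  # expected: 0o755
```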
Of course, this is a moot point for Mac software distribution: .dmg all the way! Totally dumb and ignorant users will always find a way to nag you with obvious support calls; there’s no solution for that. The best you can do is to do things the way everybody else does, which is using disk images. Once a computer-illiterate user finally understands the concept, it’ll be familiar to them from then on for any piece of software, much like understanding a windowing system or how to click with a mouse.
July 23rd, 2007 at 10:44:40
Fritz Anderson pointed out on the darwin-dev list today that you should use the -y flag when using command-line zip.
Without -y, zip will follow symbolic links and include the files they point to. The problem with this is that embedded frameworks have lots of symbolic links in them to help with versioning. So if you don’t use -y, you will end up with three copies of everything in your embedded framework. Using -y brings command-line zip in line with Finder’s “Create Archive of” command. Adium is particularly affected by this, of course, because it uses a lot of embedded frameworks.
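To see the duplication in miniature, here is a small Python sketch that zips a fake framework two ways: following its symlinks (like zip without -y) and storing the links themselves (like zip -y). Python’s zipfile has no -y switch, so the second mode writes an entry whose Unix mode marks it as a symlink and whose data is the link target, which is the convention Info-ZIP uses for stored links. The helper names and the `Demo.framework` layout are hypothetical.

```python
import os
import stat
import tempfile
import zipfile


def zip_tree(src: str, dest: str, store_symlinks: bool) -> None:
    """Zip the contents of src into dest; store_symlinks=True mimics `zip -y`."""
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
        # Without -y, zip descends into linked directories, so follow them too.
        for root, dirs, files in os.walk(src, followlinks=not store_symlinks):
            for name in dirs + files:
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src)
                if store_symlinks and os.path.islink(full):
                    # Store the link itself: Unix mode in the high 16 bits,
                    # link target as the entry's data.
                    info = zipfile.ZipInfo(arcname)
                    info.create_system = 3  # 3 = Unix
                    info.external_attr = (stat.S_IFLNK | 0o777) << 16
                    zf.writestr(info, os.readlink(full))
                elif os.path.isfile(full):
                    # zf.write() stats through symlinks, copying the target's bytes.
                    zf.write(full, arcname)


def make_fake_framework(parent: str) -> str:
    """Build a hypothetical Demo.framework with the usual versioning symlinks."""
    fw = os.path.join(parent, "Demo.framework")
    os.makedirs(os.path.join(fw, "Versions", "A"))
    with open(os.path.join(fw, "Versions", "A", "Demo"), "wb") as f:
        f.write(os.urandom(100_000))  # incompressible stand-in for a binary
    os.symlink("A", os.path.join(fw, "Versions", "Current"))
    os.symlink("Versions/Current/Demo", os.path.join(fw, "Demo"))
    return fw


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fw = make_fake_framework(tmp)
        for store in (False, True):
            dest = os.path.join(tmp, "with-y.zip" if store else "without-y.zip")
            zip_tree(fw, dest, store_symlinks=store)
            with zipfile.ZipFile(dest) as zf:
                print(os.path.basename(dest), sorted(zf.namelist()), os.path.getsize(dest))
```

Following the links stores the binary three times (Demo, Versions/A/Demo, Versions/Current/Demo); storing them keeps one real copy plus two tiny link entries.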
July 23rd, 2007 at 12:11:51
Or it would be, if we used zip archives. ;)
July 23rd, 2007 at 15:16:39
Ah, I see. You meant in the comparison itself. I hadn’t seen your blog post yet. ☺
I’ll be updating the comparison shortly with this new info. Thanks for passing it on!