Archive for May, 2007

iTunes is now available without DRM

Wednesday, May 30th, 2007

They’re calling it “iTunes Plus”, and the current Single-of-the-Week is available in it. Took them long enough—yesterday was the last Tuesday in May, and they didn’t even get done with it until today.

Ah, well. At least it’s finally here. Woo-hoo!

By the way, you need to update to iTunes 7.2 and agree to the iTS terms and conditions again to turn on iTunes Plus. The actual T&C haven’t changed, but the Terms of Sale have.

What if we had a language keyword for ownership?

Tuesday, May 29th, 2007

For example:

retained NSImage *myImage;

mutablyCopied NSTextStorage *myTextStorage;
NSLayoutManager *layoutManager; //Owned by myTextStorage
NSTextContainer *textContainer; //Owned by layoutManager

Or instead of retained, we could have a keyword for non-ownership, with retain being the default for instance variables:

NSImage *myImage; //Retained by default

mutablyCopied NSTextStorage *myTextStorage;
borrowed NSLayoutManager *layoutManager; //Owned by myTextStorage
borrowed NSTextContainer *textContainer; //Owned by layoutManager

(Hopefully, this is already common, but using comments rather than language keywords.)

Perhaps these could be implemented using something like Python’s decorator syntax:

#define retained @ownership(retain)
#define copied @ownership(copy)
#define mutablyCopied @ownership(mutableCopy)

//What the code looks like after macro-expansion, or without using macros at all:
@ownership(copy) NSString *myTitle;

Either way, the correct retain/copy/mutableCopy/release magic would happen automatically on any assignment. The only manual work still involved would be:

- (void)dealloc {
	[myTimer invalidate];
	myTimer = nil; //Implicitly releases myTimer

	myTextStorage = nil; //Implicitly releases myTS (and on death, it releases its LM, and that releases its TC)

	[super dealloc];
}

This would make it totally impossible to forget to retain something (forgetting to retain or copy things leads to double-releases or zombie objects), and make it much harder to retain when you should copy or vice-versa. It would also make your header much richer in documentation, and the more you can learn from the header, the easier reading the implementation will be.
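To make the "magic happens on assignment" idea concrete, here is a rough Python analogue. Python has no retain/release, so only the copy-versus-share distinction carries over. The `Ownership` descriptor and the `Document` class are hypothetical names invented for this sketch, and it says nothing about how an Objective-C compiler would actually implement such a keyword.

```python
import copy


class Ownership:
    """Descriptor that applies an ownership policy on every assignment."""

    def __init__(self, policy):
        if policy not in ("retain", "copy"):
            raise ValueError("unknown ownership policy: %r" % (policy,))
        self.policy = policy

    def __set_name__(self, owner, name):
        # Store the real value under a private attribute on the instance.
        self.slot = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.slot, None)

    def __set__(self, instance, value):
        # "copy" takes a private (shallow) copy; "retain" shares the same object.
        if self.policy == "copy" and value is not None:
            value = copy.copy(value)
        setattr(instance, self.slot, value)


class Document:
    title_words = Ownership("copy")
    shared_image = Ownership("retain")
```

The policy is declared once, next to the variable, and every later assignment follows it without the caller having to remember which one applies.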

Now that’s what I call a speed-up

Tuesday, May 29th, 2007

CPU Usage 0.4 creates and throws away an NSImage every time a view (one per processor) updates. This is incredibly wasteful: The application uses about 0.7% of a CPU on my system.

But I decided that I can make that faster. Profiling (thanks, Shark!) revealed that the hot spot was creating, measuring, drawing, and throwing away the attributed string that goes into the NSImage. That’s not hard to optimize away: as the old doctor joke goes, “stop doing that”.

So I created a couple of branches, and went in two different directions:

  • Direction A uses NSLayoutManager—specifically, one NSTextStorage, one NSLayoutManager, and one NSTextContainer for every (CPU, percentage) pair. A view can display any of 101 percentages (0–100%), so on my four-core system, this branch creates 404 storages, 404 managers, and 404 containers.
  • Direction B uses an NSImage, just like 0.4 does, but keeps it around.

Both branches are written to create the (NSImage|NSTS, NSLM, and NSTC) lazily, but for testing, I added this code to force the creation of all the cached objects up front so that my time-trials would represent normal usage (that is, usage after all the objects have been created and cached):

- (void)drawRect:(NSRect)rect {
    //BEGIN TEMP
    if(!(percentageImages[0U])) {
        static BOOL isInDrawRect;
        if(!isInDrawRect) {
            isInDrawRect = YES;

            NSLog(@"Preloading percentage images for CPU %u", CPUNumber);
            NSRect tempImageBounds = [self bounds];
            NSImage *tempImage = [[NSImage alloc] initWithSize:tempImageBounds.size];
            [tempImage lockFocus];
            for(float usage = 0.0f, maxUsage = 1.01f; usage < maxUsage; usage += 0.01f) {
                [self setCPUUsage:usage];
                [self drawRect:tempImageBounds];
            }
            [tempImage unlockFocus];
            [tempImage release];
            NSLog(@"Done preloading percentage images for CPU %u", CPUNumber);

            isInDrawRect = NO;
        }
    }
    //END TEMP

    ⋮
    (The rest of -drawRect: is here)
}

That’s from the NSImage branch, but the code in the NSLayoutManager branch is basically the same. (Note: tempImage is not the cached image; it’s just a throwaway destination for the drawing done by -drawRect: in the inner call.)
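For readers who want the shape of the caching without the Cocoa details, here is a minimal Python sketch of "create lazily, keep it around, optionally preload". `PercentageCache`, `render`, and `preload` are made-up names for illustration; the real code caches NSImages (or NSTS/NSLM/NSTC triples) in Objective-C.

```python
class PercentageCache:
    """One cached object per whole percentage, 0 through 100."""

    def __init__(self, render):
        self._render = render          # callable: percent (int) -> cached object
        self._slots = [None] * 101     # 101 possible values: 0%..100%

    def get(self, usage):
        """usage is a fraction (0.0-1.0); out-of-range values are clamped."""
        percent = min(100, max(0, int(round(usage * 100))))
        if self._slots[percent] is None:
            # First request for this percentage: build it once and keep it.
            self._slots[percent] = self._render(percent)
        return self._slots[percent]

    def preload(self):
        """Force every slot to exist, as the temporary test code does."""
        for percent in range(101):
            self.get(percent / 100.0)
```

The preload loop here counts whole percentages as integers, which sidesteps the rounding drift that can creep in when a float is incremented by 0.01 a hundred times.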

Once I had finished this, and fully optimized both branches using Shark, the next step was to try them out and see how they fare.

I launched all three versions of CPU Usage, and then did the following:

# Watch the CPU usage of the CPU Usage processes for five minutes (300 seconds)
top -l 300 | fgrep 'CPU Usage' > top-CPU_Usage.txt

# Fourth column (as determined by whitespace) is CPU usage
fgrep 221 < top-CPU_Usage.txt  | awk '{ print $4; }' > top-CPU_Usage-0.4.txt
fgrep 1121 < top-CPU_Usage.txt | awk '{ print $4; }' > top-CPU_Usage-NSLM.txt
fgrep 1118 < top-CPU_Usage.txt | awk '{ print $4; }' > top-CPU_Usage-NSImage.txt

# The results, in percent of one CPU
~/Python/avg.py < top-CPU_Usage-0.4.txt
0.671
~/Python/avg.py < top-CPU_Usage-NSLM.txt
0.667
~/Python/avg.py < top-CPU_Usage-NSImage.txt
0.737

Note: For timing purposes (since top’s CPU-usage display is in tenths of a percent), I divided CPU Usage’s sample interval by ten. Normally, it samples every 0.5 sec; the two prototypes above sample every 0.05 second (that is, ¹⁄₂₀ sec instead of ¹⁄₂ sec).

This means that to compare them to 0.4, you must divide the results by ten to adjust them back to the half-second interval that a release would have. Here are the *real* results:

0.4 0.671%
NSImage 0.0737%
NSLayoutManager 0.0667%
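The arithmetic behind those numbers can be sketched in a few lines of Python. The `average` function is only a guess at what a script like avg.py might do (its source isn't shown here), and the constant and function names are invented for this example; the figures themselves come from the measurements above.

```python
def average(lines):
    """Mean of one-number-per-line input; tolerates a trailing '%' and blank lines."""
    values = [float(line.strip().rstrip("%")) for line in lines if line.strip()]
    return sum(values) / len(values)


BASELINE_0_4 = 0.671                                   # percent of one CPU, 0.5 s interval
PROTOTYPES = {"NSImage": 0.737, "NSLayoutManager": 0.667}  # measured at 0.05 s interval
INTERVAL_FACTOR = 10                                   # prototypes sampled ten times as often


def adjusted(measured):
    """Scale the prototype results back to the normal half-second interval."""
    return {name: value / INTERVAL_FACTOR for name, value in measured.items()}


def fraction_of_baseline(measured, baseline):
    """How much CPU each prototype uses relative to version 0.4."""
    return {name: value / baseline for name, value in adjusted(measured).items()}
```

Calling `fraction_of_baseline(PROTOTYPES, BASELINE_0_4)` gives roughly 0.10 for the NSLayoutManager branch and roughly 0.11 for the NSImage branch.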

So CPU Usage 0.5, with the cached-NSLayoutManager behavior, will use ¹⁄₁₀ as much CPU as 0.4 does. And here’s what that looks like:

All four processors show 0% usage.

Sweet!

I’d forgotten how awesome Sherlock is

Sunday, May 27th, 2007

Compare this translation interface (what I’ve been using):

Babel Fish, with its small input textarea,
Babel Fish (in Camino, because none of my other browsers work with it)

to this one (what I should be using):

Sherlock, with its text fields that take up almost the entire window.

I used to hate Sherlock under Mac OS 9—I went so far as to custom-install Find File from Mac OS 8.5. But Sherlock under Mac OS X is pure awesome.

Now, if they kill it in a future OS version, I’ll be sad. And I may end up cherry-picking from the old OS disc again.

Inspirational posters

Saturday, May 26th, 2007

For your enjoyment:

Parallelism: beautiful It's thing. a

And an alternate version:

Threading: beautiful It's thing. a

Created with Motivator. Choice of screenshot inspired by Gus Mueller’s “viii”.

ADDED 2007-05-26: Here’s a third version using a photo from thefunniest.info.

Apple Bug Friday! 60

Friday, May 25th, 2007

This bug is “NSImageInterlaced has no effect”. It was filed on 2007-05-12 at 00:50 PDT.

(more…)

Apple Bug Friday! 59

Friday, May 25th, 2007

This bug is “The NSImageInterlaced, it does nothing!” It was filed on 2007-05-12 at 00:29 PDT.

(more…)

Blinking sarcasm light

Wednesday, May 23rd, 2007

SARCASM ALERT!

In case you ever need to graphically point out that you are being sarcastic, here’s an animated GIF you can use to do that. Please copy it to your own website before use, or at least use CoralCDN, so I don’t get a big bandwidth hit if the page you use it on becomes popular.

I created it using the GlassGiant neon sign generator, Core Image Fun House (which is how I created the unlit frame: by creating a “white” version with the generator, then darkening it with an Exposure Adjust filter), and GifBuilder.

Feeling creative? Here are the original lit, white, and unlit frames, in PNG format.

Virtual key-codes

Tuesday, May 22nd, 2007

Anybody who’s ever needed to work with virtual key-codes—especially to program a hotkey—has had the problem of looking up the key-code for a specific key. The usual solution is to fire up Peter Maurer’s Key Codes.app and press the key, but wouldn’t it be nice to look it up in a handy table that you could print out?

There actually is such a table, but it’s well-hidden in the Apple documentation. Mac OS X uses the same virtual key-codes that it used for the legendary Apple Extended Keyboard. Thus, the table in Inside Macintosh: Text still applies.

The problem is the asstastic low-resolution JPEG scan of the table that Apple provides in the online version of IM:Tx:

Good luck with that!

So I did some poking. It turns out that there is a PDF version of IM:Tx on the ADC website—complete with a vector, rather than raster, table. Unfortunately, opening a PDF and jumping to figure C-2 is no easier than firing up Key Codes and pressing the key.

So here’s a handy-dandy crop of the PDF (with attribution added). Because it’s a vector image, the key codes in this version should be clearly readable at any resolution. Here’s what it looks like:

In case you’re wondering, I cropped it by copying the figure in Preview, then pasting into Lineform, which enabled me to add the attribution under the figure heading.

In a previous version of this post, I provided a 600 dpi PNG version of the key-codes table. In making that one, Lineform also enabled me to export to PNG at 600dpi rather than 72.

UPDATE 2008-11-29: Replaced the PNG image with a PDF document.

Announcing my new neuroblog, as well as the word “neuroblog”

Monday, May 21st, 2007

The concept is basically that of a Tumblelog, but I’m not using Tumblr to host it, so it seems to me like it would be disingenuous to call it a Tumblelog. (UPDATE 15:32: I just checked the Wikipedia article for “Tumblelog” (probably should have done that earlier, hm?); it says that the word originated from a source other than Tumblr, so maybe I don’t have to worry about it.)

So it’s a neuroblog. Thoughts of the Bored is a dump of my brain; I’ll post links, brief commentaries, programming insights—anything that’s interesting enough to show the world, but isn’t big enough to merit a whole blog post or sufficiently worth-keeping to merit a bookmark on my del.icio.us (in the latter case, the link is context rather than the focus of the post).

If you want a feel of what it’ll be like, just check out the front page—there are already a bunch of posts there.

New service: del.icio.us Info for URL

Sunday, May 20th, 2007

I just posted my second ThisService-created service: del.icio.us Info for URL. Select a URL within some text, then invoke the service, and it will open the del.icio.us info page for the URL (assuming that at least one person has bookmarked it) in your browser.

AFAIK, this is also the first pure-shell-script service.

The most efficient way to waste time

Wednesday, May 16th, 2007

In profiling CPU Usage, I need to get my CPUs busy so that the CPU-usage views have something to do. This means that I need a program to busy-wait.

Busy-waiting means running a tight loop that doesn’t actually do anything except run. In C, the most efficient such loop is:

for(;;);

That’s all well and good, but it only busy-waits a single processor. I have four, and I need 1 < n < 4 of them to be lit up so that CPU Usage has something to indicate (otherwise it will sit there showing 0-0-0-0, which doesn’t make good profiling—busy processors will jump around a bit, which gives CPU Usage something to do).

Now, my first approach was to write this in Python. That’s my go-to language for anything without a GUI. Here’s what came out:

#!/usr/bin/env python

def busy_wait():
    from itertools import repeat
    for x in repeat(None):
        pass

import thread, sys
try:
    num_threads = int(sys.argv[1])
except IndexError:
    num_threads = 100

for i in xrange(1, num_threads): #We'll do the first one ourselves after starting all the other threads.
    throwaway = thread.start_new_thread(busy_wait, ())
busy_wait()

Looks good, right?

What was weird is that I couldn’t seem to get it to max out all my processors, even with num_threads=5000. That seemed mighty suspicious.

It was then that I remembered the Global Interpreter Lock.

You see, in CPython, only one thread can be running Python code at a time. (Exceptions exist for things like I/O, of which my program contains none.) This means that my yummy multithreaded busy-wait program—being purely Python—was effectively running single-threaded.

I reimplemented the program in pure C. Not only does it run much more efficiently now (no interpreter overhead), but it also requires far fewer threads: Four threads will light up all four processors. Victory!

If you want a copy for yourself, here it is. It takes one argument, being the number of threads to spawn. It defaults to the number of logical CPUs in your machine (HW_NCPU in sysctl), so if you just run it with no arguments, it will spawn one thread per processor.
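If you would rather stay in Python, one way around the GIL is to use processes instead of threads, since each process gets its own interpreter. This is a different technique from the C program described above, not a port of it. The function names are made up for this sketch, and the busy-wait is bounded by a deadline (an addition of mine) so that it eventually stops on its own.

```python
import multiprocessing
import os
import time


def worker_count(argv):
    """First argument if given, otherwise one worker per logical CPU."""
    try:
        return int(argv[1])
    except IndexError:
        return os.cpu_count() or 1


def busy_wait(seconds):
    """Spin without sleeping until the deadline passes."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def light_up(count, seconds):
    """Run `count` busy-waiting processes in parallel and wait for them all."""
    workers = [
        multiprocessing.Process(target=busy_wait, args=(seconds,))
        for _ in range(count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

From a script, something like `light_up(worker_count(sys.argv), 60)` under an `if __name__ == "__main__":` guard should keep that many processors busy for a minute.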

Disk images suck: An examination of why, and of the alternatives

Sunday, May 13th, 2007

Here is a rundown of every file format you can use to distribute your software, along with their advantages and disadvantages. They are listed in ascending order of effectiveness.

In case you’re wondering, 7-zip and StuffIt X are not listed here because nobody has decompressors for them. I wrote this list for the context of software distribution; the days of saying “You will need StuffIt Expander to open this file” are long over.

Disk image

Advantage:

Sandboxes the application: If it won’t work from the disk image, it probably sucks

Disadvantage:

Confuses the average user; they typically run the application from the disk image, then encounter problems when they try to delete the image file or (much later) run a Sparkle update


Back in March, an Adium user had a problem trying to perform the Sparkle update. It gave her an error:

“Update Error! Adium does not have permission to write to the application's directory! Are you running off a disk image? If not, ask your system administrator for help.”

This wouldn’t be so bad, except that she didn’t know what a disk image was, so she assumed that she was not running Adium from the disk image (I imagine she thought something like “surely I’d know what that was if I was using one”). She looked up the troubleshooting instructions in the Adium help (good), then critically misunderstood them (bad) with the result that she moved her Adium 2.0 folder out of the Application Support folder. (Don’t ask me to explain it. I don’t know, either.)

All this ultimately resulted from the fact that average users, including her, don’t recognize a disk image when they see it. They don’t expect a file to act as a drive.

The same non-obvious nature results in other problems: specifically, the error message “The operation cannot be completed because ‘SurfWriter-1.0.dmg’ is in use”. This occurs when the user tries to delete the disk image file without unmounting it first. Users who encounter this message end up contacting us (usually asking “how do i uninstall it”). This is only natural, because who would expect that a file is mounted, or that a file can be in use in another file?

It was then that I realized that disk images are not as great for the average user as I had previously believed.

You may be thinking “well, just put a background image with an arrow to an Applications folder”. We did that. It didn’t help this user.

The stories I heard from other developers suggest that people in general zero in on the application they want to use, without paying attention to anything else (e.g., arrows, help text, or symlinks). Besides, the questions I hear from the users via the feedback list suggest that users see the disk image window and think the application is already installed—they aren’t looking for or expecting installation instructions, so your big arrow means nothing to them.

I suspect that installers are partly to blame for that: I don’t know about Windows, but on Mac OS, installers would always open the folder containing the freshly-installed application, so that you could use it right away. I suspect that users mistake the disk image window for a freshly-installed-folder window.

Anyway, one solution to the tunnel-vision problem would be a “DRAG THIS OVER HERE” message in a big font—but that’s no good, because you can’t have localized background images. You’d have to pick one language, and hope that all your users know it, but anybody who doesn’t know the language you chose wouldn’t benefit from your nice obnoxiously big text. (On the other hand, this provides a second reason for unilingual builds: One disk image per language, with both the background and the application localized in that language only. Hmmmm.)

Another solution is a runtime check whether the app is running from the disk image. You’ll have to be careful with wording here, though—you need to be 100% sure that you’re running from a disk image, and write the alert text accordingly. There can be no vacillation like in the Adium alert box (which vacillates because it’s caused by a simple permission failure, not an actual search for the disk image nature); it must be a statement, not a question. And of course, you must tell the user the remedy (copy the app, then run the copy, then eject the disk image file and never mount it again).
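As a sketch of what such a check could look like, the Python below asks which disk images are attached and tests whether the application's path sits under one of their mount points. It assumes the property list printed by `hdiutil info -plist` has an "images" array whose entries carry "system-entities" with an optional "mount-point"; that layout is my understanding, so verify it on the OS versions you support. The function names are hypothetical.

```python
import os
import plistlib


def mount_points(hdiutil_info_plist):
    """Collect the mount points of all attached disk images from plist bytes."""
    info = plistlib.loads(hdiutil_info_plist)
    points = []
    for image in info.get("images", []):
        for entity in image.get("system-entities", []):
            point = entity.get("mount-point")
            if point:
                points.append(point)
    return points


def is_on_disk_image(app_path, hdiutil_info_plist):
    """True only if app_path lives under the mount point of an attached image."""
    app_path = os.path.realpath(app_path)
    for point in mount_points(hdiutil_info_plist):
        point = os.path.realpath(point)
        if os.path.commonpath([app_path, point]) == point:
            return True
    return False
```

The plist bytes would come from something like `subprocess.check_output(["hdiutil", "info", "-plist"])`. Because the answer is based on the list of attached images rather than a failed write, the alert can be a statement instead of a question.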

One last point: bzip2 compression (-format UDBZ) requires Tiger. Your disk image won’t mount on any earlier version of the operating system; your users will think it’s corrupt, since that’s what the error message suggests, and they’ll contact you with that assumption. This probably doesn’t matter anymore, since the Omni Group’s software-update statistics say that 98.4% of users are on Tiger (as of 2007-05-12), but if you still support users of earlier OS versions, you need to use zlib compression (-format UDZO) instead.

Tarball

Advantages:

Smaller than disk images; unpacks into a folder or bare application

Disadvantage:

Safari mishandles bzip2’d tarballs pretty spectacularly; one must gzip one’s tarballs to avoid that problem, but then one loses out on yummy bzip2 compression


Safari, for some reason, saves a bzip2’d tarball as “foo.tbz.tar”, even though it is still a tbz. Unpacking it results in a tar archive named “foo.tbz”, followed by the folder that the tar archive exists to hold. So the actual unpacking behavior works as expected, but it badly screws up the filenames.

This all works correctly with tgz, but of course you can’t use gzip and have bzip2 compression.

Zip archive

Advantages:

Unpacks into a folder or bare application; can be created from the Finder

Disadvantage:

Bigger than anything else


Zip archives are the easiest to create, because you can simply right-click on the contents and choose “Create Archive of SurfWriter.app”. Unfortunately, the compression ratio is just not there; I consider zip files obsolete for this reason. You should be optimizing for your users’ download time, not your own compression time.

Internet-enabled disk image

Advantage:

Unpacks into a folder or bare application for the average user (Safari with “Open Safe Files” turned on); perfectly normal for everybody else

Disadvantages:

Won’t work the same way twice (when the user goes to unpack it the second time, it behaves as a perfectly normal disk image, with the attendant confusing UI); hard to create; unpacks slowly


This is the best of both worlds. The average user uses Safari and has “Open Safe Files” turned on; in this case, Safari will unpack the disk image just as if it were a tarball or zip archive. The sort of person who turns “Open Safe Files” off, or uses a different browser, is also the sort of person who can handle a normal disk image, and will indeed be handling it because that’s how the disk image behaves in those cases. The extremely-rare exceptions can be handled by the aforementioned runtime check.

They’re the hardest to create because you need to use a Terminal command (hdiutil internet-enable SurfWriter-1.0.dmg) to set the internet-enabled bit on the image. Dear Lazyweb: Please make a contextual menu item that generates an internet-enabled UDBZ disk image directly from a folder in one step, the same way I can make a zip archive in one step. (Michael Tsai, in a comment, says that his $20 DropDMG utility can do this, with Automator’s help.)
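Until somebody builds that menu item, the two steps are easy to script. Here is a small Python sketch that only builds the command lines; `dmg_commands` and its `pre_tiger` parameter are names invented for this example, and the folder and image names are placeholders. It uses hdiutil's `create` verb with `-srcfolder` and `-format`, plus the `internet-enable` verb mentioned above.

```python
def dmg_commands(folder, image, pre_tiger=False):
    """Return the two hdiutil command lines: create the image, then flag it."""
    # UDBZ (bzip2) needs Tiger or later; UDZO (zlib) mounts on earlier systems too.
    image_format = "UDZO" if pre_tiger else "UDBZ"
    return [
        ["hdiutil", "create", "-srcfolder", folder, "-format", image_format, image],
        ["hdiutil", "internet-enable", image],
    ]
```

Each list can be handed to `subprocess.check_call` in order, for example `for command in dmg_commands("SurfWriter", "SurfWriter-1.0.dmg"): subprocess.check_call(command)`.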

Another disadvantage is that a disk image, internet-enabled or otherwise, takes much longer to process than a zip archive. I think this is because of the verification step, but the user won’t care. I think people will put up with it for most archives, but if your archive is huge (let’s say over 50 MiB), you may want to switch to a zip or tarball to save time. Of course, you’ll be sacrificing part of your bandwidth bill for that. (Thanks to Sven-S. Porst for bringing this up in another comment.)

Also, the format notes (UDBZ vs UDZO) for disk image above apply to internet-enabled disk images as well. Just in case you were wondering.

I bring this up because, having in mind my objections above to disk images, I released EasyMD5 as a zip archive. I did it this way because EasyMD5 is targeted at Adium users (specifically, those who have problems downloading Adium); experience has proven that I can’t assume that an Adium user will know how to deal with disk images. Everybody knows how to handle zip archives (hello, Chris!), so I made it a zip archive.

I hadn’t yet done the study of file sizes that I did and published yesterday. Now that I have, later today, I’ll replace the zip archive with an internet-enabled disk image.

UPDATE 11:37: Added mentions of Paul Kim’s proposal of a runtime check for the disk image nature, and clarified the Lazyweb request.

UPDATE 20:47: Updated to include comments from Sven-S. Porst and Michael Tsai.

Comparison of compressed archive file sizes

Saturday, May 12th, 2007

While writing an upcoming blog post, I decided to get some hard numbers. So here is a comparison of archive formats and compressors.

The input is Adium 1.0.3. Specifically, I copied the volume of the official disk image to my RAM disk; this created a folder. I deleted the custom “disk image” icon from the folder (using Get Info), but otherwise left it intact.

I created a script that starts each compression process, followed by a wc -c command to print its size in bytes. Prior to running the script, I created one zip file using the Finder; all other files were generated from the script.
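The script itself isn't reproduced here, but a comparison along the same lines can be sketched with Python's standard library. This covers only the formats the standard library can write (tar+bzip2, tar+gzip, and zip), builds the archives in memory rather than on disk, and uses function names invented for this example; expect its numbers to differ from those produced by the command-line tools.

```python
import io
import os
import tarfile
import zipfile


def tar_size(folder, mode):
    """Size in bytes of a tarball of `folder`; mode is 'w:gz' or 'w:bz2'."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        archive.add(folder, arcname=os.path.basename(folder))
    return len(buffer.getvalue())


def zip_size(folder):
    """Size in bytes of a Deflate-compressed zip archive of `folder`."""
    buffer = io.BytesIO()
    parent = os.path.dirname(folder)
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                archive.write(path, os.path.relpath(path, parent))
    return len(buffer.getvalue())


def compare(folder):
    """Return (format, bytes) pairs, smallest first."""
    folder = os.path.normpath(folder)
    sizes = {
        "tbz": tar_size(folder, "w:bz2"),
        "tgz": tar_size(folder, "w:gz"),
        "zip": zip_size(folder),
    }
    return sorted(sizes.items(), key=lambda item: item[1])
```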

These are the results:

Bytes     Filename                 Comment
8108955   Adium_1.0.3.7z
8479898   Adium_1.0.3-Binary.sitx  StuffIt 11, “Best Binary Compression” method, compression level 5 (of 5)
11464784  Adium_1.0.3-StuffIt.tbz  StuffIt 11, tar+bzip, compression level 9
11472453  Adium_1.0.3-pbzip2.tbz   tar and pbzip2
11507671  Adium_1.0.3-cjf.tbz      tar cjf
13209615  Adium_1.0.3-StuffIt.tgz  StuffIt 11, tar+gzip, compression level 15 (of 15)
13363283  Adium_1.0.3-UDBZ.dmg     hdiutil create -format UDBZ
13690168  Adium_1.0.3.tgz          tar czf
14723809  Adium_1.0.3.sitx         StuffIt 11, “Choose Method by Analysis”
14757937  Adium_1.0.3-UDZO9.dmg    hdiutil create -imagekey zlib-level=9
15554735  Adium_1.0.3-StuffIt.zip  StuffIt 11, Deflate method, compression level 15 (of 15)
16010091  Adium_1.0.3.zip          Finder’s “Create Archive of” command
30497119  Adium_1.0.3-zip.zip      zip -r9

7-zip and StuffIt X are included for academic purposes only, since the average user doesn’t have a decompressor for them. (The decompressors are The Unarchiver for 7-zip and StuffIt Expander for StuffIt X.)

Those two formats, the oddballs, are the clear winners: both break the 10-MiB barrier. These are followed by bzip2, gzip, and finally zip.

Specific engines do make a difference. StuffIt’s engine won across the board. pbzip2 beat tar cjf, which is strange, since pbzip2 uses libbzip2. Even StuffIt’s tgz beat UDBZ, and the two command-line tbzs beat it by a couple of MiB (probably the size difference in the uncompressed meats). For gzip compression, StuffIt beats GNU zip. (The StuffIt X format, with the “Choose Method by Analysis” method, appears again in between gzip and UDZO; apparently, StuffIt’s analyzer needs some work.) Finally, zip: It seems obvious that the implementation of zip(1) is amazingly bad, considering how badly it got beaten by both StuffIt and Finder.

StuffIt made an impressive showing in this test, and interface-wise, I’m happy to say that it’s a lot better than it was back in versions 6–8. I may start using it again for future compression work. (No, I’m not being paid by Smith Micro.)

Of course, your mileage may vary.

Report-an-Apple-Bug Friday! 58

Saturday, May 12th, 2007

Slightly late because I had to devise a way to determine whether a GIF file is interlaced. (I settled on GifBuilder, in case you’re curious.) This ties in with the next two bugs; I’ll blog both at once next week.

This bug is “NSImageInterlaced documented as working on half of known interlaceable types”. It was filed on 2007-05-12 at 00:27 PDT.

(more…)

New utility: EasyMD5

Thursday, May 10th, 2007

I’ve just released a simple application called EasyMD5. All it does is compute an MD5 hash for any file you drop on it.

I plan to use this to try debugging “your disk image doesn’t work” reports that we get on the Adium feedback list occasionally.

Tabs vs. spaces

Monday, May 7th, 2007

Jens Alfke wrote a post of coding tips that includes this advice:

Don’t use tab characters in source files!

The world will never come to an agreement on whether a tab character indents 8 spaces or 4, especially on the Mac, where lots of Unix tools (and Unix source code) are hard-coded for 8. So since different people will have their tab-width preferences set differently, just don’t use tab characters in your source code if you want everyone to be able to read it.

In Xcode, go to the Indentation pref pane and uncheck “Tab key inserts tab, not spaces”. In Textmate, check “Soft Tabs” in the tabs pop-up at the bottom of the editor window. You won’t notice a difference in editing text, but your source code will now look properly indented to everyone.

No, no it won’t. Because now you are forcing your indentation preference on everyone else.

Let’s make one thing clear: tabs have no intrinsic size. For example, he says “if…you view code that uses 4-char tabs for indentation…”. This is patently wrong: There is no such thing as a 4-char tab. Tabs have no width of their own; they simply say “move to the next tab stop”, and it’s up to the viewer application to determine where the next tab stop is.

Here’s my comment on his post, replying to one of his own comments (which, in turn, was a reply to somebody else’s comment about tabs):

Peter Hosey:

Jens Alfke:

And I find the “wrong” indentation level in files much less annoying than the “wrong” tab width, because the latter makes the indentation completely impossible to follow without reformatting.

But a tab is always the right width, because it’s the viewer who sets it, not the author. Indenting with spaces will look wrong when you move the code to someone who uses more or fewer spaces than you do; tabs don’t have that problem.

The problem comes when you use tabs to create columns. That’s wrong, because then the columns don’t line up when the tab width changes. That, I think, is where your objection originates. You should always use spaces to create columns.

Indentation, however, is the proper use of a tab, and tabs are the proper way to indent.

I’ll take his reply point-by-point:

Jens Alfke:

Peter: It’s not that simple. If the tab width is set to 8, as in all Unix-derived code (and all the Cocoa sources I’ve seen), then the editor uses a mixture of tabs and four spaces to get the 4-character indents.

What?

First off, no source code contains a tab width set to anything*. As I said, tabs have no intrinsic width; it’s your editor/viewer that assigns width to a tab. So, if a tab is 8 characters, it’s because you said so in the viewer application’s prefs.

And if you have set your tab width to 8, then what 4-character indents are you talking about? Are you trying to force such indents despite your setting the tab width to 8?

(There is one exception: If you use a method that has one parameter with a longer name than the first line of the message-statement, Xcode will sacrifice some of the indent in order to colon-align that parameter, resulting in spaces where there would otherwise be a tab. This is a symptom of its colon-alignment logic, which is described below. This only matters in 1%, at most, of Cocoa code, and 0% of other code; as such, this special case should not dictate indentation policy for all other code.)

So that code is going to look completely messed-up to someone with different tab settings.

Maybe so. But if you use nothing but tabs, then it will look perfectly correct to somebody with different tab settings. Let’s say you prefer 4-character indents, and they prefer 2-character indents. Your tab width should be set to 4, and his to 2.

If you indent your code with four spaces, then he will see two of his two-character indents (2×2). But if you use a single tab, then it will still look like four spaces (one indent) to you, and it will look like two spaces (one indent) to him, exactly as he expects.
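You can watch this happen with Python's `str.expandtabs`, which plays the role of the viewer's tab-width preference. The sample source and the `indent_widths` helper are made up for this illustration.

```python
# The same three-level snippet, indented once with tabs and once with four spaces.
SOURCE_WITH_TABS = "if (x) {\n\tif (y) {\n\t\tdoIt();\n\t}\n}"
SOURCE_WITH_SPACES = SOURCE_WITH_TABS.replace("\t", "    ")


def indent_widths(text, tab_width):
    """Visible indentation of each line when tabs are shown at tab_width columns."""
    widths = []
    for line in text.splitlines():
        expanded = line.expandtabs(tab_width)
        widths.append(len(expanded) - len(expanded.lstrip(" ")))
    return widths


for width in (2, 4, 8):
    print(width, indent_widths(SOURCE_WITH_TABS, width), indent_widths(SOURCE_WITH_SPACES, width))
```

The tab-indented version follows whatever width the viewer picks (2, 4, or 8 columns per level), while the space-indented version shows four columns per level no matter what the viewer prefers.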

But if you have tabs set to 8 and you view code that uses 4-char tabs for indentation, the indentation level is 8 characters, which is pretty ridiculous looking and makes most normal code fall off the right edge of the window. So a tab is absolutely _not_ the right width for me.

Here’s what I think you meant:

…if you view code written by somebody who set his tab width to 4 characters, despite preferring 8-character indents, so that he uses two tabs instead of one, then it will look ridiculous.

Which is exactly right, but more the fault of the programmer who does not set his editor’s Preferences to match his preferences than the fault of the tab character. The tab character is innocent in this; it was misused, and that’s what caused the problem.

(Moreover, nearly all code I’ve seen uses extra spaces for indentation. Xcode does this for you in Obj-C code, to make the colons line up. That stuff looks really awful if you change the indentation width.)

Actually, Xcode uses tabs to line up the colons. That’s wrong, and it’s why it looks awful. Xcode should use tabs until it matches the indentation of the start of the statement, and then continue with spaces. For example, if the first line of the statement is indented with two tabs, then every other line of the same statement should also be indented with two tabs, followed by spaces to align the colons. Xcode uses tabs; that’s what causes the messed-up alignment when you change the tab width.
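Here is a small Python sketch of that rule, again using `str.expandtabs` as the stand-in for a viewer's tab width. The message text is an arbitrary example, and `retab` is a hypothetical helper that imitates an editor which fills alignment with tabs at a width of 4; it is not Xcode's actual algorithm.

```python
FIRST = "[image drawInRect:rect"
REST = ["fromRect:NSZeroRect", "operation:NSCompositeSourceOver", "fraction:1.0f];"]


def pad(part, colon):
    """Prefix spaces so that this part's colon lands `colon` columns in."""
    return " " * (colon - part.index(":")) + part


def retab(line, width):
    """Rewrite leading whitespace as tabs (plus leftover spaces) at this width."""
    expanded = line.expandtabs(width)
    body = expanded.lstrip(" ")
    lead = len(expanded) - len(body)
    return "\t" * (lead // width) + " " * (lead % width) + body


def colon_columns(lines, tab_width):
    """Column of the first colon on each line once tabs are expanded."""
    return [line.expandtabs(tab_width).index(":") for line in lines]


# Two tabs for scope on every line, then spaces to align the colons.
TABS_THEN_SPACES = ["\t\t" + FIRST] + [
    "\t\t" + pad(part, FIRST.index(":")) for part in REST
]
# The same statement with the alignment itself turned into tabs.
TABS_EVERYWHERE = [retab(line, 4) for line in TABS_THEN_SPACES]

for width in (2, 4, 8):
    print(width, colon_columns(TABS_THEN_SPACES, width), colon_columns(TABS_EVERYWHERE, width))
```

With tabs for scope and spaces for alignment, the colons share one column at every width. With tabs used for alignment, they line up only at the width the code was written with.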


Wrapping it up in a little bow

Here’s an executive summary of the issue. Hopefully this will make things fully clear.

  • Programmers indent their code to indicate scope.
  • For every additional level of scope, there is one indent. This rule is invariant.
  • Different programmers have different preferences for the width of an indent. Some prefer four characters; some prefer eight characters; some prefer two characters; and some (crazy people) prefer one character.
  • Some programmers use one tab character (U+0009 HORIZONTAL TAB) per indent. This option is usually referred to as “real tabs”. The programmer defines his preferred indent width in the preferences, and the editor uses that width to define the width of one tab.
    • However, this width is not saved in the file. This is useful, because it means that when you port the file to another editor where the indent width is different, it will take on the new indent width (e.g. eight characters instead of four) with no modification to the file.
  • Other programmers use one or more space characters (U+0020 SPACE) per indent. This option is sometimes referred to as “soft tabs”. The programmer defines his preferred indent width in the preferences, and the editor indents by inserting that many spaces.
    • This width is saved in the file. Another editor will show the same width, which may not be the width that the other programmer prefers and expects. This is great if you want to force your preferences upon everybody else, but if you would prefer to avoid a formatting war, then everybody should use tabs and set the width that their editor will use for a tab according to their own preferences. Nobody need ever know that other people’s indents are different widths.
  • Thus, tabs are superior because one tab = one indent, regardless of width; it has the right width for you (according to the width you defined in your preferences) and the right width for everybody else (according to the widths they defined in their preferences).
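
As a small illustration of that last point, here is a sketch in Python that shows one tab-indented file at three different tab widths. The sample source and the helper names `render` and `indent_depth` are made up for this example; `str.expandtabs` stands in for what an editor does when it draws a tab.

```python
# One file on disk: each level of scope is exactly one tab character.
source = (
    "def greet(names):\n"
    "\tfor name in names:\n"
    "\t\tprint(name)\n"
)


def render(text, tab_width):
    """Show the text the way an editor with this tab width would draw it."""
    return text.expandtabs(tab_width)


def indent_depth(line):
    """With real tabs, the number of leading tabs is the level of scope."""
    return len(line) - len(line.lstrip("\t"))


# Three programmers, three preferences, zero changes to the file.
for width in (2, 4, 8):
    print("tab width %d:" % width)
    print(render(source, width))
```

The file's bytes never change. Only the rendering does, and `indent_depth` gives the same answer no matter which width the reader prefers.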

UPDATE 2008-11-29: See also the sequel to this post: Tabs vs. spaces redux.


* I’m aware that UNIX editors like vim and emacs support special meta-text that you can embed that will set the tab width, among other things. I consider this a cheap hack intended for people who do not understand the difference between tab and space indents, or for people who have to deal with those people.

Ever wonder which is the fastest way to concatenate strings in Python?

Sunday, May 6th, 2007

This is a response to Ever wonder which is the fastest way to concatenate strings in Ruby?.

I tested using Python 2.5 on Mac OS X 10.4.9 on my four-core Mac Pro. Here are the results:

Method             Time (seconds per million iterations)
+                  0.308359861374
str.join(list)     0.53214097023
str.join(tuple)    0.48233294487
%                  0.515310049057

That’s quite a surprise—the usual advice is to avoid the + operator because it is inefficient. But here we see that it wins quite handily. Google revealed Chris Siebenmann’s explanation: Python 2.4 fixed the + operator to be much more efficient by only allocating one string instead of n-1 strings.

So, clearly, the old advice no longer applies. Go forth and use +.

Oh, and in case you’re curious, here’s the code. (The times shown above are using the re-create-the-string-every-time version of the code, for comparison’s sake. Not doing that only saves about 1/100 of a second on each test-case.)
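
For readers who want to try something similar, here is a rough sketch of how such a comparison might be set up with the standard `timeit` module. It is not the linked code: the operand strings, the function names, and `time_all` are all invented for this example, and it is written for Python 3 rather than the Python 2.5 used above, so the numbers will differ.

```python
import timeit

# Hypothetical operands; they are module-level names so that nothing
# gets folded into a constant before the timing starts.
a, b, c = "spam", "eggs", "ham"


def concat_plus():
    return a + b + c


def concat_join_list():
    return "".join([a, b, c])


def concat_join_tuple():
    return "".join((a, b, c))


def concat_percent():
    return "%s%s%s" % (a, b, c)


METHODS = [
    ("+", concat_plus),
    ("str.join(list)", concat_join_list),
    ("str.join(tuple)", concat_join_tuple),
    ("%", concat_percent),
]


def time_all(iterations=1000000):
    """Return {label: total seconds for `iterations` calls} for each method."""
    return {
        label: timeit.timeit(func, number=iterations)
        for label, func in METHODS
    }


if __name__ == "__main__":
    for label, seconds in time_all().items():
        print("%-16s %.6f" % (label, seconds))
```

Each function builds the string from scratch on every call, which matches the re-create-the-string-every-time approach described above. The function-call overhead is included in every measurement, so treat the results as a relative comparison only.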

Reddit for Digg users: A tutorial

Friday, May 4th, 2007

Fresh from Digg? Welcome to Reddit!

You’ll find that Reddit has a lot of advantages over Digg. In order to save you a lot of time, I’ve compiled a list of them. These are the same things I had to learn when I came over from Digg, about a year ago IIRC.

  1. Reddit doesn’t have descriptions.

    When you submit a link to Digg, you have to enter a description. Reddit doesn’t have this feature; instead, you must completely sell the article in the title.

    The field that appears when you submit the article is not a description field—it’s the comment field. What you enter there is a comment, not a description, so it won’t show up on the new page, the recommended page, or any other list of articles.

    That means that it won’t help you sell the article. Your title must stand alone.

    (And don’t worry about the length limit. It’s high enough; trust me.)

  2. Comments support rich text.

    Unlike Digg, which has no styling support whatsoever (except for auto-linking), Reddit uses Markdown, a really, really easy plain-text mark-up format. The syntax makes emphasis, links, blockquotes, lists, etc. almost as easy as plain text. Yes, really. And you’ll find it makes your comments so much better (and be received so much more happily) when you style them correctly.

  3. You can reply to any comment.

    Digg only has single-level threading: you can reply to top-level comments, but not to other replies. Reddit allows virtually any amount of nesting. This means you never need to say “@someotheruser: My comment here”—replying directly to that comment automatically implies that you are addressing that comment and no other.

    And definitely do not use the comment field at the top of the page. That’s for top-level comments only, not replies; you look quite silly if you try to reply with it. ☺

  4. “DUPE” is not appreciated here.

    First off, it isn’t uncommon at all to resubmit a page that didn’t do well before—in fact, it’s explicitly allowed by Reddiquette. (When it’s the exact same URL as the previous submission, the resubmission has a query string on it—usually just an empty “?”.) Second, if you do have a real dupe to speak of (one that actually made the front page by getting over 100 points), then link to the original Reddit story so that we know that you aren’t just crying “DUPE” for no reason—we won’t search for it for you. If you do that, then your comment will be voted up instead of down.

    Also, sometimes when a submission doesn’t do well, the submitter deletes it and tries again (maybe with a different title). This is normal and in line with Reddiquette, and doesn’t count as a duplicate. This is another reason to search for the original before crying dupe—if the original doesn’t come up in the search results, this suggests (not conclusively) that this is what happened.

    Speaking of Reddiquette…

  5. Reddit has a code of conduct that we all expect you to follow.

    You will be much better-liked by the community if you follow the rules set forth in Reddiquette. That and other help pages provide broader coverage of the advice I’m giving you here.

  6. You can easily see replies to your comments.

    Since comment threading isn’t the clusterfuck it is on Digg, Reddit is able to provide an “inbox” that shows all replies to anything you’ve said. (The same page shows private messages that other users have sent to you, but these are much rarer.) The inbox is marked by a ✉ (envelope) icon in the top-right corner of the page, and if you have replies or messages waiting for you, it will be colored red.

    On a related note, you can click on your username in the same corner to get an “overview” of every comment and article you’ve posted. You can use this to quickly see how your articles and comments are doing.

  7. We have a very small definition of “spam”.

    On Digg, an article submitted by its author must meet a very high standard of quality in order to not get booted from the front page (or forestalled from even reaching the front page) for being “spam”. On Reddit, self-submission of articles is allowed and even encouraged (as you’ve seen if you’ve already read the Reddiquette page), as long as your article is good (by the same standard as any other article).

    So please don’t cry “spam” or hit the report button just because the author and the submitter are the same person—that’s not enough. It’s only spam if the submitter lifted it from somewhere else and copied it to his site (that is, linkjacked it).

  8. We don’t mind curse words, but wanton cursing will get you downmodded.

    By this, I mean two things.

    First, you don’t need to say things like “f*ck”, “fcuk”, etc. There is no swear-filter to evade, so don’t worry about it. If you’re going to use that kind of language anyway, I think most of us would prefer that you use the real word.

    Second, you can use words like “clusterfuck”, “bullshit”, etc. without problem, as long as you use them judiciously. If you just throw every swear word you know into every comment, you’re wasting good words and you will be downmodded for it.

    Wanton cursing doesn’t make you look more adult; it makes you look less adult. Please think about using alternative words (or omitting the curse word entirely if you can’t think of one) rather than slathering your message in an excessive number of curse words. If you use too many, they drown out your message.

  9. If you have nothing to say, please don’t say it.

    I have seen many comments lately along the lines of “hahaha great link”, or “this article sucks, downmodded”. These comments invariably get downmodded because they are essentially empty.

    If all you want to do is express your like or dislike for the article, then vote up or vote down and move on. You don’t need to comment for that. And if you’re thinking of karma, it didn’t work: Comments don’t count toward karma.

    In a similar vein, random bashing of Bush/Microsoft/Apple/the Cookie Monster does not impress people here. You need to have a point, and make a salient argument backed by facts and logic. Otherwise, you will be downmodded.

    So you should only post a comment when you want to:

    • contribute a meaningful point to the discussion, or
    • make a joke (in which case, please make it funny: bad jokes are worse than no jokes)

  10. You can edit your comment at any time.

    On Digg, the ability to edit a comment expires three minutes after you post it. Reddit, on the other hand, does not impose a time limit on comment editing. So there’s no need to reply to your own comment to get in an edit that missed the window—there is no window.

Thank you for reading through my list of advice, and thank you in advance for following it. By doing so, you’ll help keep Reddit a better place.

And if you’re one of my regular readers and wondering what brought this on: Digg blew up recently, and a legion of its users moved over to Reddit. With this post, I hope to help them fit in in their new environment with its slightly-different social rules.

Just in case you missed the WWDC early registration period…

Tuesday, May 1st, 2007

…you didn’t.

Apple has extended the WWDC early registration deadline to May 11.