WWDC 2007 session videos are out
Monday, July 30th, 2007

If you attended WWDC, you can head over to ADC on iTunes and see what you missed.
swab(3) is a function that copies some bytes from one location to another, swapping each pair of bytes during the copy. Handy for structures.
It has a feature that isn’t mentioned in the Darwin manpage for swab: If you pass a negative size, it does not swap. I have no idea why this magic behavior was added; if you want a swab that doesn’t swap bytes, just use bcopy. I shake my head at this use of a magic argument.
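A minimal sketch of the normal, positive-size behavior, assuming the POSIX declaration in <unistd.h>:

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>   /* swab() per POSIX; some systems also declare it in <string.h> */

    int main(void) {
        uint8_t src[4] = { 0x11, 0x22, 0x33, 0x44 };
        uint8_t dst[4];

        /* Copy 4 bytes from src to dst, swapping each pair along the way. */
        swab(src, dst, sizeof(src));

        /* Prints "22 11 44 33". */
        printf("%02x %02x %02x %02x\n", dst[0], dst[1], dst[2], dst[3]);
        return 0;
    }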
These four functions (htons, htonl, ntohs, and ntohl) swap the bytes of a 16-bit (‘s’) or 32-bit (‘l’, in ignorance of LP64) integer and return the transformed value.
They are mainly used in network-I/O contexts, as they transform between network byte order (big-endian) and host byte order (whatever you’re running). But there’s nothing stopping you from using them for any other 16-bit/32-bit integral byte-swapping.
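For example, a round trip through network byte order (the swaps are real work on a little-endian Intel Mac and no-ops on big-endian PowerPC):

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>   /* htons, htonl, ntohs, ntohl */

    int main(void) {
        uint16_t port = 0x1F90;       /* 8080 */
        uint32_t addr = 0x7F000001;   /* 127.0.0.1 */

        /* Host to network (big-endian) order, e.g. before filling in a sockaddr_in. */
        uint16_t port_be = htons(port);
        uint32_t addr_be = htonl(addr);

        /* Network to host order round-trips back to the original values. */
        printf("port 0x%04x -> 0x%04x -> 0x%04x\n", port, port_be, ntohs(port_be));
        printf("addr 0x%08x -> 0x%08x -> 0x%08x\n", addr, addr_be, ntohl(addr_be));
        return 0;
    }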
The Darwin kernel provides a number of handy-dandy macros for byte-swapping:
The {Host,Big,Little}To{Host,Big,Little} functions swap conditionally; the others always swap.
According to the Universal Binary Programming Guidelines, it is safe to use these in applications.
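A minimal user-space sketch using <libkern/OSByteOrder.h> (no special linking needed):

    #include <stdio.h>
    #include <stdint.h>
    #include <libkern/OSByteOrder.h>

    int main(void) {
        uint32_t value = 0x11223344;

        /* Unconditional swap: always reverses the bytes. */
        uint32_t swapped = OSSwapInt32(value);

        /* Conditional swap: a no-op on a big-endian host, a byte swap on a little-endian one. */
        uint32_t as_big    = OSSwapHostToBigInt32(value);
        uint32_t back_home = OSSwapBigToHostInt32(as_big);

        printf("0x%08x swapped: 0x%08x, big-endian: 0x%08x, round-tripped: 0x%08x\n",
               value, swapped, as_big, back_home);
        return 0;
    }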
CF’s Byte-Order Utilities provide the same facilities as OSByteOrder, with a couple of twists:
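Judging from <CoreFoundation/CFByteOrder.h>, those twists include dedicated swapped floating-point types and a CFByteOrderGetCurrent function; a sketch of both (compile with -framework CoreFoundation):

    #include <stdio.h>
    #include <stdint.h>
    #include <CoreFoundation/CFByteOrder.h>

    int main(void) {
        /* Integer swaps, unconditional and conditional, much like OSByteOrder. */
        uint32_t raw = CFSwapInt32(0x11223344);
        uint32_t big = CFSwapInt32HostToBig(0x11223344);

        /* Floating-point values get a distinct "swapped" type, so you can't
           accidentally do arithmetic on byte-swapped garbage. */
        CFSwappedFloat32 wire = CFConvertFloat32HostToSwapped(3.14159f);
        float back = CFConvertFloat32SwappedToHost(wire);

        printf("swapped: 0x%08x, big-endian: 0x%08x, float round-trip: %f\n", raw, big, back);
        printf("host byte order: %ld\n", (long)CFByteOrderGetCurrent());
        return 0;
    }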
Foundation’s byte-order functions bear all the same capabilities as the CF Byte-Order Utilities. In fact, they are implemented with them.
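A sketch using the NSSwap functions from <Foundation/NSByteOrder.h> (build it as a .m file linked against Foundation):

    #import <Foundation/Foundation.h>
    #include <stdio.h>

    int main(void) {
        unsigned int value = 0x11223344;

        /* Unconditional and conditional swaps, mirroring the CF functions underneath. */
        unsigned int swapped = NSSwapInt(value);
        unsigned int big     = NSSwapHostIntToBig(value);

        /* Doubles get a swapped type, just as in CF. */
        NSSwappedDouble wire = NSSwapHostDoubleToBig(2.5);
        double back          = NSSwapBigDoubleToHost(wire);

        printf("0x%08x swapped: 0x%08x, big-endian: 0x%08x, double round-trip: %f\n",
               value, swapped, big, back);
        return 0;
    }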
These utilities are equivalent to the Foundation functions, except that they are implemented using the OSByteOrder utilities. They are declared in <architecture/byte_order.h>.
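A minimal sketch, assuming the NXSwap naming that <architecture/byte_order.h> has carried since the NeXT days:

    #include <stdio.h>
    #include <architecture/byte_order.h>

    int main(void) {
        unsigned int value = 0x11223344;

        /* The same unconditional/conditional split as the other APIs. */
        unsigned int swapped = NXSwapInt(value);
        unsigned int big     = NXSwapHostIntToBig(value);

        printf("0x%08x swapped: 0x%08x, big-endian: 0x%08x\n", value, swapped, big);
        return 0;
    }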
I think that the “Core Endian” name itself is new in Panther. Three functions in the API have a “CoreEndian” prefix, and are marked as new in Panther, whereas the others have simply “Endian”, and are marked as having existed since 10.0. This suggests to me that the entire API was branded “Core Endian” in 10.3, with the older functions subsumed by it.
The new functions have to do with “flipper” callbacks, which you can install so that things like Apple Event Manager can DTRT with your custom data types. The older functions are plain byte-swapping utilities, just like all the other APIs described here, and exist mainly for the benefit of QuickTime users (they exist on Windows, too, through QuickTime).
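Here’s a sketch of the older, plain byte-swapping Endian functions (not the new flipper callbacks), assuming the EndianU32_BtoN-style names from CarbonCore’s Endian.h; compile against CoreServices:

    #include <stdio.h>
    #include <CoreServices/CoreServices.h>

    int main(void) {
        /* Unconditional swap. */
        UInt32 swapped = Endian32_Swap(0x11223344);

        /* Conditional swaps between big-endian and native order,
           e.g. for on-disk QuickTime-style data. */
        UInt32 native = EndianU32_BtoN(0x11223344);
        UInt16 big    = EndianU16_NtoB(0x1122);

        printf("swapped: 0x%08x, native: 0x%08x, big-endian: 0x%04x\n",
               (unsigned)swapped, (unsigned)native, (unsigned)big);
        return 0;
    }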
Inspired by my previous post, in which I used Mach clocks to time calloc and malloc…
You may have noticed that gettimeofday returns microseconds, and that this isn’t always a fine enough resolution (especially on the fast computers of today). On OS X, one solution is to use a Mach clock instead. (Another one is mach_absolute_time and AbsoluteToNanoseconds, as described by QA1398. With all the union or pointer magic you have to do with AbsoluteToNanoseconds, though, there’s not really an advantage; I think Mach clocks are slightly cleaner.)
Get a Mach clock port using host_get_clock_service:
clock_serv_t host_clock;
kern_return_t status = host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &host_clock);
There’s no difference in resolution between SYSTEM_CLOCK and CALENDAR_CLOCK as of Mac OS X 10.4.8. The difference between them is that SYSTEM_CLOCK measures time since system boot, whereas CALENDAR_CLOCK measures time since the epoch (1970-01-01).
Get the time using clock_get_time:
mach_timespec_t now;
clock_get_time(host_clock, &now);
mach_timespec_t is basically the same as a struct timespec from time.h.
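Putting the two calls together, a minimal elapsed-time measurement looks something like this:

    #include <stdio.h>
    #include <mach/mach.h>
    #include <mach/clock.h>

    int main(void) {
        clock_serv_t host_clock;
        kern_return_t status = host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &host_clock);
        if (status != KERN_SUCCESS)
            return 1;

        mach_timespec_t start, end;
        clock_get_time(host_clock, &start);

        /* …the work being timed goes here… */

        clock_get_time(host_clock, &end);

        double seconds = (double)(end.tv_sec - start.tv_sec)
                       + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%.9f seconds\n", seconds);

        mach_port_deallocate(mach_task_self(), host_clock);
        return 0;
    }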
I wondered whether there’s a performance advantage to using calloc to allocate zeroed bytes rather than malloc. So, as I usually do in such situations, I wrote a test app.
Along the way, I found out that gettimeofday — the traditional high-resolution what-time-is-it function — doesn’t offer fine enough resolution for this. On my machine, the differences are smaller than one microsecond, which is the resolution of gettimeofday. So I switched to a Mach clock, which provides nanosecond resolution.*
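This isn’t the actual calloctiming source, but the comparison boils down to something like this sketch (error checking mostly elided), timed with the same Mach clock:

    #include <stdio.h>
    #include <stdlib.h>
    #include <strings.h>     /* bzero */
    #include <mach/mach.h>
    #include <mach/clock.h>

    /* Seconds elapsed between two mach_timespec_t samples. */
    static double elapsed(mach_timespec_t start, mach_timespec_t end) {
        return (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    }

    int main(void) {
        const size_t size = 100 * 1024 * 1024;   /* 100 MiB */
        clock_serv_t host_clock;
        host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &host_clock);

        mach_timespec_t start, end;

        /* calloc: the allocator can hand back pages that are already zero. */
        clock_get_time(host_clock, &start);
        void *zeroed = calloc(1, size);
        clock_get_time(host_clock, &end);
        printf("calloc:\t%.9f seconds\n", elapsed(start, end));

        /* malloc + bzero: writing the zeros ourselves touches every page, which is what costs. */
        clock_get_time(host_clock, &start);
        void *manual = malloc(size);
        if (manual)
            bzero(manual, size);
        clock_get_time(host_clock, &end);
        printf("malloc+bzero:\t%.9f seconds\n", elapsed(start, end));

        free(zeroed);
        free(manual);
        return 0;
    }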
I’m running Mac OS X 10.4.8 on a 2×2×2.66-GHz Mac Pro. The first run looks like this:
| Method | Time |
|---|---|
| calloc, 100 MiB * 1 | 0.000006896000000 seconds |
| malloc, 100 MiB | 0.000007790000000 seconds |
| calloc, 100 MiB * 1, followed by reading first byte | 0.000012331000000 seconds |
| calloc, 10 MiB * 10 | 0.000024079000000 seconds |
| calloc, 10 MiB * 10, followed by reading first byte | 0.000031266000000 seconds |
| malloc followed by bzero | 2.252493061000000 seconds |
Ouch! Two-and-a-quarter seconds for bzero. A second run returned saner numbers:
| Method | Time |
|---|---|
| calloc, 100 MiB * 1 | 0.000007140000000 seconds |
| malloc, 100 MiB | 0.000007317000000 seconds |
| calloc, 10 MiB * 10 | 0.000008956000000 seconds |
| calloc, 100 MiB * 1, followed by reading first byte | 0.000012812000000 seconds |
| calloc, 10 MiB * 10, followed by reading first byte | 0.000031807000000 seconds |
| malloc followed by bzero | 0.138714770000000 seconds |
bzero has greatly improved, but it still loses badly, taking more than a tenth of a second.
Lesson: Always use calloc when you need zeroed bytes. Don’t try to be lazy by zeroing them later — calloc is much better at that.
If you want, you can play with calloctiming yourself. If you want to post the results in the comments, use the included sed script to convert the output to an HTML table: ./calloctiming | sort -n -t $':\t' +1 | sed -f output_to_html.sed.
* Another way that I didn’t think of until just now is QA1398’s recommendation of mach_absolute_time** and AbsoluteToNanoseconds. I tried that too and the output is no different from the Mach clock. ↶
** And when I went looking for documentation of mach_absolute_time to link to, I found this instead: Kernel Programming Guide: Using Kernel Time Abstractions. This information is for those writing kernel extensions; it recommends using clock_get_system_nanotime. Won’t work in user space, unfortunately. ↶
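For completeness, here’s roughly what the QA1398 approach from the first footnote looks like, pointer magic and all (a sketch; compile against CoreServices):

    #include <stdio.h>
    #include <stdint.h>
    #include <mach/mach_time.h>
    #include <CoreServices/CoreServices.h>

    int main(void) {
        uint64_t start = mach_absolute_time();

        /* …the work being timed goes here… */

        uint64_t end = mach_absolute_time();
        uint64_t elapsed = end - start;

        /* The pointer magic: AbsoluteTime and Nanoseconds are both UnsignedWide
           structs, so QA1398 has you type-pun the 64-bit values through pointers. */
        Nanoseconds elapsedNano = AbsoluteToNanoseconds(*(AbsoluteTime *)&elapsed);
        uint64_t nanoseconds = *(uint64_t *)&elapsedNano;

        printf("%llu nanoseconds\n", (unsigned long long)nanoseconds);
        return 0;
    }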