Archive for the 'Mach' Category

WWDC 2007 session videos are out

Monday, July 30th, 2007

If you attended WWDC, you can head over to ADC on iTunes and see what you missed.

How do I swap thy bytes? Let me count the ways

Saturday, April 28th, 2007
  1. swab

    swab(3) is a function that copies some bytes from one location to another, swapping each pair of bytes during the copy. Handy for structures.

    It has a feature that isn’t mentioned in the Darwin manpage for swab: If you pass a negative size, it does not swap. I have no idea why this magic behavior was added; if you want a swab that doesn’t swap bytes, just use bcopy. I shake my head at this use of a magic argument.
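
    For example, here’s a minimal sketch (the buffer names are mine) that byte-swaps an array of 16-bit samples:

    #include <unistd.h> /* swab */
    #include <stdint.h>

    uint16_t samples[4] = { 0x1122, 0x3344, 0x5566, 0x7788 };
    uint16_t swapped[4];

    /* Copy sizeof(samples) bytes, exchanging each pair of bytes along the way. */
    swab(samples, swapped, sizeof(samples));
    /* swapped[0] is now 0x2211, on any architecture. */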

  2. ntohs, htons, ntohl, htonl

    These four functions swap the bytes of a 16-bit (‘s’) or 32-bit (‘l’, in ignorance of LP64) integer and return the transformed value.

    They are mainly used in network-I/O contexts, as they convert between network byte order (big-endian) and host byte order (whatever you’re running on); on a big-endian host, they are no-ops. But there’s nothing stopping you from using them for any other 16-bit/32-bit integral byte-swapping.
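
    A sketch of typical usage (the port and address values are made up):

    #include <arpa/inet.h> /* htons, htonl, ntohs, ntohl */
    #include <stdint.h>

    uint16_t port = 8080;
    uint32_t addr = 0x7f000001; /* 127.0.0.1 */

    uint16_t port_net = htons(port); /* host to network (big-endian) */
    uint32_t addr_net = htonl(addr);
    uint16_t port_back = ntohs(port_net); /* and back: 8080 again */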

  3. OSByteOrder (Darwin)

    The Darwin kernel provides a number of handy-dandy macros for byte-swapping:

    • OSSwap{Const}?Int{16,32,64}
    • OSSwap{Host,Big,Little}To{Host,Big,Little}{Const}?Int{16,32,64}

    The {Host,Big,Little}To{Host,Big,Little} functions swap conditionally; the others always swap.

    According to the Universal Binary Programming Guidelines, it is safe to use these in applications.
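
    A quick sketch of both flavors (the value is made up):

    #include <libkern/OSByteOrder.h>
    #include <stdint.h>

    uint32_t raw = 0x11223344;

    uint32_t flipped = OSSwapInt32(raw); /* always swaps: 0x44332211 */
    uint32_t big = OSSwapHostToBigInt32(raw); /* swaps only on a little-endian host */
    uint32_t back = OSSwapBigToHostInt32(big); /* round-trips to 0x11223344 */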

  4. Core Foundation

    CF’s Byte-Order Utilities provide the same facilities as OSByteOrder, with a couple of twists:

    • The implementation uses assembly language when the environment is GCC on either PowerPC or x86. This is theoretically faster than OSByteOrder’s pure-C implementation. (CF falls back on pure C in all other environments.)
    • CF adds support for byte-swapping 32-bit and 64-bit floating-point numbers.
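
    A sketch of both twists (the values are made up):

    #include <CoreFoundation/CFByteOrder.h>

    uint32_t big = CFSwapInt32HostToBig(0x11223344);

    /* The floating-point support goes through an opaque "swapped" container type: */
    CFSwappedFloat32 wire = CFConvertFloat32HostToSwapped(3.14f);
    Float32 value = CFConvertFloat32SwappedToHost(wire);
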
  5. Foundation

    Foundation’s byte-order functions offer the same capabilities as the CF Byte-Order Utilities. In fact, they are implemented with them.
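
    For instance (same made-up values as above; these are plain C functions from NSByteOrder.h):

    #import <Foundation/Foundation.h> /* NSByteOrder.h */

    unsigned int big = NSSwapHostIntToBig(0x11223344U);
    unsigned int host = NSSwapBigIntToHost(big);

    /* Floats use the same swapped-container pattern as CF: */
    NSSwappedFloat wire = NSConvertHostFloatToSwapped(3.14f);
    float value = NSConvertSwappedFloatToHost(wire);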

  6. NeXT byte-order utilities

    These utilities are equivalent to the Foundation functions, except that they are implemented using the OSByteOrder utilities. They are declared in <architecture/byte_order.h>.
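
    A one-line sketch of each flavor:

    #include <architecture/byte_order.h>

    unsigned int always = NXSwapInt(0x11223344U); /* unconditional: 0x44332211 */
    unsigned int big = NXSwapHostIntToBig(0x11223344U); /* conditional */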

  7. Core Endian

    [Image: Core Endian logo that I made up.]

    I think that the “Core Endian” name itself is new in Panther. Three functions in the API have a “CoreEndian” prefix, and are marked as new in Panther, whereas the others have simply “Endian”, and are marked as having existed since 10.0. This suggests to me that the entire API was branded “Core Endian” in 10.3, with the older functions subsumed by it.

    The new functions have to do with “flipper” callbacks, which you can install so that things like Apple Event Manager can DTRT with your custom data types. The older functions are plain byte-swapping utilities, just like all the other APIs described here, and exist mainly for the benefit of QuickTime users (they exist on Windows, too, through QuickTime).
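
    The flipper-installation functions take more setup than fits here, but the older utilities are simple macros. A sketch (the value is made up):

    #include <CoreServices/CoreServices.h> /* Endian.h */

    UInt32 wire = 0x11223344; /* pretend this arrived in big-endian order */
    UInt32 native = EndianU32_BtoN(wire); /* big-endian to native */
    UInt32 back = EndianU32_NtoB(native); /* native to big-endian */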

How to use Mach clocks

Sunday, November 26th, 2006

Inspired by my previous post, in which I used them to time calloc and malloc…

You may have noticed that gettimeofday returns microseconds, and that this isn’t always a fine enough resolution (especially on the fast computers of today). On OS X, one solution is to use a Mach clock instead. (Another one is mach_absolute_time and AbsoluteToNanoseconds, as described by QA1398. With all the union or pointer magic you have to do with AbsoluteToNanoseconds, though, there’s not really an advantage; I think Mach clocks are slightly cleaner.)

  1. Get a Mach clock port using host_get_clock_service:

    clock_serv_t host_clock;
    kern_return_t status = host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &host_clock);

    There’s no difference in resolution between SYSTEM_CLOCK and CALENDAR_CLOCK as of Mac OS X 10.4.8. The difference between them is that SYSTEM_CLOCK is time since system boot, whereas CALENDAR_CLOCK is since the epoch (1970-01-01).

  2. Get the time using clock_get_time:

    mach_timespec_t now;
    clock_get_time(host_clock, &now);

    mach_timespec_t is basically the same as a struct timespec from time.h.
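
Putting the two steps together, here’s a minimal timing sketch (error checking mostly elided; the work being timed is a stand-in):

    #include <mach/mach.h>
    #include <mach/clock.h>
    #include <stdio.h>

    int main(void) {
        clock_serv_t host_clock;
        host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &host_clock);

        mach_timespec_t before, after;
        clock_get_time(host_clock, &before);
        /* ...do the work you want to time here... */
        clock_get_time(host_clock, &after);

        long long nsec = ((long long)after.tv_sec - before.tv_sec) * 1000000000LL
            + (after.tv_nsec - before.tv_nsec);
        printf("%lld ns\n", nsec);

        /* Release the clock port when you're done with it. */
        mach_port_deallocate(mach_task_self(), host_clock);
        return 0;
    }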

Implications

  • As I mentioned above, Mach clocks are one way to time things to nanosecond precision.
  • Mach clocks are also an easy way to implement half of uptime(1). CALENDAR_CLOCK gives you the wall-clock time, and SYSTEM_CLOCK gives you the actual uptime. (For the load averages, look at getloadavg(3), not getrusage; getrusage reports per-process resource usage. You’re on your own for the number of logged-in users.)

Related reading

  • The Mach Kernel Interface manual. You may want to adapt that URL to your OS/machine combination (for example, 10.4.7.ppc for my Cube — no, I never got around to updating it to 10.4.8). You’ll need an APSL login, in any case.
  • Specifically, host_get_clock_service and clock_get_time. Those are opendarwin.org URLs, so you won’t need an APSL login to read them. Slightly old, but the current documentation doesn’t look any newer.
  • Also useful-looking are clock_get_attributes and clock_sleep. The former will tell you, among other things, the resolution of a clock (given a clock port from host_get_clock_service). The latter will sleep for some amount of time or until some time is reached.
  • clock_map_time looks like it would be very handy for an app that had to update a time display very very frequently.

calloc vs malloc

Sunday, November 26th, 2006

I wondered whether there’s a performance advantage to using calloc to allocate zeroed bytes rather than malloc. So, as I usually do in such situations, I wrote a test app.

Along the way, I found out that gettimeofday — the traditional high-resolution what-time-is-it function — is not high-enough resolution. On my machine, the differences are smaller than one microsecond, which is the resolution of gettimeofday. So I switched to a Mach clock, which provides nanosecond resolution.*
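
The core of each measurement looks something like this sketch (kAllocationSize is my name, not the test app’s; the timing pattern is the one from the Mach clocks post above):

    #include <mach/mach.h>
    #include <mach/clock.h>
    #include <stdlib.h>
    #include <strings.h> /* bzero */

    enum { kAllocationSize = 100 * 1024 * 1024 }; /* 100 MiB */

    int main(void) {
        clock_serv_t host_clock;
        host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &host_clock);

        mach_timespec_t before, after;

        /* calloc hands back zeroed bytes directly. */
        clock_get_time(host_clock, &before);
        void *zeroed = calloc(kAllocationSize, 1);
        clock_get_time(host_clock, &after);
        /* ...compute and record the difference here, as in the post above... */

        /* malloc + bzero zeroes the bytes by hand. */
        clock_get_time(host_clock, &before);
        void *manual = malloc(kAllocationSize);
        bzero(manual, kAllocationSize);
        clock_get_time(host_clock, &after);
        /* ... */

        free(zeroed);
        free(manual);
        mach_port_deallocate(mach_task_self(), host_clock);
        return 0;
    }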

I’m running Mac OS X 10.4.8 on a 2×2×2.66-GHz Mac Pro. The first run looks like this:

First run

Method                                                Time (seconds)
calloc, 100 MiB * 1                                   0.000006896000000
malloc, 100 MiB                                       0.000007790000000
calloc, 100 MiB * 1, followed by reading first byte   0.000012331000000
calloc, 10 MiB * 10                                   0.000024079000000
calloc, 10 MiB * 10, followed by reading first byte   0.000031266000000
malloc followed by bzero                              2.252493061000000

Ouch! Two-and-a-quarter seconds for bzero. A second run returned saner numbers:

Second run

Method                                                Time (seconds)
calloc, 100 MiB * 1                                   0.000007140000000
malloc, 100 MiB                                       0.000007317000000
calloc, 10 MiB * 10                                   0.000008956000000
calloc, 100 MiB * 1, followed by reading first byte   0.000012812000000
calloc, 10 MiB * 10, followed by reading first byte   0.000031807000000
malloc followed by bzero                              0.138714770000000

bzero has greatly improved, but it still loses badly, taking more than a tenth of a second.

Lesson: Always use calloc when you need zeroed bytes. Don’t try to be lazy by zeroing them later — calloc is much better at that.

If you want, you can play with calloctiming yourself. If you want to post the results in the comments, use the included sed script to convert the output to an HTML table: ./calloctiming | sort -n -t $':\t' +1 | sed -f output_to_html.sed.

* Another way that I didn’t think of until just now is QA1398’s recommendation of mach_absolute_time** and AbsoluteToNanoseconds. I tried that too, and the output is no different from the Mach clock.

** And when I went looking for documentation of mach_absolute_time to link to, I found this instead: Kernel Programming Guide: Using Kernel Time Abstractions. This information is for those writing kernel extensions; it recommends using clock_get_system_nanotime. Won’t work in user space, unfortunately.