Archive for the 'C' Category

Pointers tutorial 1.3

Saturday, January 16th, 2010

At long last, a new version of Everything you need to know about pointers.

The most significant changes are long-overdue corrections regarding declarations of const pointers and the difference between arrays and pointers. You can—and, if you learned how to work with pointers from this tutorial, should—read all of the changes in the delta between 1.2 and 1.3.

Warnings I turn on, and why

Saturday, November 7th, 2009

I've started turning on most of Xcode's warning options and one warning-related build setting in all of my personal projects. I suggest you do the same.

There are some warnings I don't turn on, for any of several reasons:

  • They're inapplicable. For example, “‘Effective C++’ Violations” doesn't apply to me, because I don't write C++ code.
  • They don't help anything. An example is “Four Character Literals”, which is about 'abcd' literals for four-byte-code types such as OSType. Turning it on would sacrifice a convenient notation for no benefit, so I leave it off.
  • They are impossible for me to fix. An example is “Multiple Definition Types for Selector”. Cocoa raises that one all over its headers, and I can't do anything about Cocoa.

The rest of the warnings, I turn on because either they make something clearer or they tell me about either real or potential (i.e., future real) bugs. They are:

  • Check Switch Statements

    Warn whenever a switch statement has an index of enumeral type and lacks a case for one or more of the named codes of that enumeration. The presence of a default label prevents this warning.

    Leave no case unhandled.

    Consider whether a default: label is appropriate for your enumeration. If your switch statement handles all possible values, cut out the default and assert that the value is one of the possible ones instead. An easy way to do this, if the enumeration values are serial and the enumeration is not part of an API you expose, is to have one extra name defined as the number of possible values:

    enum Foo {
    	kFooFoo,
    	kFooBar,
    	kFooBaz,
    	kFooNumberOfValidFooValues
    };
    

    Then, in your assertion macro, compare the value against that name:

    #define MyParameterAssertValidFoo(foo) \
    	NSAssert1((foo) < kFooNumberOfValidFooValues, @"Invalid Foo value: %d", (foo))

    When you add kFooQux, insert it above kFooNumberOfValidFooValues, and the value of kFooNumberOfValidFooValues will increase by one to fit it.

    The result is that your switch statement covers all known values for the enumeration (or you get a warning because it doesn't), and your method throws an exception (from the assertion) whenever anyone passes an unknown value.

  • Hidden Local Variables

    Warn whenever a local variable shadows another local variable, parameter or global variable or whenever a built-in function is shadowed.

    One common way to get this warning is to name a variable index, because there is a function by that name in the standard C library. That's not as much of a false positive as you may think: If you fail to declare your index variable, all your references to it will actually refer to the index function. You can see how it would be bad to send a message such as [myArray objectAtIndex:index] with this bug.

    The solution is simple: Never, ever name a variable index.

  • Implicit Conversion to 32 Bit Type

    Warn if a value is implicitly converted from a 64 bit type to a 32 bit type.

    This is most useful when converting old code to work correctly in a 64-bit architecture. Storing a pointer into an int variable (such as a reference constant) when targeting an LP64 architecture is a good way to get this warning, and rightly so.

  • Initializer Not Fully Bracketed

    Example: Here, the initializer for a is not fully bracketed, but the one for b is.

    	int a[2][2] = { 0, 1, 2, 3 };
    	int b[2][2] = { { 0, 1 }, { 2, 3 } };

    This is a cleanliness warning. It also applies to structures, such as NSRect:

    NSRect warns = { 0.0f, 0.0f, 640.0f, 480.0f };
    NSRect doesNotWarn = { { 0.0f, 0.0f }, { 640.0f, 480.0f } };

    (In real code, I'm more likely to use NSZeroPoint instead of the { 0.0f, 0.0f } element above. It's harder to spell that wrong and get away with it than it is to get away with typing 9.9f, 1.1f, or 2.2f instead of 0.0f.)

  • Mismatched Return Type

    Causes warnings to be emitted when a function with a defined return type (not void) contains a return statement without a return-value. Also emits a warning when a function is defined without specifying a return type.

  • Missing Braces and Parentheses

    Warn if parentheses are omitted in certain contexts, such as when there is an assignment in a context where a truth value is expected, or when operators are nested whose precedence people often get confused about.

    Also warn about constructions where there may be confusion to which if statement an else branch belongs. Here is an example of such a case:

    	if (a)
    		if (b)
    			foo ();
    	else
    		bar ();

    In C, every else branch belongs to the innermost possible if statement, which in this example is if (b). This is often not what the programmer expected, as illustrated in the above example by the indentation the programmer chose.

    This may appear to be just a cleanliness warning, but as you can see from the example, it can also warn you about code that may not flow the way you expect it to.

  • Missing Fields in Structure Initializers

    Warn if a structure's initializer has some fields missing. For example, the following code would cause such a warning, because "x.h" is implicitly zero:

        struct s { int f, g, h; };
        struct s x = { 3, 4 };

    This option does not warn about designated initializers, so the following modification would not trigger a warning:

        struct s { int f, g, h; };
        struct s x = { .f = 3, .g = 4 };

    I'm not sure why it warns about the former and not the latter, since all the members get initialized in both code examples (C99 §6.7.8 ¶21). If nothing else, this warning is good motivation for you to switch to designated initializers, which make your code more explicit about which members it's initializing.

  • Missing Newline At End Of File

    Another cleanliness warning—this one, about the cleanliness of diffs.

  • Sign Comparison

    Warn when a comparison between signed and unsigned values could produce an incorrect result when the signed value is converted to unsigned.

  • Strict Selector Matching

    Warn if multiple methods with differing argument and/or return types are found for a given selector when attempting to send a message using this selector to a receiver of type "id" or "Class". When this setting is disabled, the compiler will omit such warnings if any differences found are confined to types which share the same size and alignment.

    I don't turn this one on, because it's unnecessary. When the multiple declarations differ significantly (e.g., one method returns an object and the other returns a float), the compiler will raise the warning whether it's turned on or not. When the declarations don't differ significantly (e.g., both methods return an object), the difference won't cause a problem, so you don't need to worry about it.

    So, you should leave this one off.

  • Typecheck Calls to printf/scanf

    Check calls to printf, scanf, etc., to make sure that the arguments supplied have types appropriate to the format string specified, and that the conversions specified in the format string make sense.

    The biggest reason to turn this on is that it checks your use of methods that take a nil-terminated list of arguments:

    NSArray *array = [NSArray arrayWithObjects:@"foo", @"bar"];

    That message should have a nil after the last argument. With this warning turned on, the compiler will point out that I don't.

    The ostensible main reason to turn this on is to have the compiler check your uses of printf and scanf formats. I don't use printf often (and I never use scanf), so that's not so important for me, but when I do, this could come in handy.

    Sadly, it doesn't work on NSLog calls.

  • Undeclared Selector

    Warn if a "@selector(...)" expression referring to an undeclared selector is found. A selector is considered undeclared if no method with that name has been declared before the "@selector(...)" expression, either explicitly in an @interface or @protocol declaration, or implicitly in an @implementation section.

    Another benefit of this warning is that you can use it to get a warning when you pass a wrong key to a KVC, KVO, KVV, or Bindings method. Uli Kusterer has a macro for that.

  • Unused Functions

    Warn whenever a static function is declared but not defined or a non-inline static function is unused.

    Works best with a policy of declaring any function as static that you don't need to be visible elsewhere in your program.

  • Unused Labels

  • Unused Values

  • Unused Variables

    These follow the general rule of “code you don't have is code you don't have to debug”. If you're not using a label, expression statement, or variable, you don't need it, and you will find your code clearer without it.

    You may notice that I don't turn on Unused Parameters. Most times when I trip that warning, it's a callback function or method, so I can't get rid of the argument. Rather than litter my code with bright yellow #pragma unused(foo) directives, I prefer to just turn this one off. (See my rule above about less code being better.)
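The sentinel technique from the Check Switch Statements entry can be sketched in plain C, with assert standing in for NSAssert1. The names here (Fruit, kFruitCount, fruit_name) are made up for illustration. One wrinkle worth noting: the sentinel is itself an enumerator, so it gets its own case; otherwise the switch warning fires for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enumeration; the last name is a sentinel, not a real value. */
enum Fruit {
    kFruitApple,
    kFruitBanana,
    kFruitCherry,
    kFruitCount /* number of valid values; keep this last */
};

/* Returns a name for a valid Fruit. There is no default: label, so the
 * compiler can report any enumerator that lacks a case. */
const char *fruit_name(enum Fruit fruit) {
    /* Reject anything outside the known range before switching on it. */
    assert((unsigned)fruit < (unsigned)kFruitCount);

    switch (fruit) {
        case kFruitApple:  return "apple";
        case kFruitBanana: return "banana";
        case kFruitCherry: return "cherry";
        case kFruitCount:  break; /* sentinel: listed only to satisfy the warning */
    }
    return NULL; /* not reached while the assertion is enabled */
}
```

Adding a new fruit above kFruitCount bumps the sentinel automatically, and the switch then produces a warning until the new case is written.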

Once I have turned on all of these warnings and then eradicated them from my code, I turn on two more build settings:

  • Treat Warnings as Errors

    I call this “hardass mode”.

    Remember what I said above: Almost all of these warnings represent real or potential (i.e., future) bugs in my program. Rather than tolerate them, I turn this setting on so that any time I write such a bug, I break the build.

    I haven't been able to turn this on yet in Adium or Growl, although I have turned it on in Adium's Spotlight-importer project. I do, however, turn it on in all of my solo projects.

  • Run Static Analyzer

    Activating this setting will cause Xcode to run the Clang static analysis tool on qualifying source files.

    The Clang Static Analyzer is the find-your-bugs-for-you tool you've heard so much about. This setting makes Xcode run it whenever you build. Thus, every build, you get all those errors (formerly warnings) and your analysis results.

    Whenever possible, I leave this on; if there's a source file that it takes a long time to analyze (e.g., GrowlPluginController.m), then I turn it off, but only then.

UPDATE 2009-11-22: Jonathan “Wolf” Rentzsch wrote a script to turn on all of these settings in all of the projects you have open.

UPDATE 2009-11-28: Updated the entry on “Typecheck Calls to printf/scanf” after seeing that Jeremy W. Sherman pointed out a much better benefit of it in a comment on a Stack Overflow answer.

UPDATE 2009-12-05: Corrected the discussion of the index problem. You can't use index, or any other function, as a C-array subscript, so the problem only affects higher-level arrays, such as NSArray.

The peril of index(3)

Thursday, November 5th, 2009

This is mainly for Andy Finnell on Twitter, who wonders why some of us avoid naming variables index.

I pointed out that there is a function in standard C named index, and this causes one of two problems: If you declare a variable named index, you have shadowed the function and should get a warning for that; if you fail to declare the variable, you pass the pointer to the index function as your array index, which is probably not what you intended.

I say “should” there because, as he noted in his response, the shadowed-name warning is off by default. You should turn it on, because it catches bugs. In fact, the index bug is one that it can prevent.

Suppose you do name a variable index, and either you don't have the shadowed-name warning turned on or you ignore it. You initialize the variable with an index, but don't otherwise assign to it. Then, you attempt to access an object in an array by this index.

All well and good so far. index is a variable, so everything works as intended.

But then, one of several things happens:

  1. You comment out both the declaration and the usage of index, for whatever reason, but then you uncomment the usage but forget to uncomment the declaration.
  2. You update and/or merge in your version-control system, or otherwise apply one or more diffs. Usually, this works, but today isn't your lucky day: The merge breaks your source code. Perhaps it introduces conflicts, and you resolve them incorrectly. Or maybe it breaks the code silently (e.g., by merging in another branch's division of this function into two).
  3. You move the code to another location, but you forget to move half of it, or you move one half and delete another, forgetting that the declaration of index was in the code you deleted.

You had a variable named index, but now you don't—but the index function is always there*. Since there is something named index, your code compiles. It's the wrong type, so you'll get a warning, but maybe you don't notice it.

Then you run the code and it crashes. Why? Because you passed a function as the index into an array.

In the worst possible case, it was #2 and you weren't aware that this code was affected. Maybe you'd been working on something else. Anyway, since you hadn't been working on the now-broken code, you aren't testing it**, so you don't know that it's now broken.

So you ship it. You ship this index-way-out-of-range crasher. And then your user runs the code and gets the crash.

This isn't theoretical. I've had this happen more than once (fortunately, not in the hands of a user). It's one reason why I turn on the shadowed-name warning and “Treat Warnings as Errors”, and it's the reason why I never use index as a variable name.

UPDATE 2009-12-05: To clarify, this problem does not affect C arrays, as C does not allow you to use a pointer in an array subscript. It mainly affects higher-level array interfaces, such as Cocoa's NSArray.

* Assuming that, directly or indirectly, you've included string.h. If you're using Cocoa, you have (Core Foundation includes it).

** Unless, of course, you have automated test cases covering this code, and you run those.
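To see that the name really is always in scope, here is a small sketch in plain C. It assumes a libc that still ships the legacy index() function in <strings.h> (glibc and Darwin both do); the helper names are invented for the example. Nothing in it declares a variable called index, yet every use of the name compiles.

```c
#include <stddef.h>
#include <strings.h> /* declares index(), the legacy twin of strchr() */

/* Returns the offset of the first occurrence of c in s, or -1 if absent.
 * The name index here refers to the library function. */
long first_offset(const char *s, char c) {
    const char *hit = index(s, c);
    return hit ? (long)(hit - s) : -1;
}

/* The bare name decays to a function pointer, which is what ends up being
 * passed when a variable of that name has gone missing. */
int index_is_a_function(void) {
    char *(*fn)(const char *, int) = index;
    return fn != NULL;
}
```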

Variables for clarification

Friday, August 14th, 2009

Before:

if (![newTrackURL isEqualToString:trackURL] || ([newTrackURL hasPrefix:@"http://"] && !([lastPostedDescription isEqualToString:displayString]))) { // this is different from previous notification, or it's a stream and the playing track has changed

After:

BOOL URLChanged = ![trackURL isEqualToString:newTrackURL];
BOOL isStream = [newTrackURL hasPrefix:@"http://"];
BOOL descriptionChanged = ![lastPostedDescription isEqualToString:displayString];
if (URLChanged || (isStream && descriptionChanged)) {

Why remove the comment? Well, I had just changed the code to what you see in the Before example, but I had not changed the comment, and I thought I was ready to commit. I almost committed a stale comment. It's that easy. Luckily, I caught it in reading the patch before I qfinished it, so I was able to change the comment to what you see above.

Now I have no comment at all, but the code is clear, so I don't need one. This is an improvement.
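The same refactoring carries over to plain C. This is only a sketch: the function and parameter names are hypothetical stand-ins for the Objective-C strings above, and strcmp/strncmp replace the NSString methods.

```c
#include <stdbool.h>
#include <string.h>

/* Decides whether a new notification should be posted. Each condition gets
 * a name, so the final expression reads like the comment it replaces. */
bool should_post(const char *track_url, const char *new_track_url,
                 const char *last_description, const char *description) {
    bool url_changed = strcmp(track_url, new_track_url) != 0;
    bool is_stream = strncmp(new_track_url, "http://", 7) == 0;
    bool description_changed = strcmp(last_description, description) != 0;

    return url_changed || (is_stream && description_changed);
}
```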

Why the compiler won’t let you declare a variable immediately after a case

Friday, February 20th, 2009

Consider this code:

enum { foo, bar, baz } my_var;
switch (my_var) {
    case foo:
        int foo_alpha; //Line 7
        int foo_beta;
        break;

    case bar:
        my_var = baz;
    case baz:
        printf("Bar or baz encountered\n");
        break;
}

Try to compile it as C99.

Here's what GCC says:

test.c: In function ‘main’:
test.c:7: error: syntax error before ‘int’

What!

First off, why not? What is invalid about this declaration statement?

Second, why is foo_alpha invalid and not foo_beta?

There are several definitions in the C99 specification that come together to cause this problem.

The first is that there is no such thing as a declaration statement, because declarations are not statements. See §6.7 and §6.8; note that neither definition includes the other. In the language that is C99, declarations and statements are separate concepts.

The second is the definition of a compound statement. The definition of a switch statement (which is part of §6.8.4) is:

switch ( expression ) statement

If you go back up to §6.8, you'll see that another possible kind of statement is a compound statement, for which §6.8.2 gives this definition:

compound-statement:
    {   block-item-list(opt)   }
block-item-list:
    block-item
    block-item-list   block-item
block-item:
    declaration
    statement

So a declaration is not a statement, a compound statement can contain declarations and/or statements, and a switch statement is a prefix upon (usually) a compound statement.

Now, the kicker. Read the relevant definition of a labeled statement from §6.8.1:

case   constant-expression   :   statement

Statement. Not block-item. Not declaration. Statements only.

So this is what the compiler sees in valid code (with a declaration not following a case label):

  • selection statement (switch)
    • compound statement
      • labeled statement
        • statement
      • statement
      • statement
      • declaration
      • statement
      • jump statement (break)
      • labeled statement
      • statement
      • jump statement (break)
      • labeled statement
      • statement
      • jump statement (break)

Now, consider how the compiler sees my code above:

  • selection statement (switch)
    • compound statement
      • labeled statement
        • declaration — wait, this isn't a statement! ERROR
      • declaration — also a kind of block-item, so it's perfectly valid here
      • jump statement (break)
      • labeled statement
      • statement
      • labeled statement
      • statement
      • jump statement (break)

I hope this makes clear that this isn't a compiler bug; the C99 language really does work this way.

(One possible solution would be to make a declaration a kind of statement, but I don't know what other ramifications that might have. [UPDATE 11:36: Jeff Johnson tells us why not.])
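Given that grammar, two common workarounds follow directly. The sketch below uses made-up names (kind, with_braces, with_empty_statement) and is meant to compile as C99: either give the label a compound statement, or give it an empty statement so the declaration becomes the next block-item.

```c
enum kind { KIND_FOO, KIND_BAR, KIND_BAZ };

/* Braces make the label's statement a compound statement, and a compound
 * statement may begin with a declaration. */
int with_braces(enum kind k) {
    switch (k) {
        case KIND_FOO: {
            int foo_alpha = 1;
            int foo_beta = 2;
            return foo_alpha + foo_beta;
        }
        case KIND_BAR:
        case KIND_BAZ:
            break;
    }
    return 0;
}

/* A lone semicolon is an empty statement, so the label has the statement
 * it requires and the declaration is simply the next block-item. */
int with_empty_statement(enum kind k) {
    switch (k) {
        case KIND_FOO:
            ;
            int foo_alpha = 1;
            return foo_alpha;
        case KIND_BAR:
        case KIND_BAZ:
            break;
    }
    return 0;
}
```

The braces version also limits the scope of the variables to that one case, which is usually what is wanted.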

Manpage Monday: backtrace(3)

Monday, October 27th, 2008

New series: Manpage Monday. Every other Monday, I will post a link to one manpage that comes with Mac OS X. [See update below.]

Today, it's backtrace(3), which tells you about three functions:

SYNOPSIS

#include <execinfo.h>

int backtrace(void** array, int size);

char** backtrace_symbols(void* const* array, int size);

void backtrace_symbols_fd(void* const* array, int size, int fd);

DESCRIPTION

These routines provide a mechanism to examine the current thread’s call stack.

backtrace() writes the function return addresses of the current call stack to the array of pointers referenced by array. At most, size pointers are written. The number of pointers actually written to array is returned.

backtrace_symbols() attempts to transform a call stack obtained by backtrace() into an array of human-readable strings using dladdr(). The array of strings returned has size elements. It is allocated using malloc() and should be released using free(). There is no need to free the individual strings in the array.

backtrace_symbols_fd() performs the same operation as backtrace_symbols(), but the resulting strings are immediately written to the file descriptor fd, and are not returned.

Added 2008-10-25: Here's a test app to show the output.

File: print_backtrace.tbz

Output:

0   print_backtrace                     0x00001fc9 print_backtrace + 31
1   print_backtrace                     0x00001ff6 main + 11
2   print_backtrace                     0x00001f7e start + 54

In order:

  1. Index in the array of addresses
  2. Executable name
  3. Address
  4. Name + offset
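For readers without the archive handy, a function along these lines exercises two of the three calls. This is a sketch, not the contents of the test app: MAX_FRAMES is an arbitrary cap picked for the example, and the exact text of each line depends on the platform's implementation.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64 /* arbitrary cap chosen for this sketch */

/* Prints the current call stack to stdout, one frame per line.
 * Returns the number of frames captured, or -1 if symbolication failed. */
int print_backtrace(void) {
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);

    char **symbols = backtrace_symbols(frames, count);
    if (symbols == NULL) {
        return -1;
    }
    for (int i = 0; i < count; i++) {
        printf("%s\n", symbols[i]);
    }
    free(symbols); /* one free() for the array, none for the strings */
    return count;
}
```

When the strings are not needed afterwards, backtrace_symbols_fd(frames, count, fd) avoids the allocation entirely.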

UPDATE 2008-10-24 21:19 PDT: I've decided to change the schedule on this. Instead of Manpage Friday, I'll do Manpage Monday, and it will be every two weeks. In between will be Framework Friday.

So this is now the first Manpage Monday post, and I will update its post date on Monday, the 27th. (I can't update it now because WordPress won't let me publish a post from the future—only schedule it. Grrr.) The week after that, November 7th will be the first Framework Friday. And in the week after that, November 10th will be the second Manpage Monday.

A simple way to make your NSLogs and NSAsserts more informative

Wednesday, June 13th, 2007

OK, so I'm not totally radio-silent. I learned about this in a WWDC session, but since it's already public API in Tiger (actually, it's a GCC extension), I can talk about it.

It's a built-in macro called __PRETTY_FUNCTION__. This is a fully-qualified human-readable signature of the function you're in. The GCC docs don't mention this part, but it even works in Objective-C, in addition to C++ and plain C. Here's a test app, containing this code:

@implementation Blah(blah)
- (void)blah {
    NSLog(@"%s", __PRETTY_FUNCTION__);
}
@end

And the output:

2007-06-14 07:50:37.733 printmethod[1800] -[Blah(blah) blah]

Notice that it includes the class name, category name (if any), and method selector.

Note that that's a C-string, not an NSCFString. Be sure to set up your format string accordingly.
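In plain C the same identifier can back a small logging macro. The sketch below assumes GCC or Clang; LOG_HERE and describe_self are names invented for the example. GCC documents __PRETTY_FUNCTION__ in C as another name for __func__, so expect just the function name there; other compilers may include more of the signature.

```c
#include <stdio.h>

/* Hypothetical logging macro: prefixes each message with the name of the
 * enclosing function. __PRETTY_FUNCTION__ is a compiler extension. */
#define LOG_HERE(message) \
    fprintf(stderr, "%s: %s\n", __PRETTY_FUNCTION__, (message))

/* Logs a message and returns the compiler's description of this function,
 * so a caller can inspect what the extension produces. */
const char *describe_self(void) {
    LOG_HERE("called");
    return __PRETTY_FUNCTION__;
}
```

Because the macro expands at the call site, each function that uses it reports its own name without any per-function setup.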

How do I swap thy bytes? Let me count the ways

Saturday, April 28th, 2007

  1. swab

    swab(3) is a function that copies some bytes from one location to another, swapping each pair of bytes during the copy. Handy for structures.

    It has a feature that isn't mentioned in the Darwin manpage for swab: If you pass a negative size, it does not swap. I have no idea why this magic behavior was added; if you want a swab that doesn't swap bytes, just use bcopy. I shake my head at this use of a magic argument.

  2. ntohs, htons, ntohl, htonl

    These four functions swap the bytes of a 16-bit (‘s’) or 32-bit (‘l’, in ignorance of LP64) integer and return the transformed value.

    They are mainly used in network-I/O contexts, as they transform between network byte order (big-endian) and host byte order (whatever you're running). But there's nothing stopping you from using them for any other 16-bit/32-bit integral byte-swapping.

  3. OSByteOrder (Darwin)

    The Darwin kernel provides a number of handy-dandy macros for byte-swapping:

    • OSSwap{Const}?Int{16,32,64}
    • OSSwap{Host,Big,Little}To{Host,Big,Little}{Const}?Int{16,32,64}

    The {Host,Big,Little}To{Host,Big,Little} functions swap conditionally; the others always swap.

    According to the Universal Binary Programming Guidelines, it is safe to use these in applications.

  4. Core Foundation

    CF's Byte-Order Utilities provide the same facilities as OSByteOrder, with a couple of twists:

    • The implementation uses assembly language when the environment is GCC on either PowerPC or x86. This is theoretically faster than OSByteOrder's pure-C implementation. (CF falls back on pure C in all other environments.)
    • CF adds support for byte-swapping 32-bit and 64-bit floating-point numbers.

  5. Foundation

    Foundation's byte-order functions bear all the same capabilities as the CF Byte-Order Utilities. In fact, they are implemented with them.

  6. NeXT byte-order utilities

    These utilities are equivalent to the Foundation functions, except that they are implemented using the OSByteOrder utilities. They are declared in <architecture/byte_order.h>.

  7. Core Endian

    [Image: Core Endian logo that I made up.]

    I think that the “Core Endian” name itself is new in Panther. Three functions in the API have a “CoreEndian” prefix, and are marked as new in Panther, whereas the others have simply “Endian”, and are marked as having existed since 10.0. This suggests to me that the entire API was branded “Core Endian” in 10.3, with the older functions subsumed by it.

    The new functions have to do with “flipper” callbacks, which you can install so that things like Apple Event Manager can DTRT with your custom data types. The older functions are plain byte-swapping utilities, just like all the other APIs described here, and exist mainly for the benefit of QuickTime users (they exist on Windows, too, through QuickTime).
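The difference between swapping always and swapping conditionally can be sketched with nothing beyond POSIX. swap32 and put_be32 are hypothetical helpers written for this example; htonl comes from <arpa/inet.h>.

```c
#include <arpa/inet.h> /* htonl, ntohl, htons, ntohs */
#include <stdint.h>
#include <string.h>

/* Unconditional 32-bit swap in portable C: the result is the same on every
 * host, whatever its byte order. */
uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Writes value into out[0..3] in network (big-endian) byte order.
 * htonl swaps only when the host is little-endian. */
void put_be32(uint32_t value, unsigned char out[4]) {
    uint32_t wire = htonl(value);
    memcpy(out, &wire, sizeof wire);
}
```

The conditional form is the one to reach for when reading or writing a file or wire format with a fixed byte order, since the same source then behaves correctly on both kinds of host.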

Guess the bug!

Saturday, March 31st, 2007

UPDATE: NO BUG! Serves me right for not testing a programming challenge before posting it. Thanks, Evan.

a = 42, b = 100;

Before you type “error on line 1”: No, it's not a compilation error. The above code is legal, just wrong. The task before you is to explain how. ☺

Please do not use %x for pointers

Tuesday, January 23rd, 2007

I often see code like this:

printf("Address of foo: %x\n", &foo);

The intent here is to print the address in hexadecimal format. Good plan; bad execution.

First, here's one possible output:

Address of foo: 123456

Is this decimal (%u), hex (%x), or octal (%o)? I can't tell from the output; for all I know, the person who wrote that line is a real newbie and used %i or %d. (That would be even worse for really high addresses, as they will then be formatted as negative numbers.)

So the author changes the line to:

printf("Address of foo: 0x%x\n", &foo);

Now the output has the 0x prefix, making clear that it's hex, but there's still a bug here. The bug is that %x is not the correct formatter for pointers.

The type expected by %x is unsigned int. For the past decade or so, this has not been a problem because on all PCs, including Macs, the size of a pointer has been equal to the size of an int.

But over the past couple years and the next couple years, there's a transition underway to LP64, wherein long ints and pointers are 64-bit. Plain ints won't be; they'll still be 32 bits. This means that you'll get funky results, and possibly crashes (with %s, %n, and %@), if you're using %x for your pointers.

You could use %lx (unsigned long int), but it's still the wrong type. There is a formatter specifically for pointers: %p. It even provides the “0x” prefix for you.

printf("Address of foo: %p\n", &foo);

Address of foo: 0x1e240

So, in order to make your code both more stable and more future-proof, please use %p, not %x or %lx, for formatting your pointers.
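Here is a short sketch of both portable options. The helper names are invented for the example. The first uses %p with the void * cast that the conversion is specified for; the second goes through uintptr_t and PRIxPTR from <inttypes.h>, for cases where the exact digits matter, since the text %p produces is up to the implementation.

```c
#include <inttypes.h> /* PRIxPTR */
#include <stdint.h>   /* uintptr_t */
#include <stdio.h>

/* Formats ptr with %p into buf. Returns snprintf's result. */
int format_pointer(char *buf, size_t size, const void *ptr) {
    return snprintf(buf, size, "%p", (void *)ptr);
}

/* Alternative with a fixed look: convert to an integer type wide enough
 * for a pointer and use the matching format macro. */
int format_pointer_hex(char *buf, size_t size, const void *ptr) {
    return snprintf(buf, size, "0x%" PRIxPTR, (uintptr_t)ptr);
}
```

Neither version changes when the pointer size does, which is the point of the exercise.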

Why you should use #pragma mark

Saturday, January 20th, 2007

It's a minute and a half, and it's 200 K, and you'll need QuickTime 7 to watch it.


#pragma mark at work in Xcode's editor.

Useful as #pragma mark is, I wouldn't recommend it for cross-platform code. GCC's documentation of #pragma mark seems to suggest that it's Darwin-only.

On initializing static variables

Thursday, January 4th, 2007

Did you know that you don't have to initialize static variables? (If you've done any Cocoa programming, you know that statics are commonly treated as class ivars, most commonly to hold singleton instances.)

Quoth C99 (§6.7.8 paragraph 10):

… If an object that has static storage duration is not initialized explicitly, then:

  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate [array or structure —PRH], every member is initialized (recursively) according to these rules;
  • if it is a union, the first named member is initialized (recursively) according to these rules.

And I've prepared a test app that shows that gcc 4.0.1 on OS X 10.4.8 does seem to comply with this handy part of the standard.

Happy not-initializing!
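The test app isn't reproduced here, but a check in the same spirit might look like the following sketch. All of the names are made up; each object has static storage duration and no initializer, covering the pointer, arithmetic, and aggregate cases from the quoted paragraph.

```c
#include <stddef.h>

struct point { int x; double y; const char *label; };

/* None of these has an initializer; all have static storage duration. */
static int counter;
static double ratio;
static const char *shared_name;
static struct point origin;
static int table[4];

/* Returns 1 if every object above holds its implicit initial value. */
int statics_start_zeroed(void) {
    if (counter != 0 || ratio != 0.0 || shared_name != NULL) return 0;
    if (origin.x != 0 || origin.y != 0.0 || origin.label != NULL) return 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i] != 0) return 0;
    }
    return 1;
}
```

Note that this applies only to static storage duration; an ordinary local variable without an initializer holds an indeterminate value.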

Wanted for abuse of operators

Saturday, December 2nd, 2006

From the stock app delegate in Apple's Core Data application template:

[managedObjectContext release], managedObjectContext = nil;
[persistentStoreCoordinator release], persistentStoreCoordinator = nil;
[managedObjectModel release], managedObjectModel = nil;

And in case you were wondering how Apple could possibly ship code that doesn't even compile in the app templates…

… that's legal code.

Yes, there is a comma operator, and that's what's at work there. There is a warning (-release returns void, whereas assignment of nil returns id, so the types are mismatched, which GCC doesn't like), but the code is legal.

Legal doesn't always mean correct, however. The comma operator is supposed to be used for things like this:

printf("%s\n", (--argc, *++argv));

Though in modern usage it is out of favor, replaced on sight with less obfuscative code:

printf("%s\n", *++argv);
--argc;

printf("%s\n", argv[idx++]);

It is not for performing a void function call followed by an assignment. A void expression should never appear in a comma expression. The code would be greatly improved by replacing the comma with a semicolon, dividing each statement into the two statements that it should be.
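For contrast, here is a sketch of the comma operator where both operands have values. The function names are invented for the example. The left operand is evaluated first and discarded, and the result of the whole expression is the right operand. The for loop is one place where the operator remains common.

```c
/* Returns 1 + 2 + ... + n. Two loop variables advance together in the
 * increment clause: i counts up while n counts down. */
int sum_to(int n) {
    int total = 0;
    int i;
    for (i = 1; n > 0; i++, n--) {
        total += i;
    }
    return total;
}

/* Increments *counter, then yields ten times the new value. The comma is
 * a sequence point, so the increment finishes before the right operand
 * reads the counter. */
int last_of(int *counter) {
    return (++*counter, *counter * 10);
}
```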

Fun with printf

Friday, November 17th, 2006

fprintf(usageOutputFile, "… (200 pixels across, 50% of height).\n");

Output:

… (200 pixels across, 5027777772734f height).

Guess the bug!

I feel good about Mac open source

Monday, October 2nd, 2006

(I've had this brewing in a VoodooPad page for some months now, and have finally gotten around to implementing it.)

I feel good about Mac open source.

Many of my fellow Mac programmers have made reusable source code available under free licenses, such that any other programmer can use it in their applications. This abundant generosity impresses me, and at this point has moved me to build a catalog of this code.

It's organized by author name and by program name. It's not just applications, though; plain classes are also listed, as are useful-looking images (such as my own plus/minus button images).

Enjoy.

UPDATE 19:07: This is actually a duplicate post of an earlier draft. I'll see about moving the comments over to the correct post. In the meantime, if you have any further comments, please post them over there.

UPDATE 19:43: Done. I ended up just renaming this post (dropped “-2” from the post slug; you'll need to update any links to this post) and removing the duplicate post.

Making GCC use proper quote marks

Tuesday, September 19th, 2006

When you build a program in Xcode, you may have noticed that error messages from GCC look like this:

error.c: In function `main':
error.c:2: warning: implicit declaration of function `pirntf'

This shouldn't happen on a modern operating system with modern text capabilities. Fortunately, it is easy enough to make it do the Right Thing, which is to use Unicode quote marks.

First, figure out the correct ISO 639-1 language code for your preferred language. I use English, so mine is “en”. The Library of Congress has a list of ISO 639-1 language codes. In addition, you may want to append a region code; I use American English, so mine is “US”. These should be separated by an underscore; my full language specifier, then, is “en_US”.

Then, append “.UTF-8” to this (= “en_US.UTF-8”), and set it in your LC_ALL environment variable. You can do this by adding this variable to $HOME/.MacOSX/environment.plist. If you don't already have one, you can create it with Property List Editor; you will need to move it to the proper location with Terminal. Either way, you will then need to log out and log back in.

GCC will then use nice Unicode quote marks in its output:

error.c: In function ‘main’:
error.c:2: warning: implicit declaration of function ‘pirntf’

There's extra work to do if you also invoke builds from Terminal or xterm (whether you use xcodebuild, make, or gcc directly).

Terminal

  1. Right-click on any Terminal window and choose “Window Settings…”.
  2. Switch to the “Display” pane.
  3. Set the character set encoding to UTF-8.
  4. Turn off “wide glyphs for Japanese/Chinese/etc.”.
  5. Click “Use Settings as Defaults”.

xterm

  1. In your .Xdefaults file, add these lines:

    xterm*font:        -*-clean-medium-r-*-*-12-*-*-*-*-*-iso10646-*
    xterm*boldFont:    -*-clean-bold-r-*-*-12-*-*-*-*-*-*-*
    xterm*utf8:        1

    Note that you can specify any font for the two font values; however, “clean”'s Unicode version only exists in plain, not boldface.

Parsing the preprocessor

Thursday, June 15th, 2006

If you've ever run GCC's preprocessor alone and looked at its output, you've seen lines like these:

# 1 "/usr/include/sys/types.h"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "/usr/include/sys/types.h"
# 66 "/usr/include/sys/types.h"
# 1 "/usr/include/sys/appleapiopts.h" 1 3 4
# 67 "/usr/include/sys/types.h" 2


# 1 "/usr/include/sys/cdefs.h" 1 3 4
# 70 "/usr/include/sys/types.h" 2

And you probably wondered what all that means. Here's your secret decoder ring.

First, these are called "line markers" in libcpp. The format of a line marker is:

  1. A line number
  2. The path to the relevant file
  3. Flags

The flag values are:

  1. Push (enter) header
  2. Pop (leave) header
  3. This is a system header (determined by these rules with this modification)
  4. Requires extern "C" protection (determined by the same rules as above); never found without 3

Note that a pop marker names the file being returned to, not the one being left: the header that gets popped is the one above it in the include stack.

Example:

# 66 "/usr/include/sys/types.h"
# 1 "/usr/include/sys/appleapiopts.h" 1 3 4
# 67 "/usr/include/sys/types.h" 2

  1. Fast-forward to line 66 of <sys/types.h> (nothing interesting occurs before this line).
  2. Enter <sys/appleapiopts.h>. Everything from this point until the next marker is from that header. Note that this header is a system header (3) and requires extern "C" protection (4).
  3. As it turns out, nothing interesting happened there. So the very next line is a pop marker: <sys/appleapiopts.h> is popped, so now we're back in <sys/types.h>, now on line 67 (the line after the #include <sys/appleapiopts.h>).
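If you want to read these markers in your own code, the format above is simple enough to pick apart with sscanf and strtol. What follows is a minimal sketch, not libcpp's do_linemarker; the names parse_linemarker, struct linemarker, and the LM_ constants are my own inventions, and it assumes the path contains no escaped quote marks:

```c
#include <stdio.h>
#include <stdlib.h>

/* Flag bits, one per flag value listed above (names are hypothetical). */
enum {
    LM_PUSH     = 1 << 0,  /* flag 1: entering a header */
    LM_POP      = 1 << 1,  /* flag 2: returning from a header */
    LM_SYSTEM   = 1 << 2,  /* flag 3: system header */
    LM_EXTERN_C = 1 << 3   /* flag 4: needs extern "C" protection */
};

struct linemarker {
    unsigned long line;
    char path[1024];
    unsigned flags;
};

/* Parses a line of the form:  # <line> "<path>" [flags...]
   Returns 1 on success, 0 if the text is not a line marker.
   Simplification: assumes the path holds no escaped quote marks. */
static int parse_linemarker(const char *text, struct linemarker *out)
{
    int consumed = 0;
    const char *p;
    char *end;

    out->flags = 0;
    /* %n is only stored if the closing quote was matched. */
    if (sscanf(text, "# %lu \"%1023[^\"]\"%n",
               &out->line, out->path, &consumed) != 2 || consumed == 0)
        return 0;

    /* Whatever follows the closing quote is a list of flag numbers. */
    for (p = text + consumed; ; p = end) {
        long flag = strtol(p, &end, 10);
        if (end == p)
            break;                      /* no more numbers */
        if (flag >= 1 && flag <= 4)
            out->flags |= 1u << (flag - 1);
    }
    return 1;
}
```

Feeding it the second marker from the example yields line 1, the path to appleapiopts.h, and the push, system-header, and extern "C" bits set.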

The relevant code in libcpp is in directives.c. The function that parses line markers (presumably used by the compiler rather than the preprocessor itself; the preprocessor generates them) is do_linemarker. Additional include-related code is in files.c.

UPDATE 23:24 PDT: Beware of pragmas. Seems obvious now, but I didn't think of it earlier: The preprocessor leaves #pragma directives untouched, since they're for the compiler rather than the preprocessor. So if you're only looking for line markers, you may get tripped up if you don't properly handle or ignore pragmas.
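One way to avoid that trap is to sort each line into one of three bins before doing anything else with it. A minimal sketch, assuming that every line starting with # followed by a digit is a line marker; classify_line and the LINE_ names are hypothetical:

```c
#include <ctype.h>
#include <string.h>

enum line_kind { LINE_CODE, LINE_MARKER, LINE_PRAGMA };

/* Classifies one line of preprocessor output.
   A line marker is '#', optional blanks, then a digit; a pragma is '#',
   optional blanks, then the word "pragma". Anything else counts as code. */
static enum line_kind classify_line(const char *text)
{
    if (*text != '#')
        return LINE_CODE;
    text++;
    while (*text == ' ' || *text == '\t')
        text++;
    if (isdigit((unsigned char)*text))
        return LINE_MARKER;
    /* Require a word boundary so "#pragmatic" is not taken for a pragma. */
    if (strncmp(text, "pragma", 6) == 0 &&
        !isalnum((unsigned char)text[6]) && text[6] != '_')
        return LINE_PRAGMA;
    return LINE_CODE;
}
```

Lines classified as LINE_PRAGMA can then be passed through or skipped, and only LINE_MARKER lines need to go to a marker parser.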


Within epsilon of perfect

Monday, March 27th, 2006

(Title taken from my own assessment of my spelling.)

It's been known for a while that rather than directly comparing two floating-point numbers, you should instead subtract the smaller from the larger and compare the difference to some epsilon value. The reason for this is that two numbers might be very, very similar, but not exactly equal. “Epsilon” in a mathematical sense means “minimum precision that you care about”. The epsilon value for money, for example, is usually 0.01; differences smaller than this are thrown away.
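The comparison described above can be sketched in a few lines of C. The function name nearly_equal is hypothetical, and the choice of epsilon is left to the caller:

```c
/* Returns 1 if a and b differ by less than epsilon, 0 otherwise.
   Taking the absolute difference means the caller does not need to know
   which of the two values is larger. */
static int nearly_equal(double a, double b, double epsilon)
{
    double difference = a - b;

    if (difference < 0.0)
        difference = -difference;
    return difference < epsilon;
}
```

With the money example, nearly_equal(10.004, 10.0, 0.01) treats the two amounts as the same, while nearly_equal(10.02, 10.0, 0.01) does not.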

So of course I went scrounging in the headers, found macros named FLT_EPSILON, DBL_EPSILON, and LDBL_EPSILON, and recommended to all my programmer friends that they use these constants for comparisons of floating-point values rather than == and !=.

From time to time, facts just float up to the top of my head for no obvious reason. I have a sheet taped to my wall called “Word of the Day”; when a word pops into my head like this, completely unrelated to any previous thoughts, I write it down on that sheet to look up later. I consider this a more advanced (if slow) form of self-education. They might be long-forgotten memories, or something else; I don't know, I just write them down and look them up.

About half an hour ago, this happened to me again. Except this time, the thought was definitely a memory, of something I'd read in float.h:

/* The difference between 1 and the least value greater than 1 that is
   representable in the given floating point type, b**1-p.  */

Another thought had bubbled up with this, and it was an epiphany: Technically, this means that the expression x != (x + FOO_EPSILON) should evaluate to 1. In other words, subtracting and comparing the difference against FOO_EPSILON isn't necessary.

So, as is my wont, I wrote a test app. Sure enough, that expression does evaluate to 1.
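The test app itself isn't reproduced here. A minimal sketch of what such a test might look like follows, assuming x = 1, which is the value the float.h comment defines epsilon around; other starting values are outside what this sketch checks, and the function names are hypothetical:

```c
#include <float.h>

/* Each function evaluates x != (x + FOO_EPSILON) at x = 1.
   volatile keeps the compiler from folding the sum away or holding it
   in a wider register than the declared type. */
static int float_epsilon_differs(void)
{
    volatile float x = 1.0f;
    volatile float sum = x + FLT_EPSILON;
    return x != sum;
}

static int double_epsilon_differs(void)
{
    volatile double x = 1.0;
    volatile double sum = x + DBL_EPSILON;
    return x != sum;
}

static int long_double_epsilon_differs(void)
{
    volatile long double x = 1.0L;
    volatile long double sum = x + LDBL_EPSILON;
    return x != sum;
}
```

Each function returns 1 when the sum is distinguishable from x in that type.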

So forget what I said. x != y is directly equivalent to comparison against FOO_EPSILON, and it's easier to read, too. So just use that.


Pointer talk 1.2

Thursday, January 19th, 2006

The new version of the pointer talk is live.

Declarators are fun! Part II

Sunday, January 8th, 2006

One more:

6.2.5[28]:

EXAMPLE 2 The type designated as “struct tag (*[5])(float)” has type “array of pointer to function returning struct tag”. The array has length five and the function has a single parameter of type float. Its type category is array.
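To see that type in use, give the abstract declarator a name. In a real declaration the identifier goes just before the [5]. The sketch below is mine rather than the standard's; struct tag's member and the names twice, negate, handlers, and handler_count are all made up for illustration:

```c
#include <stddef.h>

struct tag { float value; };

/* Two functions of type "function (float) returning struct tag". */
static struct tag twice(float f)
{
    struct tag t;
    t.value = f * 2.0f;
    return t;
}

static struct tag negate(float f)
{
    struct tag t;
    t.value = -f;
    return t;
}

/* The type from the example with an identifier added: handlers is an
   array of five pointers to functions taking float and returning
   struct tag. Elements without an initializer are null pointers. */
static struct tag (*handlers[5])(float) = { twice, negate };

/* Number of elements, computed from the array type itself. */
static size_t handler_count(void)
{
    return sizeof handlers / sizeof handlers[0];
}
```

Calling through an element looks like handlers[0](1.5f), and handler_count() confirms that the array has length five.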