Report-an-Apple-Bug Friday! 3 (repost)

2006-02-04 05:30:00 -08:00

fie on Blogger. this post isn’t visible from the front page or the Dashboard, even though it exists. so I’m reposting it.

UPDATE 2006-02-25: it has been closed as a duplicate.


C99 escapes in Obj-C strings contain UTF-8 interpreted as an 8-bit encoding. as last time, edited only for HTMLification.

Summary:

since C99, C has a \u escape for Unicode characters. for example, a snowman is \u2603.

when used in an Obj-C string literal (@"foo"), however, these escapes are broken.

Steps to Reproduce:

  1. write a program that displays a string, created from an Obj-C string literal containing a Unicode escape.
  2. compile it.
  3. run it.
  4. observe the display of the string.

Expected Results:

the Unicode character is displayed as such.

Actual Results:

the Unicode character’s UTF-8 representation is displayed in some 8-bit encoding (possibly ISO 8859-1).
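
to make the symptom concrete, here is a minimal sketch in C of the suspected misreading. it is not code from the bug report or from Apple: `latin1_to_utf8` is a hypothetical helper that treats every input byte as one ISO 8859-1 character and re-encodes it as UTF-8, on the assumption (hedged above) that the 8-bit encoding really is ISO 8859-1.

```c
#include <stddef.h>

/* Hypothetical helper (not from the bug report): treat each input byte as
 * one ISO 8859-1 character and re-encode it as UTF-8.
 * `out` must have room for 2 * n + 1 bytes.
 * Returns the number of bytes written, not counting the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: one byte, unchanged. */
            out[j++] = c;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence. */
            out[j++] = (unsigned char)(0xC0 | (c >> 6));
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return j;
}
```

fed the snowman's three UTF-8 bytes (E2 98 83), this produces six bytes (C3 A2 C2 98 C2 83): an "â" followed by two C1 control characters, rather than one snowman.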

Regression:

none known.

Notes:

the bug only occurs in Obj-C string literals, not plain C string literals.

it appears that the compiler uses UTF-8 for internal storage, which works. NSConstantString, however, seems to expect ISO 8859-1 and to interpret its input as such.

the enclosed tarball contains test programs (command-line) in plain C (using printf) and Obj-C (using NSLog). the plain C version works; the Obj-C does not.
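
the tarball itself is not reproduced here. as a rough sketch of the plain C side only, the check below confirms that a `\u2603` escape in an ordinary C string literal compiles to the snowman's three UTF-8 bytes. `snowman_is_utf8` is a hypothetical name, and the result assumes the compiler's execution character set is UTF-8 (the usual default for GCC and Clang).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check (not the enclosed test program): returns 1 if the
 * C99 escape \u2603 was compiled to the UTF-8 bytes E2 98 83, else 0.
 * Assumes the compiler's execution character set is UTF-8. */
int snowman_is_utf8(void)
{
    static const char snowman[] = "\u2603";
    static const unsigned char expected[] = { 0xE2, 0x98, 0x83 };

    return strlen(snowman) == sizeof expected
        && memcmp(snowman, expected, sizeof expected) == 0;
}
```

calling `snowman_is_utf8()` from a `main` and printing the result is enough to see whether the plain C literal holds the expected bytes. the Obj-C case needs Foundation and is not sketched here.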

I found the bug when displaying an Obj-C constant string (which had been passed through NSLocalizedString, but there is no matching localisation for it yet) as an NSMenuItem’s title. so the problem is not specific to terminals or Terminal’s display, nor dependent on the value of any locale environment variables.


I also attached two test cases.


Do not delete the second sentence.