Python’s str type has a translate method that, given a second string representing a translation table, returns a new string in which each character of the first string is looked up at its ordinal position in the translation table and replaced with the character found at that position.
The identity translation table, performing no changes, is table[i] = i. For example, table['!']* is '!', so exclamation marks are not changed. If you made a table where table['!'] is '.', exclamation marks would be changed to periods (full stops).
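To make the lookup concrete, here is a minimal Python 3 sketch of the idea. It is not the built-in method: translate_with_table is a hypothetical helper name, and the table is a plain list of ordinals that covers only ASCII to keep things small.

```python
def translate_with_table(text, table):
    """Replace each character with the one found at its ordinal in table."""
    return "".join(chr(table[ord(ch)]) for ch in text)


# Identity table over ASCII only (an assumption to keep the sketch small).
table = list(range(128))
assert translate_with_table("Hi there!", table) == "Hi there!"

# Point the slot for '!' at '.' instead.
table[ord("!")] = ord(".")
print(translate_with_table("Hi there!", table))  # Hi there.
```

Any character whose ordinal falls outside the table would raise an IndexError here, which is one reason the full-size table matters.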
I’d like to see implementations of a program that does that, with the input string encoded in UTF-16 and the translation table encoded in UTF-32 (a 0x110000-element array of UTF-32 characters, one per Unicode code point), with the table initialized to its identity: table[i] = i.
And yes, you need to handle surrogate pairs correctly.
Some languages that I would particularly like to see this implemented in include:
- A state-machine language (I don’t know of any off-hand; this might be their time to shine)
I know how I would do this in C, and I’m sure I could bash something out in Python, but how would you do this in your favorite language?
As a test case, you could replace “ and ” (U+201C and U+201D) with « and » (U+00AB and U+00BB).
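For what it’s worth, here is one way the whole thing might look in Python 3, run on that test case. Treat it as a sketch under assumptions rather than a reference answer: all the function names are made up for illustration, the UTF-16 input is modeled as a list of 16-bit code units, an unpaired surrogate is simply looked up at its own position, and table entries are not validated.

```python
import struct
from array import array

CODE_POINTS = 0x110000  # U+0000 through U+10FFFF


def identity_table():
    """Build a table of 32-bit-capable entries with table[i] == i."""
    # Typecode 'L' holds at least 32 bits per entry on every platform.
    return array("L", range(CODE_POINTS))


def to_units(text):
    """Encode a str as a list of UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def from_units(units):
    """Decode a list of UTF-16 code units back into a str."""
    return struct.pack("<%dH" % len(units), *units).decode("utf-16-le")


def translate_utf16(units, table):
    """Translate UTF-16 code units through a table indexed by code point."""
    out = []
    i, n = 0, len(units)
    while i < n:
        cp = units[i]
        i += 1
        # A high surrogate followed by a low surrogate forms one code point.
        if 0xD800 <= cp <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # Assumption: an unpaired surrogate is looked up as-is.
        cp = table[cp]
        if cp >= 0x10000:
            # Re-encode supplementary code points as a surrogate pair.
            cp -= 0x10000
            out.append(0xD800 + (cp >> 10))
            out.append(0xDC00 + (cp & 0x3FF))
        else:
            out.append(cp)
    return out


table = identity_table()
table[0x201C] = 0x00AB  # left curly quote becomes left guillemet
table[0x201D] = 0x00BB  # right curly quote becomes right guillemet

# U+1F600 sits outside the BMP, so it exercises the surrogate-pair path.
units = to_units("\u201cquoted\u201d \U0001F600")
result = from_units(translate_utf16(units, table))
assert result == "\u00abquoted\u00bb \U0001F600"
```

Because the table is indexed by code point rather than by code unit, a replacement can change the length of the output: a BMP character can map to a supplementary one (two units) and vice versa.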
If you want to post code in the comments, <pre>…</pre> should work. Alternatively, you can use Gist.
* I’m using the C sense of '!' here. In Python, this would be table[ord('!')], since characters in Python are just strings of length 1, and you can’t index into a string with another string; ord is a function that returns the ordinal (code-point) value of the character in such a string.