[R-SIG-Mac] Solution to collation problems on Mac OS X
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Dec 28 08:52:57 CET 2008
Some of you will be aware that R ignores locale when collating strings on
Mac OS X: this arises from its inadequate FreeBSD-based wcscoll, whose man
page says
BUGS
The current implementation of wcscoll() only works in single-byte
LC_CTYPE locales, and falls back to using wcscmp() in locales with
extended character sets.
(and conventional Mac OS X locales are not 'single-byte' but UTF-8).
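To illustrate what falling back to wcscmp() means in practice, here is a minimal R sketch (not part of the original message). It assumes a current version of R, where sort(method = "radix") orders character vectors as in the C locale, that is, by code point rather than by locale rules. The vector x is made up for illustration.

```r
# Hypothetical sample: mixed case plus one accented letter (U+00E1, a-acute).
x <- c("b", "a", "B", "\u00e1")

# C-locale (code point) order, which is what a wcscmp()-style comparison gives:
# upper case before lower case, accented letters after all of ASCII.
by_codepoint <- sort(x, method = "radix")

# Order under the session's collation locale; the result depends on the
# platform and on how R was built.
by_locale <- sort(x)

print(by_codepoint)
print(by_locale)
```

If the two printed orders are the same in a non-C locale, the session is probably collating by code point rather than by locale rules.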
Apple ships a modified version of ICU (International Components for
Unicode) for collation in its ObjC classes, and with Simon's help I have
added code to allow R to use this on Tiger and Leopard. This is now the
default in R-devel, and available in R-patched by configuring R with
--with-ICU.
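For readers who want to check which collation their build uses, a hedged sketch follows. It relies on capabilities("ICU") and Sys.getlocale() as found in current versions of R. Neither call is mentioned in this message, and the ICU entry of capabilities() may not exist in the R versions discussed here.

```r
# TRUE if this build of R can use ICU for collation (assumes a current R;
# isTRUE() also covers builds where the entry is missing or NA).
uses_icu <- isTRUE(unname(capabilities("ICU")))

# The locale that governs string collation in this session.
collate_locale <- Sys.getlocale("LC_COLLATE")

cat("ICU available: ", uses_icu, "\n", sep = "")
cat("LC_COLLATE:    ", collate_locale, "\n", sep = "")
```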
This originally came up for European Spanish, so in the es_ES locale:
> example(Comparison)
...
Cmprsn> ## by number
Cmprsn> writeLines(strwrap(paste(x, collapse=" "), width = 60))
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \
] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z
{ | } ~ ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹
º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö ×
Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ
ö ÷ ø ù ú û ü ý þ ÿ
Cmprsn> ## by locale collation
Cmprsn> writeLines(strwrap(paste(sort(x), collapse=" "), width = 60))
` ´ ^ ¯ ¨ ¸ _ - , ; : ! ¡ ? ¿ . · ' " « » ( ) [ ] { } §
¶ © ® @ * / \ & # % ° + ± ÷ × < = > ¬ | ¦ ~ ¤ ¢ $ £ ¥ 0 1 ¹
½ ¼ 2 ² 3 ³ ¾ 4 5 6 7 8 9 a A ª á Á à À â Â å Å ä Ä ã Ã æ Æ
b B c C ç Ç d D ð Ð e E é É è È ê Ê ë Ë f F g G h H i I í Í
ì Ì î Î ï Ï j J k K l L m M n N ñ Ñ o O º ó Ó ò Ò ô Ô ö Ö õ
Õ ø Ø p P q Q r R s S ß t T u U ú Ú ù Ù û Û ü Ü v V w W x X
y Y ý Ý ÿ z Z þ Þ µ
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595