[Rd] sort yields different results on OS X (PR#14163)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Dec 22 14:37:24 CET 2009
On Tue, 22 Dec 2009, Peter Dalgaard wrote:
> Prof Brian Ripley wrote:
>
>>
>> That different OSes use the same name for a locale does not make them the
>> same locale.
>>
>> Note that R can be compiled to use ICU, which provides a well-considered
>> collation suite. R on Mac OS X uses ICU, as does a Linux build if it is
>> available -- so I would say that it is RHEL that is out of line here (it
>> makes little sense to have < and > far apart in the collation sequence).
>>
>
> That's not it:
>
>> v <- c("1","<0","<3","2")
>> sort(v)
> [1] "<0" "1" "2" "<3"
>
> The point is rather that "special characters" are ignored during collation.
Sometimes ....
> Apparently, this comes from /usr/share/i18n/locales/iso14651_t1_common on
> Fedora; I wouldn't know how faithful to the ISO standard that is.
ISO 14651 is synchronized with the Unicode Collation Algorithm
(http://www.unicode.org/reports/tr10/) which ICU uses. So other
people have implemented the same set of rules to give different
results -- which is quite possible given the number of non-prescribed
choices that need to be made.
We've seen too many anomalies from glibc's collation to trust it, which
is why ICU is used if available.
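
For anyone who needs the same ordering on every platform regardless of
which collation library is in use, a minimal sketch follows. It assumes
that collating in the C locale (plain byte-by-byte comparison) is
acceptable for the data, which is a different ordering from what either
ICU or glibc gives in a language locale. The names 'old_collate' and
'sorted_c' are illustrative only, and capabilities("ICU") is only
reported by newer versions of R.

```r
v <- c("1", "<0", "<3", "2")

# Report whether this build of R can use ICU for collation
# (assumption: an R version recent enough to list "ICU" here).
has_icu <- "ICU" %in% names(capabilities()) && capabilities("ICU")
cat("ICU available:", has_icu, "\n")

# Remember the current collation locale so it can be restored.
old_collate <- Sys.getlocale("LC_COLLATE")

# In the C locale strings are compared byte by byte, so the result
# does not depend on the OS's locale tables or on ICU.
invisible(Sys.setlocale("LC_COLLATE", "C"))
sorted_c <- sort(v)
invisible(Sys.setlocale("LC_COLLATE", old_collate))

print(sorted_c)
```

In the C locale the digits (0x31, 0x32) sort before "<" (0x3C), so
nothing is ignored and the two "<" entries end up adjacent.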
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595