[Rd] Bug in rank with utf8?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Aug 14 08:10:35 CEST 2015


On 13/08/2015 15:19, peter dalgaard wrote:
> Yes, collation is a strange thing, and...

And remember that on some platforms (including yours) ICU is used, so 
LC_COLLATE is not particularly relevant (unless it is 'C').  See 
?Comparisons and ?icuGetCollate.

E.g. on my Yosemite system in en_US.UTF-8

> rank(c(x, y))
[1] 1.5 1.5
> icuGetCollate()
[1] "root"
> icuSetCollate(locale="ASCII")
> rank(c(x, y))
[1] 2 1

whereas on Fedora 21

> rank(c(x, y))
[1] 2 1
> icuGetCollate()
[1] "root"
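The check suggested above can be sketched in base R. It reports whether ICU is available and which collator is active. It then ranks the same pair of strings twice: once under the current collation, and once with LC_COLLATE temporarily set to "C", which per the text above is the one setting that takes ICU out of the picture. LC_COLLATE is restored afterwards. compare_collation() is a hypothetical helper written for illustration, not a function in R. The ranks it returns are platform-dependent, as the Yosemite and Fedora outputs above show.

```r
# Hypothetical helper (not part of base R): compare rank() under the
# current collation with rank() under the "C" collation.
compare_collation <- function(v) {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the caller's LC_COLLATE however the function exits.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)

  rank_current <- rank(v)          # ICU or OS collation, whichever is in use
  Sys.setlocale("LC_COLLATE", "C") # byte-wise comparison, ICU not consulted
  rank_C <- rank(v)

  list(icu_available = capabilities("ICU"),
       icu_collator  = icuGetCollate(),  # "ICU not in use" if ICU is absent
       lc_collate    = old,
       rank_current  = rank_current,
       rank_C        = rank_C)
}

res <- compare_collation(c("\u0663", "3"))
str(res)
```

Under "C" the two strings are compared byte by byte, so they cannot tie. Under the current collation they may tie, depending on the platform.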



>
> Collation order will depend on locale settings, and there are quite a few cases where the collation order of two items is not defined.
>
> To add to the confusion, on OSX Mavericks, I see
>
>> x <- "\u0663"
>> y <- 3
>>
>> x == y
> [1] FALSE
>> rank(c(x, y))
> [1] 2 1
>> x
> [1] "٣"
>> x == y
> [1] FALSE
>> x > y
> [1] TRUE
>> x < y
> [1] FALSE
>
>> Sys.getlocale()
> [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
>> Sys.getlocale("LC_COLLATE")
> [1] "en_US.UTF-8"
>
> Notice the differences from en_US.UTF8 (sans hyphen) on your system....
>
> -pd
>
> On 13 Aug 2015, at 16:01 , John McKown <john.archie.mckown at gmail.com> wrote:
>
>> 2015-08-13 8:39 GMT-05:00 Hadley Wickham <h.wickham at gmail.com>:
>>
>>> x <- "\u0663"
>>> y <- 3
>>>
>>> x == y
>>> # FALSE
>>> rank(c(x, y))
>>> # c(1.5, 1.5)
>>>
>>
>> also interesting, and confusing to me:
>>
>>> x == y
>> [1] FALSE
>>> x > y
>> [1] FALSE
>>> x < y
>> [1] FALSE
>>>
>>
>> With some slight changes:
>>
>>> x <- "\u0663"
>>> y <- "3"
>>> xy <- c(x,y)
>>> rank(xy);
>> [1] 1.5 1.5
>>> Sys.getlocale();
>> [1]
>> "LC_CTYPE=en_US.UTF8;LC_NUMERIC=C;LC_TIME=en_US.UTF8;LC_COLLATE=en_US.UTF8;LC_MONETARY=en_US.UTF8;LC_MESSAGES=en_US.UTF8;LC_PAPER=en_US.UTF8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF8;LC_IDENTIFICATION=C"
>>> Sys.setlocale(category="LC_COLLATE", locale="C");
>> [1] "C"
>>> rank(xy);
>> [1] 2 1
>>>
>


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK