[R] sort() depends on locale (and platform and build)
Marius Hofert
marius.hofert at math.ethz.ch
Sun Jun 15 18:34:28 CEST 2014
Hi,
Thanks for your help. I use R-devel under Ubuntu 14.04; here is the output of
sessionInfo():
> sessionInfo()
R Under development (unstable) (2014-06-02 r65832)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.2.0 tools_3.2.0
I assume ICU was not found or not installed when R was built, since running the
first couple of lines of the examples section of ?icuSetCollate gives:
Warning message:
In icuSetCollate(case_first = "upper") : ICU is not supported on this build
[1] "aarhus" "Aarhus" "safe" "test" "Zoo"
Since only the (default) locale "C" gives the order I expected, I am considering
changing my ~/.Rprofile. But there was certainly a reason why I changed it to
"en_US.UTF-8" at some point... I hope that does not break anything else. Is there
any recommendation for what to use in ~/.Rprofile (the default?)? And is the
recommended approach to have ICU installed and change the sorting order via
icuSetCollate() if necessary?
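One possible middle ground, sketched here under the assumption that only the sort order needs to change, is to switch just the LC_COLLATE category to "C" and leave the other categories at en_US.UTF-8. The single line Sys.setlocale("LC_COLLATE", "C") could go into ~/.Rprofile. In the C locale strings are compared byte by byte, so in ASCII all upper-case letters sort before all lower-case ones. The snippet below only demonstrates the effect and then restores the previous collation setting.

```r
## Sketch: change only the collation category, keep the rest of the locale.
old <- Sys.getlocale("LC_COLLATE")            # remember the current setting
invisible(Sys.setlocale("LC_COLLATE", "C"))   # bytewise (ASCII) comparison

x <- c("aarhus", "Aarhus", "safe", "test", "Zoo")
sorted_c <- sort(x)
print(sorted_c)
## In the C locale capitals come first: "Aarhus" "Zoo" "aarhus" "safe" "test"

## Restore the previous collation (omit this line in ~/.Rprofile).
invisible(Sys.setlocale("LC_COLLATE", old))
```

Under en_US.UTF-8 the same call is expected to interleave upper and lower case instead, but the exact order depends on the platform's collation rules.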
I would not have expected the locale to influence the sorting order; that's
quite good to know. In fact, the example came up when I tried to sort students'
grades in a class where several students share the same last name (which I made
unique by appending the first name with a '.' separator)... quite a 'delicate'
issue...
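A way around the pasted "Last.First" keys, sketched here with hypothetical data (the column names last, first and grade are made up for illustration), is to keep last and first names in separate columns and let order() break ties on the second key. A shorter last name that is a prefix of a longer one ("Smith" vs "Smithers") then sorts first regardless of how the locale treats the '.' separator. Case and accent handling still follows the current collation.

```r
## Hypothetical class list with last and first names kept separate.
students <- data.frame(
  last  = c("Smithers", "Smith", "Smith"),
  first = c("Bob", "Zoe", "Anna"),
  grade = c(5.0, 4.5, 5.5),
  stringsAsFactors = FALSE
)

## Sort by last name, then by first name for students sharing a last name.
ranked <- students[order(students$last, students$first), ]
print(ranked)
```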
Cheers,
Marius