[R] sort() depends on locale (and platform and build)

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Jun 15 18:47:24 CEST 2014


On 15/06/2014 17:34, Marius Hofert wrote:
> Hi,
>
> Thanks for your help. I use R-devel under Ubuntu 14.04; here is the output of
> sessionInfo():
>
>> sessionInfo()
> R Under development (unstable) (2014-06-02 r65832)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.2.0 tools_3.2.0
>
>
> I assume ICU was not found/installed when R was built, since executing the first
> couple of lines of the examples section of ?icuSetCollate leads to:
>
> Warning message:
> In icuSetCollate(case_first = "upper") : ICU is not supported on this build
> [1] "aarhus" "Aarhus" "safe"   "test"   "Zoo"
>
>
> Since only the (default) locale "C" gives the order I expected, I am considering
> changing my ~/.Rprofile. But there was certainly a reason why I changed it to
> "en_US.UTF-8" at some point... I hope that does not break anything else. Is there
> any "recommendation" as to what to use in ~/.Rprofile (the default?)? And is the
> 'recommended approach' to have ICU installed and change the sorting order via
> icuSetCollate if necessary?

Yes.  (You can use the locale category LC_COLLATE or icuSetCollate, but 
the recommended way to do the first is via the environment variables, 
not in .Rprofile.)
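
For illustration only (this is not part of the original message): a minimal sketch of how the LC_COLLATE category changes the result of sort(), using the example vector from ?icuSetCollate quoted above. It switches the category from within the session just to demonstrate the effect. The recommended route described above is to set the environment variable before R starts, for example by launching R as `LC_COLLATE=C R` from a shell. The variable names `x`, `old` and `sorted_c` are made up for this sketch.

```r
# Example strings taken from the ?icuSetCollate output quoted above.
x <- c("aarhus", "Aarhus", "safe", "test", "Zoo")

# Remember the current collation setting so it can be restored afterwards.
old <- Sys.getlocale("LC_COLLATE")

# In the C locale, strings are compared by character code, so all
# uppercase ASCII letters sort before all lowercase ones.
Sys.setlocale("LC_COLLATE", "C")
sorted_c <- sort(x)
print(sorted_c)
# [1] "Aarhus" "Zoo"    "aarhus" "safe"   "test"

# Restore the previous setting; under e.g. en_US.UTF-8 the order is
# locale-dependent (the quoted message shows one such result).
Sys.setlocale("LC_COLLATE", old)
```

Changing the category inside a running session affects every later comparison and sort in that session, which is one reason to prefer fixing it once in the environment.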

>
> I would not have expected the locale to influence the sorting order, so
> that's quite good to know. In fact, the example came up after I tried to sort
> students' grades in a class with several students having the same last name
> (which I made unique by adding the first names with a '.' separator)... quite a
> 'delicate' issue...
>
> Cheers,
>
> Marius
>


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595