[R] Strange result from sort: sort(c("aa", "ff")) gives "ff" "aa" with R 2.12.1 on Windows 7
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Jan 24 23:18:20 CET 2011
On Mon, 24 Jan 2011, Søren Højsgaard wrote:
> Dear list,
>
> Please consider the following call of sort
>
>> sort(c("a","f"))
> [1] "a" "f"
>> sort(c("f","a"))
> [1] "a" "f"
>>
>> sort(c("aa","ff"))
> [1] "ff" "aa"
>> sort(c("ff","aa"))
> [1] "ff" "aa"
> The last two results look strange to me. Is that a bug?
It seems that you and your OS disagree about Danish, and I'm in no
position to know which is correct. But this is not an R issue: the
sorting is done by OS services.
> The result seems to come from calls to order:
>
>> order(c("a","f"))
> [1] 1 2
>> order(c("f","a"))
> [1] 2 1
>>
>> order(c("aa","ff"))
> [1] 2 1
>> order(c("ff","aa"))
> [1] 1 2
> I get the same results on R 2.12.1, R 2.11.1 and R 2.13.0 on Windows
> 7. However, on Linux I get the "right answer" (the answer I
> expected). From the help pages I get the impression that there might
> be an issue with the locale, but I didn't understand the details.
>
> Can anyone tell me what is going on here, please?
I recall that 'aa' used to sort at the end of the alphabet in Danish
telephone books (it is the older spelling of 'å', the last letter of
the Danish alphabet), so it seems the collation used on Windows thinks
so too. See ?Comparison for further details. What I don't understand is
why someone resident in Denmark finds this strange ...
I get exactly the same in a Danish locale on Mac OS X, for example:
> sort(c("aa","ff"))
[1] "ff" "aa"
and also on my Linux box (Fedora 14 with LC_COLLATE=da_DK.utf8)
> sort(c("aa","ff"))
[1] "ff" "aa"
en_DK is not a Danish locale (it is English in Denmark), and it is the
LC_COLLATE setting in your Linux session. If you want an English sort,
try an English locale for LC_COLLATE (there may well be several, hence
'an').
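For illustration, here is a minimal sketch of temporarily switching the collation locale around a single sort, using only base R. It assumes the portable "C" locale, which compares strings by byte value. That locale is a stand-in, not an English locale. Names such as "English_United Kingdom.1252" (Windows) or "en_GB.UTF-8" (Linux) are assumptions that depend on which locales are installed on the machine.

```r
# Remember the collation locale currently in use (e.g. "Danish_Denmark.1252")
old <- Sys.getlocale("LC_COLLATE")

# Switch collation to the "C" locale, which orders strings by byte value.
# NB: "C" is used here as a portable stand-in; it is not an English locale.
invisible(Sys.setlocale("LC_COLLATE", "C"))
res <- sort(c("ff", "aa"))
print(res)  # "aa" "ff" in the C locale

# Restore the original collation setting
invisible(Sys.setlocale("LC_COLLATE", old))
```

Saving and restoring the old value keeps the change local to this comparison rather than affecting the rest of the session.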
>
> Regards
> Søren
>
>> sessionInfo()
> R version 2.12.1 Patched (2010-12-27 r53883)
> Platform: i386-pc-mingw32/i386 (32-bit)
> locale:
> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
> [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
> [5] LC_TIME=Danish_Denmark.1252
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] SHDtools_1.0
>
>> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: i686-pc-linux-gnu (32-bit)
> locale:
> [1] LC_CTYPE=en_DK.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_DK.utf8 LC_COLLATE=en_DK.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_DK.utf8
> [7] LC_PAPER=en_DK.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_DK.utf8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595