[R] Strange result from sort: sort(c("aa", "ff")) gives "ff" "aa" with R.2.12.1 on windows 7
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Feb 2 13:20:59 CET 2011
'Strange' to have no response on this. Can a knowledgeable Danish
writer please confirm that this is how the OSes are supposed to handle
Danish collation?
On Mon, 24 Jan 2011, Prof Brian Ripley wrote:
> On Mon, 24 Jan 2011, Søren Højsgaard wrote:
>
>> Dear list,
>>
>> Please consider the following call of sort
>>
>>> sort(c("a","f"))
>> [1] "a" "f"
>>> sort(c("f","a"))
>> [1] "a" "f"
>>>
>>> sort(c("aa","ff"))
>> [1] "ff" "aa"
>>> sort(c("ff","aa"))
>> [1] "ff" "aa"
>> The last two results look strange to me. Is that a bug???
>
> It seems that you and your OS disagree about Danish, and I'm in no position
> to know which is correct. But this is not an R issue: the sorting is done by
> OS services.
>
>> The result seems to come from calls to order:
>>
>>> order(c("a","f"))
>> [1] 1 2
>>> order(c("f","a"))
>> [1] 2 1
>>>
>>> order(c("aa","ff"))
>> [1] 2 1
>>> order(c("ff","aa"))
>> [1] 1 2
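
The link between the two functions can be checked directly. A minimal sketch, using only base R: whatever collation the session uses, indexing a vector by order() should reproduce sort(), so an unexpected order() result and an unexpected sort() result are the same phenomenon. The variable names are illustrative.

```r
# order() returns the permutation that sort() applies, so the two must
# agree in any locale (no NAs or ties are involved here).
x <- c("ff", "aa")
idx <- order(x)
sorted_via_order <- x[idx]
print(idx)
print(sorted_via_order)
stopifnot(identical(sorted_via_order, sort(x)))
```
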
>
>> I get the same results on R.2.12.1, R.2.11.1 and R.2.13.0 on Windows 7.
>> However on Linux, I get the "right answer" (the answer I expected). From
>> the help pages I get the impression that there might be an issue about
>> locale, but I didn't understand the details.
>>
>> Can anyone tell me what goes on here, please?
>
> I recall that 'aa' used to sort at the end of the alphabet in Danish
> telephone books, so it seems the sort used on Windows thinks so too. See
> ?Comparison for some further details. What I don't understand is that
> someone resident in Denmark finds this strange ....
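
To make the telephone-book rule concrete, here is a toy model. It is an illustration only, not what any OS collation routine actually does: it assumes lower-case input, treats the digraph "aa" as a single letter placed after "z", and then orders the keys bytewise in the C locale. The helper name toy_danish_order is hypothetical.

```r
# Toy illustration only: mimic "aa sorts after z" by replacing "aa" with
# "~" (ASCII 126, which follows "z" = 122) and ordering the resulting keys
# in the C locale, where comparison is by byte value.
toy_danish_order <- function(x) {
  key <- gsub("aa", "~", x, fixed = TRUE)
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)  # restore on exit
  Sys.setlocale("LC_COLLATE", "C")
  order(key)
}

x <- c("aa", "ff")
toy_result <- x[toy_danish_order(x)]
print(toy_result)                      # "ff" "aa", as in the report
print(toy_danish_order(c("a", "f")))   # 1 2: single letters are unaffected
```

Under this model single letters sort as usual, and only the doubled "aa" moves to the end, which matches the pattern in the output quoted above.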
>
> I get exactly the same in a Danish locale on Mac OS X, for example:
>
>> sort(c("aa","ff"))
> [1] "ff" "aa"
>
> and also on my Linux box (Fedora 14 with LC_COLLATE=da_DK.utf8)
>
>> sort(c("aa","ff"))
> [1] "ff" "aa"
>
> en_DK is not a Danish locale (it is English in Denmark). If you want an
> English sort, try an English locale for LC_COLLATE (there may well be
> several, hence 'an').
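
One way to try a different collation for a single sort is to switch LC_COLLATE temporarily and restore it afterwards. A minimal sketch, assuming the "C" locale is an acceptable stand-in (it is accepted on every platform, whereas the names of English locales differ between OSes, so none is hard-coded here). The helper name sort_in_locale is hypothetical.

```r
# Sort a character vector under a chosen LC_COLLATE setting, then restore
# the previous setting. 'sort_in_locale' is a hypothetical helper name.
sort_in_locale <- function(x, locale = "C") {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)  # restore on exit
  Sys.setlocale("LC_COLLATE", locale)
  sort(x)
}

res <- sort_in_locale(c("ff", "aa"))
print(res)   # "aa" "ff" in the C locale
```

Note that the C locale compares by byte value, so upper-case letters sort before lower-case ones; passing the name of an English locale known to the OS as the locale argument avoids that.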
>
>>
>> Regards
>> Søren
>>
>>> sessionInfo()
>> R version 2.12.1 Patched (2010-12-27 r53883)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>> locale:
>> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>> [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>> [5] LC_TIME=Danish_Denmark.1252
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] SHDtools_1.0
>>
>>
>>> sessionInfo()
>> R version 2.12.1 (2010-12-16)
>> Platform: i686-pc-linux-gnu (32-bit)
>> locale:
>> [1] LC_CTYPE=en_DK.utf8 LC_NUMERIC=C
>> [3] LC_TIME=en_DK.utf8 LC_COLLATE=en_DK.utf8
>> [5] LC_MONETARY=C LC_MESSAGES=en_DK.utf8
>> [7] LC_PAPER=en_DK.utf8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_DK.utf8 LC_IDENTIFICATION=C
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595