[R] difference in sort order linux/Windows (R.2.11.0)
(Ted Harding)
Ted.Harding at manchester.ac.uk
Fri May 28 13:49:22 CEST 2010
It would seem that there is indeed a locale effect. Revisiting the
examples I used on Linux in a previous post, at which time I was
using the default "LC_COLLATE=en_GB.UTF-8", I changed this to "C".
The results under both "C" and "en_GB.UTF-8" are shown below (the
latter copied from my previous post):
Sys.setlocale("LC_COLLATE", "C")
# [1] "C"
sort(c("AB CD","ABCD"))
# [1] "AB CD" "ABCD" ## (C)
# [1] "ABCD" "AB CD" ## (en_GB.UTF-8)
sort(c("AB CD","ABCD "))
# [1] "AB CD" "ABCD " ## (C)
# [1] "AB CD" "ABCD " ## (en_GB.UTF-8)
So the "C" ordering comes out as one would expect in either case,
while the "en_GB.UTF-8" ordering does not in the first case (where
the two strings are of different lengths).
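
As a rough illustration of why the "C" result looks natural: in the "C"
locale the comparison goes by character code, and the space (32) sorts
before "C" (67) at the third position. The sketch below shows the code
points and wraps the locale switch in a small helper. Note that sort_c
is a hypothetical name of my own for illustration, not an existing R
function, and this is only one possible way to pin the collation so that
both systems give the same order.

```r
# sort_c is a hypothetical helper (an assumption for illustration):
# sort under the "C" collation, then restore the previous collation.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  # Put the original collation back even if sort() fails.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

# Code points of the two strings from the first example.
utf8ToInt("AB CD")   # 65 66 32 67 68
utf8ToInt("ABCD")    # 65 66 67 68

# Under "C" the strings first differ at position 3: 32 (space) < 67 ("C").
sort_c(c("ABCD", "AB CD"))
# [1] "AB CD" "ABCD"
```

Calling sort_c() on both the Linux and the Windows machine should then
give the same order, at the cost of ignoring any language-specific
collation rules.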
Is there any way to extract the numerical encoding that the collating
locale assigns to a character string, i.e. the values that sort()
actually compares?
Ted.
On 28-May-10 11:07:57, Joris Meys wrote:
> Pretty obvious: You use different locales (collate). What happens if
> you use the same on both machines?
>
> Cheers
> Joris
>
> On Fri, May 28, 2010 at 10:17 AM, carslaw <david.carslaw at kcl.ac.uk> wrote:
>> Dear R users,
>>
>> I'm a bit perplexed with the effect sort has here, as it is different on
>> ...
>> the linux order is perhaps more intuitive. However, the problem is the
>> order is inconsistent between the two systems. Any suggestions?
>>
>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-pc-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
>> [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
>> [5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
>> [7] LC_PAPER=en_GB.utf8 LC_NAME=en_GB.utf8
>> [9] LC_ADDRESS=en_GB.utf8 LC_TELEPHONE=en_GB.utf8
>> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=en_GB.utf8
>> ...
>> > sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252
>> [2] LC_CTYPE=English_United Kingdom.1252
>> [3] LC_MONETARY=English_United Kingdom.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United Kingdom.1252
>> ...
>> Dr David Carslaw
> --
> Joris Meys
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10 Time: 12:49:19
------------------------------ XFMail ------------------------------