[R] difference in sort order linux/Windows (R.2.11.0)
Duncan Murdoch
murdoch.duncan at gmail.com
Fri May 28 12:47:44 CEST 2010
carslaw wrote:
> Dear R users,
>
> I'm a bit perplexed with the effect sort has here, as it is different on
> Windows vs. linux.
> It makes my factor levels and subsequent plots different on the two systems.
>
You are using different collation orders. On Linux, your sessionInfo shows
LC_COLLATE=en_GB.utf8
while Windows shows
LC_COLLATE=English_United Kingdom.1252
so you should be prepared for differences. That said, it certainly
looks as though the string comparison is wrong on Linux. Using Ted
Harding's examples, I get these results:
> "AB CD" > "ABCD"
[1] FALSE
> "AB CD" > "ABCD "
[1] FALSE
on Windows in the English_Canada.1252 locale and on Linux in the C
locale. However, when I use the locale that's default on our system,
en_US.UTF-8, I get
> "AB CD" > "ABCD"
[1] TRUE
> "AB CD" > "ABCD "
[1] FALSE
as Ted did, and that certainly looks wrong.
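
If the aim is simply to get the same order on both systems, one option is to
sort under the C collation order, which compares strings byte by byte and so
does not depend on the platform's locale tables. The following is only a
minimal sketch: sort_c is a made-up helper name, not a base R function, and
the three strings are taken from the types vector in the original post.

```r
# Hypothetical helper (not part of base R): sort a character vector under
# the "C" collation order, then restore the previous LC_COLLATE setting.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old))  # put the old collation back
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

x <- c("HGV-D-Euro-VI", "HGV-D-Euro-V SCR", "HGV-D-Euro-V EGR")
sort_c(x)
# [1] "HGV-D-Euro-V EGR" "HGV-D-Euro-V SCR" "HGV-D-Euro-VI"

# Giving factor() its levels explicitly means the level order no longer
# depends on whichever locale the session happens to be running in.
f <- factor(x, levels = sort_c(x))
levels(f)
```

In the C locale a space sorts before any letter, so "V SCR" comes before "VI",
which is the order reported on Windows above. Later versions of R (3.3.0
onwards, as far as I know) also accept sort(x, method = "radix"), which is
documented as sorting character vectors in the C locale.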
Duncan Murdoch
> Given:
>
> types <- c("PC-D-Euro-0", "PC-D-Euro-1", "PC-D-Euro-2", "PC-D-Euro-3",
> "PC-D-Euro-4", "PC-D-Euro-5", "PC-D-Euro-6", "LCV-D-Euro-0",
> "LCV-D-Euro-1", "LCV-D-Euro-2", "LCV-D-Euro-3", "LCV-D-Euro-4",
> "LCV-D-Euro-5", "LCV-D-Euro-6", "HGV-D-Euro-0", "HGV-D-Euro-I",
> "HGV-D-Euro-II", "HGV-D-Euro-III", "HGV-D-Euro-IV EGR", "HGV-D-Euro-IV SCR",
> "HGV-D-Euro-IV SCRb", "HGV-D-Euro-V EGR", "HGV-D-Euro-V SCR",
> "HGV-D-Euro-V SCRb", "HGV-D-Euro-VI", "HGV-D-Euro-VIb")
>
> On linux, sort does:
>
> sort(types)
> [1] "HGV-D-Euro-0" "HGV-D-Euro-I" "HGV-D-Euro-II"
> [4] "HGV-D-Euro-III" "HGV-D-Euro-IV EGR" "HGV-D-Euro-IV SCR"
> [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR" "HGV-D-Euro-VI"
> [10] "HGV-D-Euro-VIb" "HGV-D-Euro-V SCR" "HGV-D-Euro-V SCRb"
> [13] "LCV-D-Euro-0" "LCV-D-Euro-1" "LCV-D-Euro-2"
> [16] "LCV-D-Euro-3" "LCV-D-Euro-4" "LCV-D-Euro-5"
> [19] "LCV-D-Euro-6" "PC-D-Euro-0" "PC-D-Euro-1"
> [22] "PC-D-Euro-2" "PC-D-Euro-3" "PC-D-Euro-4"
> [25] "PC-D-Euro-5" "PC-D-Euro-6"
>
>
> And on Windows:
>
> sort(types)
>
> [1] "HGV-D-Euro-0" "HGV-D-Euro-I" "HGV-D-Euro-II"
> [4] "HGV-D-Euro-III" "HGV-D-Euro-IV EGR" "HGV-D-Euro-IV SCR"
> [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR" "HGV-D-Euro-V SCR"
> [10] "HGV-D-Euro-V SCRb" "HGV-D-Euro-VI" "HGV-D-Euro-VIb"
> [13] "LCV-D-Euro-0" "LCV-D-Euro-1" "LCV-D-Euro-2"
> [16] "LCV-D-Euro-3" "LCV-D-Euro-4" "LCV-D-Euro-5"
> [19] "LCV-D-Euro-6" "PC-D-Euro-0" "PC-D-Euro-1"
> [22] "PC-D-Euro-2" "PC-D-Euro-3" "PC-D-Euro-4"
> [25] "PC-D-Euro-5" "PC-D-Euro-6"
>
> Session info for both systems is below. The order I actually want is the
> Windows one, but looking at it,
> the linux order is perhaps more intuitive. However, the problem is the
> order is inconsistent between
> the two systems. Any suggestions?
>
> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
> [5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
> [7] LC_PAPER=en_GB.utf8 LC_NAME=en_GB.utf8
> [9] LC_ADDRESS=en_GB.utf8 LC_TELEPHONE=en_GB.utf8
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=en_GB.utf8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rkward_0.5.3
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
>
> sessionInfo()
>
> R version 2.11.0 (2010-04-22)
> x86_64-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
>
> attached base packages:
>
> [1] stats graphics grDevices utils datasets methods base
>
> Dr David Carslaw
> King's College London
> Environmental Research Group
> Franklin Wilkins Building
> 150 Stamford Street
> London
> SE1 9NH
>