[R] difference in sort order linux/Windows (R.2.11.0)
Steven Lembark
lembark at wrkhors.com
Sun May 30 17:20:26 CEST 2010
On Fri, 28 May 2010 01:17:49 -0700 (PDT)
carslaw <david.carslaw at kcl.ac.uk> wrote:
> [4] "HGV-D-Euro-III" "HGV-D-Euro-IV EGR" "HGV-D-Euro-IV SCR"
> [4] "HGV-D-Euro-III" "HGV-D-Euro-IV EGR" "HGV-D-Euro-IV SCR"
> [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR" "HGV-D-Euro-VI"
> [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR" "HGV-D-Euro-V SCR"
This is a lexical sort, and its order depends on the
locale: items may not sort in ASCII order. For example,
a European Latin locale may collate some characters
differently than ASCII does. You have to check
what is actually being sorted (e.g., look at the
strings as raw UTF-8 bytes).
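
A minimal sketch of the locale effect, using the shell's
sort rather than R. The strings come from the quoted
output; the en_US.UTF-8 locale name is an assumption and
may not be installed on your system, in which case the
second sort will probably just repeat the byte order.

```shell
# Sample strings taken from the quoted output above.
items='HGV-D-Euro-VI
HGV-D-Euro-V SCR
HGV-D-Euro-V EGR'

# Byte-order (ASCII) sort: the space (0x20) sorts before "I" (0x49),
# so both "V ..." entries come ahead of "VI".
c_sorted=$(printf '%s\n' "$items" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Locale-aware sort. en_US.UTF-8 is an assumed locale name; many such
# locales give spaces a low weight, so "VI" may land between the others.
printf '%s\n' "$items" | LC_ALL=en_US.UTF-8 sort
```

If the two listings differ, the data is fine and the
collation rules are what changed between the machines.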
You might also find that input generated on Windows
has non-breaking "smart" spaces in it from the
generating program (e.g., Excel), which are something
like \xA0 instead of the \x20 (decimal 32) used for
ASCII spaces.
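
One way to look for such bytes, as a rough sketch: the
sample string below is made up for illustration and uses
a single Latin-1 no-break space (0xA0) where an ASCII
space would normally be.

```shell
# Hypothetical sample: "IV" and "EGR" joined by a no-break space
# (octal 240 = 0xA0) instead of an ASCII space (0x20).
sample=$(printf 'HGV-D-Euro-IV\240EGR')

# Dump the bytes as hex, put one byte per line, and count the 0xA0 bytes.
nbsp_count=$(printf '%s' "$sample" | od -An -v -tx1 | tr -s ' ' '\n' | grep -c '^a0$')
echo "no-break space bytes: $nbsp_count"
```

In UTF-8 data the same character is encoded as the two
bytes C2 A0, so the A0 byte still shows up in the dump.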
Suggestion: Validate the data with something like
"od -cx" on Linux so you know what you are sorting.
Then dump it out as hex in R [sorry, I have no idea
how to do that] and see if what you are sorting
matches. After that, validate the locale setting
on both sides. If all of those turn up the same
raw data and locale, then you have found a bug in R,
or at least need to read some fine print in the
lexical sort docs.
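
The checks above might look roughly like this on the
Linux side. The temporary file and its contents are
placeholders, not the poster's data; substitute the
file that R actually reads.

```shell
# Hypothetical input file standing in for the real data.
datafile=$(mktemp)
printf 'HGV-D-Euro-IV EGR\nHGV-D-Euro-IV SCR\n' > "$datafile"

# Step 1: dump characters and hex so unexpected bytes stand out.
od -cx "$datafile"

# A checksum gives a quick way to compare the file on both machines.
checksum=$(cksum < "$datafile")
echo "cksum: $checksum"

# Step 2: show the locale variables that control collation here
# (the "locale" command, where available, reports the effective values).
echo "LC_ALL=${LC_ALL-unset} LC_COLLATE=${LC_COLLATE-unset} LANG=${LANG-unset}"

rm -f "$datafile"
```

For the R side, charToRaw() on a single string shows its
bytes in hex and Sys.getlocale("LC_COLLATE") reports the
collation locale, so the same two checks can be repeated
there and compared.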
--
Steven Lembark 85-09 90th St.
Workhorse Computing Woodhaven, NY, 11421
lembark at wrkhors.com +1 888 359 3508