[R] difference in sort order Linux/Windows (R 2.11.0)

(Ted Harding) Ted.Harding at manchester.ac.uk
Fri May 28 12:05:46 CEST 2010


In my response cited below:

On 28-May-10 09:55:36, Ted Harding wrote:
> I suspect the result (in Linux, I can't test this on Windows)
> may be related to the following phenomenon:
> 
>   sort(c("AB CD","ABCD"))
>   # [1] "ABCD"  "AB CD"
>   sort(c("AB CD","ABCD "))
>   # [1] "AB CD" "ABCD "
> 
> I.e. "ABCD" precedes "AB CD" apparently because it is shorter,
> despite the fact that it would come later in an alphabetical sort.
> If I use the Linux 'sort' command (on the same machine) I get:
> 
> sort << EOT
> "AB CD"
> "ABCD"
> EOT
> "AB CD"
> "ABCD"
> 
> sort << EOT
> "AB CD"
> "ABCD "
> EOT
> "AB CD"
> "ABCD "
> 
> I.e. the same result for either case. In my view the R result is
> anomalous! In ?Comparison it is stated that characters are translated
> to UTF8 before comparison is done; so a possible explanation could
> be that the UTF8 encoding for SPACE (for all I know) may be greater
> than that for the letters of the alphabet (as opposed to ASCII, where
> -- I do know -- it is less). And, if that is the case, why doesn't it
> apply also in Windows? This strikes me as a nasty little trap!
> 
> Ted.

Please ignore the stuff about UTF8 -- the reasoning is false!
If SPACE sorted after the letters, then both "ABCD" and "ABCD "
would always precede "AB CD", which is not what R gives in the
second case above. I.e. read it as:

  I.e. the same result for either case. In my view the R result is
  anomalous! This strikes me as a nasty little trap!
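
One way to probe this further, offered only as a sketch: the collating
locale is a plausible suspect, but that is an assumption on my part and
is not established by anything above. The snippet forces plain byte-order
collation with LC_ALL=C, where SPACE (0x20) sorts before every letter,
and then repeats the sort under whatever locale the session happens to
use. The variable names (c_order, c_order_trailing, cur_order) are purely
illustrative.

```shell
# Byte-order (C locale) collation: SPACE (0x20) compares lower than any letter,
# so "AB CD" comes first whether or not "ABCD" has a trailing space.
c_order=$(printf '%s\n' 'ABCD' 'AB CD' | LC_ALL=C sort)
c_order_trailing=$(printf '%s\n' 'ABCD ' 'AB CD' | LC_ALL=C sort)

printf 'C locale, no trailing space:\n%s\n' "$c_order"
printf 'C locale, trailing space:\n%s\n' "$c_order_trailing"

# Same input under the session's current locale. The order here depends on
# the locale's collation rules, so no particular result is asserted.
cur_order=$(printf '%s\n' 'ABCD' 'AB CD' | sort)
printf 'Current locale (%s):\n%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-unset}}}" "$cur_order"
```

If the last listing differs from the first, the locale is at least
involved in what 'sort' does on that machine; whether R behaves the same
way for the same reason would still need checking from within R.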

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 11:05:44
------------------------------ XFMail ------------------------------


