[R] difference in sort order linux/Windows (R.2.11.0)
(Ted Harding)
Ted.Harding at manchester.ac.uk
Fri May 28 12:05:46 CEST 2010
In my response cited below:
On 28-May-10 09:55:36, Ted Harding wrote:
> I suspect the result (in Linux, I can't test this on Windows)
> may be related to the following phenomenon:
>
> sort(c("AB CD","ABCD"))
> # [1] "ABCD" "AB CD"
> sort(c("AB CD","ABCD "))
> # [1] "AB CD" "ABCD "
>
> I.e. "ABCD" precedes "AB CD" apparently because it is shorter,
> despite the fact that it would come later in an alphabetical sort.
> If I use the Linux 'sort' command (on the same machine) I get:
>
> sort << EOT
> "AB CD"
> "ABCD"
> EOT
> "AB CD"
> "ABCD"
>
> sort << EOT
> "AB CD"
> "ABCD "
> EOT
> "AB CD"
> "ABCD "
>
> I.e. the same result for either case. In my view the R result is
> anomalous! In ?Comparison it is stated that characters are translated
> to UTF8 before comparison is done; so a possible explanation could
> be that the UTF8 encoding for SPACE (for all I know) may be greater
> than that for the letters of the alphabet (as opposed to ASCII, where
> -- I do know -- it is less). And, if that is the case, why doesn't it
> apply also in Windows? This strikes me as a nasty little trap!
>
> Ted.
Please ignore the stuff about UTF8 -- the reasoning is false!
(If SPACE sorted after the letters, then "ABCD" and "ABCD " would both
always precede "AB CD", which is not what the second R result shows.)
I.e. read it as:
I.e. the same result for either case. In my view the R result is
anomalous! This strikes me as a nasty little trap!
Ted.
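
A minimal sketch for checking whether locale collation plays a part here. That it does is an assumption, not something established above. What is documented in ?Comparison is that R compares strings using the collating sequence of the locale in use. The snippet sorts the two strings under the plain C locale, where comparison is byte-by-byte, and then prints the collation-related settings of the current session. The variable name c_sorted is only illustrative.

```shell
# Sort the two strings under the C (POSIX) locale, where comparison is
# byte-by-byte and SPACE (0x20) sorts before every letter.
c_sorted=$(printf '%s\n' 'ABCD' 'AB CD' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Show the collation-related environment of the current session, for
# comparison with what Sys.getlocale("LC_COLLATE") reports inside R.
echo "LC_ALL=${LC_ALL:-unset} LC_COLLATE=${LC_COLLATE:-unset} LANG=${LANG:-unset}"
```

If the order printed here differs from what the session's default 'sort' or R's sort() gives for the same strings, the locale's collation rules are a likely candidate for the difference.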
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10 Time: 11:05:44
------------------------------ XFMail ------------------------------