[R] sorting character vectors
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Aug 19 14:10:32 CEST 2004
It is documented to depend on your locale. I get
> sort(x)
[1] " A" " B" " C" "A" "B" "C"
in the C locale. The help page does say so:
     The sort order for character vectors will depend on the collating
     sequence of the locale in use: see 'Comparison'.
The default collation sequences for standard locales in Linux distros are
quite unintuitive (and are not character-by-character either). If you
want ASCII ordering, ask for it by setting LC_COLLATE=C.
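
As a minimal sketch of asking for ASCII collation from within a running R
session (rather than through the environment before starting R), something
like the following should work. It assumes the platform accepts "C" as a
collation locale; the variable names old and sorted_c are only illustrative.

```r
# Sketch: sort the same vector under byte-wise (ASCII) collation.
x <- c(LETTERS[1:3], paste(" ", LETTERS[1:3], sep = ""))

old <- Sys.getlocale("LC_COLLATE")   # remember the current collation setting
Sys.setlocale("LC_COLLATE", "C")     # switch collation to the C locale
sorted_c <- sort(x)                  # space (ASCII 32) now sorts before letters
print(sorted_c)
Sys.setlocale("LC_COLLATE", old)     # restore the previous setting
```

Only the collation category is changed here, so other locale-dependent
behaviour (character classification, date formats) is left alone.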
On Thu, 19 Aug 2004 andreas.krause at pharma.novartis.com wrote:
> The following is not what I expected in sorting characters (single letters
> and the same letters with preceding spaces).
> Can someone enlighten me as to why the following might be a correct result
> for sorting?
>
> ; x <- c(LETTERS[1:3], paste(" ", LETTERS[1:3], sep=""))
> ; x
> [1] "A" "B" "C" " A" " B" " C"
> ; sort(x)
> [1] "A" " A" "B" " B" "C" " C"
> ; sort(x, method="shell")
> [1] "A" " A" "B" " B" "C" " C"
> ; sort(x, method="quick")
> [1] "A" " A" "B" " B" "C" " C"
>
> I would expect the result to be " A" " B" " C" "A" "B" "C" instead,
> going by ASCII codes (and a quick check with S-Plus 6.2 shows that this is
> what S-Plus thinks the sorted sequence is).
S-Plus explicitly says it uses ASCII. I believe that is a deficiency they
plan to correct.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595