[R] What "method" does sort() use?

peter dalgaard pdalgd at gmail.com
Fri Mar 18 10:19:40 CET 2016


Ooops, that was answering the question you actually asked. The one you meant to ask is answered by this  part:

The sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison. 

...and collating sequences is a weird and woolly subject, where you cannot even be sure that locales of the same name on two different platforms sort strings in the same order.

-pd



On 18 Mar 2016, at 10:13 , peter dalgaard <pdalgd at gmail.com> wrote:

> 
> On 18 Mar 2016, at 10:02 , Patrick Connolly <p_connolly at slingshot.co.nz> wrote:
> 
>> I don't follow why this happens:
>> 
>>> sort(c(LETTERS[1:5], letters[1:5]))
>> [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E"
>> 
>> The help for sort() says:
>> 
>> method: character string specifying the algorithm used.  Not
>>         available for partial sorting.  Can be abbreviated.
>> 
>> But what are the methods available?  The help mentions xtfrm but that
>> doesn't illuminate, I'd have thought that at least by default it would
>> have something to do with ASCII codes.  But that's not the case since
>> all the uppercase ones would be before the lowercase ones.
>> 
>> I know something different is happening but I don't know what it is
>> (do you, Mr Jones?).  Apologies to Bob Dylan.
>> 
> 
> 
> Um, read _all_ of the help file?
> 
> sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE,
>         method = c("shell", "quick"), index.return = FALSE)
> 
> [snip]
> 
> Method "shell" uses Shellsort (an O(n^{4/3}) variant from Sedgewick (1986)). If x has names a stable modification is used, so ties are not reordered. (This only matters if names are present.)
> 
> Method "quick" uses Singleton (1969)'s implementation of Hoare's Quicksort method and is only available when x is numeric (double or integer) and partial is NULL. (For other types of x Shellsort is used, silently.) It is normally somewhat faster than Shellsort (perhaps 50% faster on vectors of length a million and twice as fast at a billion) but has poor performance in the rare worst case. (Peto's modification using a pseudo-random midpoint is used to make the worst case rarer.) This is not a stable sort, and ties may be reordered.
> 
> Factors with less than 100,000 levels are sorted by radix sorting when method is not supplied: see sort.list.
> 
> -pd
> 
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list