[Rd] sort() generic? [Re: Suspicious behaviour of sort on POSIXct ..]

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Jun 7 18:17:31 CEST 2006


On Wed, 7 Jun 2006, Martin Maechler wrote:

> {Diverted to R-devel}
>
>>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
>>>>>>     on Tue, 6 Jun 2006 21:14:01 +0100 (BST) writes:
>
>    BDR> On Tue, 6 Jun 2006, patrick.guevel at uk.bnpparibas.com
>    BDR> wrote:
>    >> Hi ,
>    >>
>    >> When I sort a vector of POSIXct values in R-2.3.0 and
>    >> R-2.3.1, I get a vector of numeric values and this gets
>    >> some of my code to crash (class object creation). Is that
>    >> an R bug?
>
>    BDR> No, it is as documented: see ?sort
>
>    BDR>       As from R 2.3.0, all attributes are removed from
>    BDR> the return value except names, which are sorted.  (If
>    BDR> 'partial' is specified even the names are removed.)
>
>    BDR> Note, the class is an attribute.  For many classes
>    BDR> sorting destroys the appropriateness of the class.

(An example BTW is a time series: the times associated with values would 
be shuffled on sorting.)
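
For a class where sorting does make sense, one way to keep the class regardless of what sort() does with attributes is to index with order(). The following is only a small sketch with made-up values; it relies on the class having its own `[` method, as POSIXct does.

```r
# Three POSIXct times, deliberately out of order (illustrative values).
x <- as.POSIXct(c("2006-06-07 18:17:31",
                  "2006-06-06 21:14:01",
                  "2006-06-07 09:00:00"), tz = "UTC")

# order() returns a permutation; subsetting with it goes through the
# POSIXct `[` method, so the class and time zone are retained.
sorted <- x[order(x)]

inherits(sorted, "POSIXct")            # the class survives
all(diff(as.numeric(sorted)) >= 0)     # and the values are in order
```

The same idiom would be wrong for a time series, for the reason given above: the values would be reordered while the associated times stay put.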

> Indeed, and I agree this is a good change.
> However, the above also suggests that ideally,  sort() would be
> a generic function.
>
> One good reason for  sort()  not being generic now is the fact
> that method dispatch costs a bit, *and* that we like sort() to
> be really fast.
> One way to achieve a generic sort() and keep the possibility of
> a very fast sort(), similarly to rep() and rep.int(),
> would be to rename the current sort into something like
> sortNum() {"Num" for numeric}, make sort() into a generic,
> and replace sort() by sortNum() in those code parts which need
> to remain optimally fast.

The current sort handles much more than numeric vectors.  I agree with 
the reasoning, but it seems wrong to ask all those using sort() in packages
to rename their usages (conditional on R >= 2.4.0) just to avoid being
penalized.
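
To make the quoted proposal concrete, it could look roughly like the sketch below: a non-generic fast function plus a generic front end with methods for classes where sorting is meaningful. The names sortNum() and sortGen() are hypothetical (sortGen() stands in for a generic sort()), and the POSIXct method is only an assumed example of what a class author might write.

```r
# Hypothetical names throughout: sortNum() and sortGen() are not part of R.

# Non-generic fast path: no dispatch, bare values in, bare values out.
sortNum <- function(x, decreasing = FALSE) {
  sort.int(as.vector(unclass(x)), decreasing = decreasing)
}

# Generic front end: pays for dispatch, but lets a class keep its identity.
sortGen <- function(x, decreasing = FALSE, ...) UseMethod("sortGen")

# Default method: fall back to the fast path.
sortGen.default <- function(x, decreasing = FALSE, ...) {
  sortNum(x, decreasing = decreasing)
}

# Assumed example method: reorder via the class's own `[` method.
sortGen.POSIXct <- function(x, decreasing = FALSE, ...) {
  x[order(x, decreasing = decreasing)]
}

times <- as.POSIXct(c("2006-06-07 18:17:31", "2006-06-06 21:14:01"),
                    tz = "UTC")
class(sortGen(times))   # still a POSIXct vector
sortNum(c(3, 1, 2))     # plain numeric result, no dispatch involved
```

The cost described above is visible here: package code that wants the undispatched speed would have to call the renamed function explicitly.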

sort() is an interface to three C functions.  It ought to be possible to 
use internal dispatch for the internal sort (and not for psort or qsort), 
for which I believe the overhead would be acceptable.  I think that 
restriction is acceptable: people who set partial or method must have read 
the help page.
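
The idea can be imitated at R level as below. This is only a sketch: sortSketch() is a hypothetical stand-in for sort(), the is.object() test merely mimics what internal dispatch in the C code would do, and the use of order() for classed objects is an assumption made for illustration.

```r
# sortSketch() is hypothetical; it only imitates the proposed behaviour.
sortSketch <- function(x, partial = NULL, decreasing = FALSE, method = NULL) {
  plain_call <- is.null(partial) && is.null(method)

  if (plain_call && is.object(x)) {
    # Classed object, default arguments: let the class's own methods
    # decide (here via order() and `[`), so the class survives.
    return(x[order(x, decreasing = decreasing)])
  }

  # Everything else takes the undispatched path on the bare values.
  y <- as.vector(unclass(x))
  if (!is.null(partial)) return(sort.int(y, partial = partial))
  if (!is.null(method))
    return(sort.int(y, decreasing = decreasing, method = method))
  sort.int(y, decreasing = decreasing)
}

x <- as.POSIXct(c("2006-06-07 18:17:31",
                  "2006-06-06 21:14:01",
                  "2006-06-07 09:00:00"), tz = "UTC")
class(sortSketch(x))                 # class kept on the plain call
class(sortSketch(x, partial = 2))    # bare numeric when partial is set
```

Existing callers of the plain form would not need to change anything, which is the advantage over renaming.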

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
