[Rd] [R] custom sort?

Duncan Murdoch murdoch at stats.uwo.ca
Fri May 29 19:02:48 CEST 2009


On 5/29/2009 9:28 AM, Duncan Murdoch wrote:
> I've moved this to R-devel...
> 
> On 5/28/2009 8:17 PM, Stavros Macrakis wrote:
>> I couldn't get your suggested method to work:
>> 
>>   `==.foo` <- function(a, b) unclass(a) == unclass(b)
>>   `>.foo` <- function(a, b) unclass(a) < unclass(b)     # invert comparison
>>   is.na.foo <- function(a) is.na(unclass(a))
>> 
>>   sort(structure(sample(5),class="foo"))  #-> 1:5  -- not reversed
>> 
>> What am I missing?
> 
> There are two problems.  First, I didn't mention that you need a method 
> for indexing as well.  The code needs to evaluate things like x[i] > 
> x[j], and by default x[i] will not be of class "foo", so the custom 
> comparison methods won't be called.
> 
> Second, I think there's a bug in the internal code, specifically in 
> do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x.
> do_rank pays attention when breaking ties, so I think this is an 
> oversight.
> 
> So I'd say two things should be done:
> 
>   1.  the bug should be fixed.  Even if this isn't the most obvious 
> approach, it should work.

I've now fixed the bug, and clarified the documentation to say

   The default method will make use of == and > methods
   for the class of x[i] (for integers i), and the
   is.na method for the class of x, but might be rather
   slow when doing so.

You don't actually need a custom indexing method; you just need to be 
aware that it's the class of x[i] that is important for comparisons.
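
As a rough illustration of the two routes mentioned in this thread, here 
is a minimal sketch, not a definitive recipe.  Assumptions: the data are 
character rather than integer (numeric data may take a faster path that 
never calls the comparison methods), and the `[.foo` method is just one 
way of making x[i] keep class "foo"; nothing above requires it.  The 
class names "foo" and "myvector" are the ones used earlier in the thread.

```r
## Route 1: comparison methods.  They are called on x[i] and x[j],
## so x[i] has to carry the class for which the methods are defined.
`[.foo`   <- function(x, i) structure(unclass(x)[i], class = oldClass(x))
`==.foo`  <- function(e1, e2) unclass(e1) == unclass(e2)
`>.foo`   <- function(e1, e2) unclass(e1) <  unclass(e2)  # inverted on purpose
is.na.foo <- function(x) is.na(unclass(x))

x <- structure(c("b", "a", "c"), class = "foo")
sorted_foo <- sort(x)      # expected to come back in reverse order
unclass(sorted_foo)

## Route 2: a conversion to numeric via an xtfrm method.
## order() sorts on the returned keys, so negating them reverses the order.
xtfrm.myvector <- function(x) -unclass(x)

y <- structure(c(3L, 1L, 5L, 2L, 4L), class = "myvector")
sorted_mv <- sort(y)       # no `[` method here, so the class is dropped
as.integer(unclass(sorted_mv))
```

Route 2 computes all the keys in one call, so it avoids calling back 
into R for every pairwise comparison, which is the slow part of Route 1.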

This will make it into R-patched and R-devel.

Duncan Murdoch

> 
>   2.  we should look for ways to make all of this simpler, e.g. allowing 
> a comparison function to be used.
> 
> I'll take on 1, but not 2.  It's hard to work out the right place for 
> the comparison function to appear, and it would require a lot of work to 
> implement, because all of this stuff (sort, rank, order, xtfrm, 
> sort.int, etc.) is closely interrelated, some but not all of the 
> functions are S3 generics, some implemented internally, etc.  In the 
> end, I'd guess the results won't be very satisfactory from a performance 
> point of view:  all those calls out to R to do the comparisons are going 
> to be really slow.
> 
> I think your advice to use order() with multiple keys is likely to be 
> much faster in most instances.  It's just a better approach in R.
> 
> Duncan Murdoch
> 
>> 
>>            -s
>> 
>> On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> 
>>> On 28/05/2009 5:34 PM, Steve Jaffe wrote:
>>>
>>>> Sounds simple but haven't been able to find it in docs: is it possible to
>>>> sort a vector using a user-defined comparison function? Seems it must be,
>>>> but "sort" doesn't seem to provide that option, nor does "order" so far
>>>> as I can see.
>>>>
>>>
>>> You put a class on the vector (e.g. using class(x) <- "myvector"), then
>>> define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison
>>> methods (you'll need ==.myvector, >.myvector, and is.na.myvector).
>>>
>>> Duncan Murdoch
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>> 
> 
>


