[R] Sorting and subsetting

David Winsemius dwinsemius at comcast.net
Mon Sep 20 20:15:21 CEST 2010


On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:

>
> On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
>
>> On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
>> <spector at stat.berkeley.edu> wrote:
>>> Harold -
>>>  Two ways that come to mind:
>>>
>>> 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
>>> 2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
>> 3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))
>
> I found that rather interesting but somewhat puzzling. I had generally
> thought that passing "[" should "work", but by() complained:
> Error in FUN(X[[1L]], ...) : could not find function "FUN"
>
> So I tried back-quotes instead and got a sensible result.

The need for back-quoting disappears if we add a match.fun call to  
by.data.frame():

by.data.frame <-
function (data, INDICES, FUN, ..., simplify = TRUE)
{
     FUN <- match.fun(FUN)   # the added line: resolves "[" (a string) to a function
     if (!is.list(INDICES)) {
         IND <- vector("list", 1L)
         IND[[1L]] <- INDICES
         names(IND) <- deparse(substitute(INDICES))[1L]
     }
     else IND <- INDICES
     FUNx <- function(x) FUN(data[x, , drop = FALSE], ...)
     nd <- nrow(data)
     ans <- eval(substitute(tapply(1L:nd, IND, FUNx, simplify = simplify)),
         data)
     attr(ans, "call") <- match.call()
     class(ans) <- "by"
     ans
}
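
To make the mechanism concrete, here is a minimal sketch of why the quoted
form fails and what match.fun() changes. The helper names call_it and
call_it_mf are hypothetical, made up for illustration only; they are not
part of base R or of by().

```r
# Hypothetical helper: calls FUN directly, as by.data.frame currently does.
call_it <- function(FUN, ...) FUN(...)

# Hypothetical helper: resolves FUN with match.fun() first.
call_it_mf <- function(FUN, ...) {
  FUN <- match.fun(FUN)   # a string such as "[" becomes the function `[`
  FUN(...)
}

# With a character string, R looks for a *function* named FUN, finds none,
# and signals an error; we capture the message instead of stopping.
msg <- tryCatch(call_it("[", 1:3, 2),
                error = function(e) conditionMessage(e))

# With match.fun() the same call succeeds.
val <- call_it_mf("[", 1:3, 2)
```

Here msg holds the captured error message and val is the second element
of 1:3.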

I would have expected a match.fun() call in both by.data.frame and
by.default, but it seems to be "missing in action". Would there be any
downside to modifying those functions in this way?
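
For anyone who wants the same effect without touching by.data.frame, a
small wrapper should do. This is only a sketch: by_mf is a hypothetical
name, and set.seed(1) is my own addition so the example is reproducible
(the original post did not set a seed).

```r
# Hypothetical wrapper: resolve FUN first, then hand over to by().
by_mf <- function(data, INDICES, FUN, ...) {
  FUN <- match.fun(FUN)        # accepts "[" as well as `[`
  by(data, INDICES, FUN, ...)
}

set.seed(1)                    # assumption: added for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# First 5 rows (and both columns) per level of index, bound together.
res <- do.call(rbind, by_mf(tmp, tmp$index, "[", 1:5, 1:2))
```

res should then have five rows for each level of index, the same rows
Harold's subset/rbind code picks out.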

-- 
David.


>
> > do.call(rbind, by(tmp, tmp$index, FUN=`[`, 1:5, 1:2))
>     index        foo
> 1.6      1 -3.0267759
> 1.7      1 -1.3725536
> 1.19     1 -1.1476048
> 1.16     1 -1.0963967
> 1.2      1 -1.0684793
> 2.29     2 -1.6601486
> 2.21     2 -1.2633632
> 2.22     2 -0.9875626
> 2.38     2 -0.9515301
> 2.30     2 -0.8638903
>
> Unlike Dalgaard, who arrived at a similar result via a different
> route and called the row names "silly", I thought they were
> informative. But maybe the sobriquet was directed at his second
> solution. I couldn't tell.
>
> -- 
> David.
>
>>
>> Josh
>>
>>>
>>>                                       - Phil Spector
>>>                                        Statistical Computing Facility
>>>                                        Department of Statistics
>>>                                        UC Berkeley
>>>                                        spector at stat.berkeley.edu
>>>
>>>
>>>
>>> On Mon, 20 Sep 2010, Doran, Harold wrote:
>>>
>>>> Suppose I have a data frame, such as the one below:
>>>>
>>>> tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
>>>>
>>>> And further assume it is sorted by index and then by the variable foo.
>>>>
>>>> tmp <- tmp[order(tmp$index, tmp$foo) , ]
>>>>
>>>> Now, I want to grab the first N rows of tmp for each index. In the end,
>>>> what I want is the data frame 'result'
>>>>
>>>> tmp1 <- subset(tmp, index == 1)
>>>> tmp2 <- subset(tmp, index == 2)
>>>>
>>>> tmp1 <- tmp1[1:5,]
>>>> tmp2 <- tmp2[1:5,]
>>>> result <- rbind(tmp1, tmp2)
>>>>
>>>> Does anyone see a way to subset and subsequently bind without a loop?
>>>>
>>>> Harold
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>>
>>
>> -- 
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://www.joshuawiley.com/
>
> David Winsemius, MD
> West Hartford, CT

David Winsemius, MD
West Hartford, CT


