[R] Subset by Factor by date

T.D.Rudolph prairie.picker at gmail.com
Sat Jun 14 08:59:11 CEST 2008


I can't speak to the intricacies of the formula but when I run the
ByDataFrame() function provided on a subsample of my data (n=50) it returned
only the very first id value in the output; the rest came out as <NA>.... 
This is not to say it has not properly selected the rows with min(x$diff),
but I have no way of verifying without the id membership in the output.


Charilaos Skiadas-3 wrote:
> 
> 
> On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote:
> 
>>
>> aggregate() is indeed a useful function in this case, but it only  
>> returns the
>> columns by which it was grouped.  Is there a way I can use this while
>> simultaneously retaining all the other column values in the dataframe?
>>
>> e.g. add superfluous (yet pertinent for later) column containing any
>> information at all and retain it in the final output
> 
> I had exactly this kind of need many times, and I have finally  
> created a function for it, which I hope to include soon in an  
> upcoming package. Here is a run of it (I added an extra "A" column  
> containing just the numbers 1:8):
> 
>  > DF
>    id      day diff A
> 1  1 01-01-09  0.5 1
> 2  1 01-01-09  0.7 2
> 3  2 01-01-09  0.2 3
> 4  2 01-01-09  0.4 4
> 5  1 01-02-09  0.1 5
> 6  1 01-02-09  0.3 6
> 7  2 01-02-09  0.3 7
> 8  2 01-02-09  0.4 8
>  > byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),])
>    diff A id      day
> 1  0.5 1  1 01-01-09
> 2  0.2 3  2 01-01-09
> 3  0.1 5  1 01-02-09
> 4  0.3 7  2 01-02-09
> 
> Would that do what you want?
> 
> I've appended the function byDataFrame, and its prerequisite, a  
> function parseIndexList. I'm not quite set on the names yet, but  
> anyway. Hope this helps. I haven't really tested it on large sets, it  
> might perform poorly. Any suggestions on speeding the code /  
> corrections are welcome.
> 
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College
> 
> 
> 
> parseIndexList <- function(indexList) {
>    # browser()
>    if (!is.list(indexList))
>      indexList <- as.list(indexList)
>    nI <- length(indexList)
>    namelist <- vector("list", nI)
>    names(namelist) <- names(indexList)
>    extent <- integer(nI)
>    nx <- length(indexList[[1]])
>    one <- as.integer(1)
>    group <- rep.int(one, nx)
>    ngroup <- one
>    for (i in seq.int(indexList)) {
>        index <- as.factor(indexList[[i]])
>        if (length(index) != nx)
>            stop("arguments must have same length")
>        namelist[[i]] <- sort(unique(indexList[[i]]))
>        extent[i] <- length(namelist[[i]])
>        group <- group + ngroup * (as.integer(index) - one)
>        ngroup <- ngroup * nlevels(index)
>    }
>    nms <- do.call(expand.grid, namelist)
>    ind <- unique(sort(group))
>    res <- data.frame(index=ind, nms[ind, , drop=FALSE])
>    return(list(cases=group, groups=res))
> }
> 
> byDataFrame <- function (data, INDEX, FUN, newnames,  
> omit.index.cols=TRUE, ...) {
> # # Part of the code shamelessly stolen from tapply
>    IND <- eval(substitute(INDEX), data)
>    nms <- as.character(as.list(substitute(INDEX)))
>    if (!is.list(IND)) {
>      IND <- list(IND)
>      names(IND) <- nms
>    } else {
>      names(IND) <- nms[-1]
>    }
>    funname <- paste(as.character(substitute(FUN)), collapse=".")
>    indexInfo <- parseIndexList(IND)
>    FUNx <- if (omit.index.cols) {
>      omit.cols <- match(names(indexInfo$groups)[-1], names(data))
>      function(x, ...) FUN(data[x, -omit.cols], ...)
>    } else {
>      function(x, ...) FUN(data[x, ], ...)
>    }
>    ans <- lapply(split(1:nrow(data), indexInfo$cases), FUNx, ...)
>    index <- as.numeric(names(ans))
>    if (!is.data.frame(ans[[1]])) {
>      ans <- lapply(ans, function(x) {
>        dframe <- as.data.frame(t(x))
>        if (is.null(names(x)))
>          names(dframe) <- funname
>        dframe
>      })
>    }
>    lengths <- sapply(ans, nrow)
>    ans <- do.call(rbind, ans)
>    if (!missing(newnames))
>      names(ans) <- newnames
>    nms <- indexInfo$groups[rep(index, lengths),-1, drop=FALSE]
>    res <- cbind(ans, nms)
>    res
> }
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: http://www.nabble.com/Subset-by-Factor-by-date-tp17835631p17836560.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list