[R] Data Frame Indexing
    Jesse Brown 
    jesse.r.brown at lmco.com
       
    Mon Aug 22 14:13:32 CEST 2011
    
    
  
Hello,
I've been working with a set of values that contain time stamps, and part 
of my summary needs to look at weekend data only. While trying to subset 
the data I found a large difference in performance depending on how I 
index a data frame. I've constructed a minimal example here to illustrate 
my observation.
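    # TRUE for time stamps (seconds since the epoch) that fall on a weekend.
    # Note: "%a" gives locale-dependent day names.
    is.weekend <- function(x) {
        tm <- as.POSIXlt(x, origin="1970/01/01")
        format(tm, "%a") %in% c("Sat", "Sun")
    }
    # Call is.weekend once per element, then bind the results together.
    use.lapply <- function(data) {
        data[do.call(rbind, lapply(data$TIME, FUN=is.weekend)), ]
    }
    use.sapply <- function(data) {
        data[sapply(data$TIME, FUN=is.weekend), ]
    }
    use.vapply <- function(data) {
        data[vapply(data$TIME, FUN=is.weekend, FUN.VALUE=logical(1)), ]
    }
    # Call is.weekend once on the whole TIME column.
    use.indexing <- function(data) {
        data[is.weekend(data$TIME), ]
    }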
And the results of these methods:
     > names(csv.data)
    [1] "TIME"     "FILE"     "RADIAN"   "BITS"     "DURATION"
     > length(csv.data$TIME)
    [1] 21471
     > system.time(v1 <- use.lapply(csv.data))
       user  system elapsed
     19.562   6.402  25.967
     > system.time(v2 <- use.sapply(csv.data))
       user  system elapsed
     19.456   6.492  25.951
     > system.time(v3 <- use.vapply(csv.data))
       user  system elapsed
     19.334   6.468  25.808
     > system.time(v4 <- use.indexing(csv.data))
       user  system elapsed
      0.032   0.020   0.052
     > all(identical(v1,v2),identical(v2,v3),identical(v3,v4))
    [1] TRUE
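For anyone without the original CSV, a rough stand-in can be built from synthetic time stamps. This is only a sketch: fake.data, its size, its BITS values, and the date range are made up for illustration, and absolute timings will differ from the ones above.

```r
# Synthetic stand-in for csv.data (hypothetical; only TIME matters here).
set.seed(1)
n <- 2000
start <- as.numeric(as.POSIXct("2011-08-01", tz = "UTC"))
fake.data <- data.frame(
    TIME = start + sort(sample.int(30 * 86400, n)),  # seconds within 30 days
    BITS = sample.int(64, n, replace = TRUE)
)

# Same definition as in the post.
is.weekend <- function(x) {
    tm <- as.POSIXlt(x, origin = "1970/01/01")
    format(tm, "%a") %in% c("Sat", "Sun")
}

# Element-by-element versus one call on the whole column.
t.loop <- system.time(
    a <- fake.data[vapply(fake.data$TIME, is.weekend, logical(1)), ]
)
t.vec <- system.time(
    b <- fake.data[is.weekend(fake.data$TIME), ]
)

print(rbind(vapply = t.loop, indexing = t.vec))
print(identical(a, b))
```

Raising n should make the gap between the two rows of the printed table easier to see.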
Forgive what is probably a trivial question, but why is there such a 
large difference between the *apply functions and the direct indexing 
method? On the surface it seems as though the use.indexing method passes 
the entire vector to the function in a single call, while the others 
/might/ iterate over the values, passing them one at a time. In either 
case all elements must be part of the calculation...
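One way to check that guess is to count how often the function is actually invoked. The sketch below is an illustration under assumptions, not a profile of the real data: counting.weekend, calls, and times are hypothetical names, and the weekday test uses the wday field of POSIXlt (0 = Sunday, 6 = Saturday) instead of format(tm, "%a") so that it does not depend on the locale.

```r
calls <- 0

# Hypothetical variant of is.weekend that records every invocation.
counting.weekend <- function(x) {
    calls <<- calls + 1
    tm <- as.POSIXlt(x, origin = "1970-01-01", tz = "UTC")
    tm$wday %in% c(0, 6)   # 0 = Sunday, 6 = Saturday
}

# 1000 hourly time stamps starting at the epoch (made-up data).
times <- seq(0, by = 3600, length.out = 1000)

calls <- 0
r1 <- vapply(times, counting.weekend, logical(1))
calls.vapply <- calls      # one call per element

calls <- 0
r2 <- counting.weekend(times)
calls.vector <- calls      # a single call for the whole vector

print(c(vapply = calls.vapply, vectorized = calls.vector))
print(identical(r1, r2))
```

If the counts differ by a factor of length(times), the fixed cost of each call (the date conversion and the function call itself) is paid once per element in the *apply versions, which would be consistent with the timings above.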
Thanks for any insight.
Jesse
    
    