[R] How do I use "tapply" in this case ?

Fri Feb 5 14:34:15 CET 2010

On Feb 5, 2010, at 1:50 AM, Bert Gunter wrote:

> Folks:
>
> You can make use of matrix subscripting and avoid R-level loops and apply
> calls altogether. This will end up being many times faster.
>
> Here's your original code:
>
> Z=matrix(rnorm(20), nrow=4)
> index=replicate(4, sample(1:5, 3))
> P=4
> tmpr=list()
> for (i in 1:P)
> {
>  tmp = Z[i,index[,i]]
>  tmpr[[i]]=tmp
> }
>
> For clarity, here's the index matrix I got:
>> index
>     [,1] [,2] [,3] [,4]
> [1,]    5    1    2    3
> [2,]    2    2    4    4
> [3,]    1    5    5    5
>
> Here's what I got for tmpr when I used your code:
>
>> tmpr
> [[1]]
> [1] -0.6246316 -0.8695538 -0.4136176
>
> [[2]]
> [1]  0.02885345 -1.89837071  0.43195955
>
> [[3]]
> [1]  0.2453368 -0.1788287 -0.6620405
>
> [[4]]
> [1] -0.87077697 -1.62554371  0.04464793
>
> So the ith component of tmpr is just what the indices in the ith column
> of index pick out of the ith row of Z. That is, the first component of tmpr
> consists of the (1,5), (1,2), and (1,1) elements of Z. Matrix (in general,
> array) indexing allows you to "stack" these pairs (for n-dim arrays,
> n-tuples) row-wise into a matrix and use this matrix as an index. Read the
> man page for "[" carefully; it's documented in the "Matrices and Arrays"
> section. For example:
>
>> Z[cbind(c(1,1,1),index[,1])]
> [1] -0.6246316 -0.8695538 -0.4136176
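
To make the pairing explicit, here is a minimal sketch on a small deterministic matrix (the 1:20 fill and the names cols, pairs and picked are my own choices for illustration, not from the original data). Each row of the two-column index matrix is read as one (row, column) pair:

```r
# Small deterministic matrix so the picked elements are easy to check by eye.
Z <- matrix(1:20, nrow = 4)            # Z[i, j] == i + 4 * (j - 1)
cols <- c(5, 2, 1)                     # same pattern as the first column of index above

# Two-column index matrix: one (row, column) pair per row.
pairs <- cbind(rep(1, length(cols)), cols)
picked <- Z[pairs]

# Same elements as ordinary row/column subscripting of row 1.
stopifnot(identical(picked, Z[1, cols]))
print(picked)
```

Because pairs has exactly as many columns as Z has dimensions, "[" treats it as a list of element positions rather than as a plain vector of linear indices.
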
>
> So you can do everything at once (making use of R's columnwise storage of
> arrays) with:
>
> result <- Z[cbind(as.vector(col(index)), as.vector(index))]
>
> which gives:
>
> [1] -0.62463163 -0.86955383 -0.41361765  0.02885345 -1.89837071  0.43195955  0.24533679
> [8] -0.17882867 -0.66204048 -0.87077697 -1.62554371  0.04464793
>
> Note that this vector is the same as unlist(tmpr). So you can turn it into
> a matrix, e.g. one where column i is the ith component of tmpr, by:
>
> dim(result) <- dim(index)
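
As a sanity check, the sketch below runs the loop and the matrix-indexing version side by side and confirms they agree. The set.seed(1) call is my addition (an arbitrary seed, only so the run is repeatable); everything else follows the code quoted above.

```r
set.seed(1)                                  # arbitrary seed, assumption for reproducibility
Z <- matrix(rnorm(20), nrow = 4)
index <- replicate(4, sample(1:5, 3))        # 3 x 4: column i lists the columns wanted from row i of Z

# Loop version from the original question.
tmpr <- vector("list", nrow(Z))
for (i in seq_len(nrow(Z))) tmpr[[i]] <- Z[i, index[, i]]

# Vectorized version: col(index) supplies the row of Z, index supplies the column.
result <- Z[cbind(as.vector(col(index)), as.vector(index))]
stopifnot(identical(result, unlist(tmpr)))

dim(result) <- dim(index)                    # column i is now tmpr[[i]]
stopifnot(identical(result[, 2], tmpr[[2]]))
```

The ordering works out because as.vector() unrolls both col(index) and index column by column, so the pairs come out grouped by row of Z in the same order the loop visits them.
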
>
> As I said, for large problems, this should be wayyyyy faster than explicit
> loops or the hidden (and optimized, but still) loops of apply functions.

Well, twice as fast as the explicit loop anyway:

 > system.time( replicate(10000, {
 +     result <- Z[cbind(as.vector(col(index)), as.vector(index))]
 +     dim(result) <- dim(index)
 + } ) )
    user  system elapsed
   0.164   0.001   0.171

 > system.time( replicate(10000, for (i in 1:P) { tmpr[[i]] <- Z[i,index[,i]] } ) )
   user  system elapsed
  0.267   0.049   0.330

Which was in turn twice as fast as the lapply approach:

 > system.time( replicate(10000, tmpr[1:4] <- lapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ) )
   user  system elapsed
  0.628   0.015   0.646
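
For anyone who wants to rerun the comparison, here is a rough harness, not a careful benchmark. The wrapper names by_matrix, by_loop and by_lapply, the seed, and the repeat count n are my own assumptions; the lapply variant indexes with index[, i] so each row of Z uses its own column of index. It checks that all three approaches return the same values before timing them.

```r
set.seed(1)                                  # arbitrary seed, assumption for reproducibility
Z <- matrix(rnorm(20), nrow = 4)
index <- replicate(4, sample(1:5, 3))
P <- nrow(Z)

# Hypothetical wrappers around the three approaches discussed in this thread.
by_matrix <- function() {
  r <- Z[cbind(as.vector(col(index)), as.vector(index))]
  dim(r) <- dim(index)
  r
}
by_loop <- function() {
  out <- vector("list", P)
  for (i in seq_len(P)) out[[i]] <- Z[i, index[, i]]
  out
}
by_lapply <- function() lapply(seq_len(P), function(i) Z[i, index[, i]])

# All three must agree before the timings mean anything.
stopifnot(identical(by_loop(), by_lapply()),
          identical(as.vector(by_matrix()), unlist(by_loop())))

n <- 10000
elapsed <- c(
  matrix = system.time(for (k in seq_len(n)) by_matrix())[["elapsed"]],
  loop   = system.time(for (k in seq_len(n)) by_loop())[["elapsed"]],
  lapply = system.time(for (k in seq_len(n)) by_lapply())[["elapsed"]]
)
print(elapsed)
```

Expect the numbers to vary with machine and R version, and with a 4 x 5 matrix most of the time is call overhead, so the ratios for large problems could look quite different.
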

-- 
David.

>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of RICHARD M. HEIBERGER
> Sent: Thursday, February 04, 2010 9:10 PM
> To: Carrie Li
> Cc: r-help
> Subject: Re: [R] How do I use "tapply" in this case ?
>
> lapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ## reproduces your results
>
> sapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ## collapses your list into a set of columns
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT