[R] How do I use "tapply" in this case ?
David Winsemius
dwinsemius at comcast.net
Fri Feb 5 14:34:15 CET 2010
On Feb 5, 2010, at 1:50 AM, Bert Gunter wrote:
> Folks:
>
> You can make use of matrix subscripting and avoid R-level loops and applys
> altogether. This will end up being many times faster.
>
> Here's your original code:
>
> Z=matrix(rnorm(20), nrow=4)
> index=replicate(4, sample(1:5, 3))
> P=4
> tmpr=list()
> for (i in 1:P)
> {
> tmp = Z[i,index[,i]]
> tmpr[[i]]=tmp
> }
>
> For clarity, here's the index matrix I got:
>> index
>      [,1] [,2] [,3] [,4]
> [1,]    5    1    2    3
> [2,]    2    2    4    4
> [3,]    1    5    5    5
>
> Here's what I got for tmpr when I used your code:
>
>> tmpr
> [[1]]
> [1] -0.6246316 -0.8695538 -0.4136176
>
> [[2]]
> [1] 0.02885345 -1.89837071 0.43195955
>
> [[3]]
> [1] 0.2453368 -0.1788287 -0.6620405
>
> [[4]]
> [1] -0.87077697 -1.62554371 0.04464793
>
> So the ith component of tmpr is just what the indices in the ith column
> of index pick out of the ith row of Z. That is, the first component of
> tmpr consists of the (1,5), (1,2), and (1,1) elements of Z. Matrix (in
> general, array) indexing -- read the man page for "[" carefully: it's
> documented in the "Matrices and Arrays" section -- allows you to "stack"
> these pairs (for n-dim arrays, n-tuples) row-wise into a matrix and use
> this matrix as an index:
>
>> Z[cbind(c(1,1,1),index[,1])]
> [1] -0.6246316 -0.8695538 -0.4136176
>
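A minimal sketch of the matrix-indexing rule just described, assuming a deterministic Z (filled with 1:20) in place of the random one so the picked values can be checked by eye. The name `ij` is illustrative and not from the thread.

```r
# Deterministic 4 x 5 matrix: Z[i, j] equals i + 4 * (j - 1)
Z <- matrix(1:20, nrow = 4)

# Each row of ij is one (row, column) pair: (1,5), (1,2), (1,1)
ij <- cbind(c(1, 1, 1), c(5, 2, 1))

# Indexing with a two-column matrix returns one element per row of ij
picked <- Z[ij]
print(picked)   # 17 5 1
```

Indexing with a two-column matrix is different from `Z[c(1, 1, 1), c(5, 2, 1)]`, which would return a 3 x 3 submatrix rather than three individual elements.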
> So you can do everything at once (making use of R's columnwise storage
> of arrays) as:
>
> result <- Z[cbind(as.vector(col(index)), as.vector(index))]
>
> which gives:
>
>  [1] -0.62463163 -0.86955383 -0.41361765  0.02885345 -1.89837071  0.43195955  0.24533679
>  [8] -0.17882867 -0.66204048 -0.87077697 -1.62554371  0.04464793
>
> Note that this vector is the same as unlist(tmpr). So you can turn it
> into a matrix, e.g. one where column i is the ith component of tmpr, by:
>
> dim(result) <- dim(index)
>
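The whole recipe can be sketched end to end as follows. This assumes the index matrix printed earlier in the thread and, as an illustration only, a deterministic Z (1:20) instead of the random one, so the loop and the vectorised version can be compared exactly.

```r
# Assumption: deterministic Z so results are reproducible without a seed
Z <- matrix(1:20, nrow = 4)

# The index matrix shown in the post (matrix() fills column by column)
index <- matrix(c(5, 2, 1,
                  1, 2, 5,
                  2, 4, 5,
                  3, 4, 5), nrow = 3)

# Original loop: row i of Z, columns given by column i of index
tmpr <- list()
for (i in 1:4) tmpr[[i]] <- Z[i, index[, i]]

# Vectorised version: col(index) supplies the row of Z, index the column
result <- Z[cbind(as.vector(col(index)), as.vector(index))]
same_as_loop <- identical(result, unlist(tmpr))

# Reshape so that column i holds the ith component of tmpr
dim(result) <- dim(index)
print(same_as_loop)
print(result)
```

Because `as.vector()` unrolls both `col(index)` and `index` column by column, the pairs come out in the same order as `unlist(tmpr)`, which is why setting `dim` is enough to recover the per-row grouping.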
> As I said, for large problems, this should be wayyyyy faster than explicit
> loops or the hidden (and optimized, but still) loops of apply functions.
Well, twice as fast as the explicit loop, anyway:
> system.time( replicate(10000, {
+     result <- Z[cbind(as.vector(col(index)), as.vector(index))]
+     dim(result) <- dim(index)
+ } ) )
   user  system elapsed
  0.164   0.001   0.171
> system.time( replicate(10000, for (i in 1:P) { tmpr[[i]] <- Z[i,index[,i]] } ) )
   user  system elapsed
  0.267   0.049   0.330
Which was in turn twice as fast as the lapply approach:
> system.time( replicate(10000, tmpr[1:4] <- lapply(1:4, function(i, x, y)
+     {x[i,y[,i]]}, Z, index ) ) )
   user  system elapsed
  0.628   0.015   0.646
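Before comparing timings it is worth confirming that the three approaches return the same values. The following is a rough sketch, not the code that produced the numbers above: `set.seed()` and the names `by_loop`, `by_lapply` and `by_matrix` are assumptions added here, and the anonymous function indexes column i of the index matrix.

```r
set.seed(1)  # assumption: seed added for reproducibility, not in the thread
Z <- matrix(rnorm(20), nrow = 4)
index <- replicate(4, sample(1:5, 3))
P <- nrow(Z)

# 1. Explicit loop
by_loop <- vector("list", P)
for (i in seq_len(P)) by_loop[[i]] <- Z[i, index[, i]]

# 2. lapply, picking column i of the index matrix for row i of Z
by_lapply <- lapply(seq_len(P), function(i, x, y) x[i, y[, i]], Z, index)

# 3. Matrix indexing, reshaped so column i matches component i
by_matrix <- Z[cbind(as.vector(col(index)), as.vector(index))]
dim(by_matrix) <- dim(index)

# All three should agree exactly
stopifnot(identical(by_loop, by_lapply),
          identical(unlist(by_loop), as.vector(by_matrix)))

# Timings depend on the machine and R version, so none are quoted here
print(system.time(
  for (k in 1:10000) Z[cbind(as.vector(col(index)), as.vector(index))]
))
```

On a 4 x 5 matrix the differences are dominated by fixed overhead, so any ranking is best re-measured on data of realistic size.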
--
David.
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of RICHARD M. HEIBERGER
> Sent: Thursday, February 04, 2010 9:10 PM
> To: Carrie Li
> Cc: r-help
> Subject: Re: [R] How do I use "tapply" in this case ?
>
> lapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ## reproduces your results
>
> sapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ## collapses your list into a set of columns
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT