[Rd] wish list: generalized apply

John P. Nolan jpnolan at american.edu
Fri Dec 9 02:00:22 CET 2016



-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, December 8, 2016 4:59 PM
To: John P. Nolan <jpnolan at american.edu>
Cc: Charles C. Berry <R-devel at r-project.org>
Subject: Re: [Rd] wish list: generalized apply


> On Dec 8, 2016, at 12:09 PM, John P. Nolan <jpnolan at american.edu> wrote:
> 
> Dear All,
> 
> I regularly want to "apply" some function to an array in a way that the arguments to the user function depend on the index on which the apply is working.  A simple example is:
> 
> A <- array( runif(160), dim=c(5,4,8) )
> x <- matrix( runif(32), nrow=4, ncol=8 )
> b <- runif(8)
> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
> result <- rep(0.0,8)
> for (i in 1:8) {  result[i] <- f1( A[,,i], x[,i] , b[i] ) }
> 
> This works, but is slow.  I'd like to be able to do something like:
>    generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].  
> 
> Does such a generalized.apply already exist somewhere?  While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.  
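
One minimal sketch of such a helper, for discussion only (the name generalized_apply and the list(value, MARGIN) argument format are hypothetical, not an existing API; requires R >= 3.6.0 for asplit()):

```r
# Hedged sketch: slice the main array and each extra argument along its
# own margin up front, then call FUN once per index.
generalized_apply <- function(A, MARGIN, FUN, ...) {
  extras <- list(...)                      # each extra: list(value, MARGIN)
  n <- dim(A)[MARGIN]
  main_slices <- asplit(A, MARGIN)         # slice the main array once
  extra_slices <- lapply(extras, function(e) {
    v <- e[[1]]
    if (is.null(dim(v))) {
      split(v, seq_along(v))               # plain vector: element-wise
    } else {
      asplit(v, e[[2]])                    # matrix/array: slice on its margin
    }
  })
  sapply(seq_len(n), function(i) {
    do.call(FUN, c(list(main_slices[[i]]), lapply(extra_slices, `[[`, i)))
  })
}
```

With the data above, generalized_apply(A, 3, f1, list(x, 2), list(b, 1)) would reproduce the for loop's result.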

I would have thought that this would achieve the same result:

result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

Or: 

result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

(I doubt it will be any faster, but if 'i' is large, parallelism might help. The inner function appears to be fairly efficient.)
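
The parallel route can be sketched with the base 'parallel' package (a sketch only; for a body as cheap as f1 the cluster startup and data-copying overhead may well exceed any savings):

```r
library(parallel)

# Data and f1 as in the quoted post
A <- array(runif(160), dim = c(5, 4, 8))
x <- matrix(runif(32), nrow = 4, ncol = 8)
b <- runif(8)
f1 <- function(A, x, b) sum(A %*% x) + b

cl <- makeCluster(2)                        # two worker processes
clusterExport(cl, c("f1", "A", "x", "b"))   # workers need their own copies
result <- parSapply(cl, seq_len(dim(A)[3]),
                    function(i) f1(A[, , i], x[, i], b[i]))
stopCluster(cl)
```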
-- 

David Winsemius
Alameda, CA, USA

====================================================================================

Thanks for the response.  I gave a toy example with 8 iterations to illustrate the point, so I thought I would bump it up to make my point about speed.  But to my surprise, using a 'for' loop is FASTER than using 'sapply' as David suggests, or even 'apply', on a slightly simpler problem.  Here is the example:

n <- 800000; m <- 10; k <- 10
A <- array( 1:(m*n*k), dim=c(m,k,n) )
y <- matrix( 1:(k*n), nrow=k, ncol=n )
b <- 1:n
f1 <- function( A, y, b ) { sum( A %*% y ) + b }

# use a for loop
time1 <- system.time( {
result <- rep(0.0,n)
for (i in 1:n) {
  result[i] <- f1( A[,,i], y[,i] , b[i] )
}
result } )

#  use sapply
time2 <- system.time( result2 <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], y[,i] , b[i] )} ))

# fix y and b, and use standard apply
time3 <- system.time( result3 <- apply( A, MARGIN=3, FUN=f1, y=y[,1], b=b[1] ) ) 

# user times, then ratios of user times
c( time1[1], time2[1],time3[1]); c( time2[1]/time1[1], time3[1]/time1[1] )  
#   4.84      5.22      5.32 
#   1.078512  1.099174

So using a for loop saves 8-10% of the execution time compared to sapply and apply!?  Years ago I experimented and found that I could speed things up noticeably by replacing loops with apply.  That is no longer the case, at least in this simple experiment.  Is this a result of the byte-code compiler?  Can someone tell us when a for loop is going to be slower than using apply?  Perhaps a more complicated loop that computes multiple quantities?
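
One way to probe where the time goes in the example above: every variant re-slices A[,,i] and y[,i] on each iteration, so the loop mechanism itself is only a small part of the cost. A hedged sketch, precomputing the slices once with asplit() (R >= 3.6.0) and mapping over them:

```r
# Smaller n than the post, for a quick run; same shapes otherwise.
n <- 1000; m <- 10; k <- 10
A <- array(1:(m*k*n), dim = c(m, k, n))
y <- matrix(1:(k*n), nrow = k, ncol = n)
b <- 1:n
f1 <- function(A, y, b) sum(A %*% y) + b

slices <- asplit(A, 3)                            # pay the slicing cost once
ycols  <- lapply(seq_len(n), function(i) y[, i])  # likewise for the columns
result4 <- mapply(f1, slices, ycols, b)           # one call of f1 per index
```

This trades memory (a list of n sub-matrices) for time inside the loop, which may or may not be a good trade at n = 800000.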

John


