[Rd] wish list: generalized apply
John P. Nolan
jpnolan at american.edu
Fri Dec 9 02:00:22 CET 2016
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Thursday, December 8, 2016 4:59 PM
To: John P. Nolan <jpnolan at american.edu>
Cc: Charles C. Berry <R-devel at r-project.org>
Subject: Re: [Rd] wish list: generalized apply
> On Dec 8, 2016, at 12:09 PM, John P. Nolan <jpnolan at american.edu> wrote:
>
> Dear All,
>
> I regularly want to "apply" some function to an array in such a way that the arguments to the user function depend on the index that apply is currently working on. A simple example is:
>
> A <- array( runif(160), dim=c(5,4,8) )
> x <- matrix( runif(32), nrow=4, ncol=8 )
> b <- runif(8)
> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
> result <- rep(0.0,8)
> for (i in 1:8) { result[i] <- f1( A[,,i], x[,i] , b[i] ) }
>
> This works, but is slow. I'd like to be able to do something like:
> generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].
>
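> To make the intent concrete, here is a rough sketch in plain R (generalized.apply is a made-up name, and this toy version only handles a 3-d array on MARGIN=3, with each extra argument given as list(<object>, MARGIN = 1 or 2)):
>
> generalized.apply <- function(A, MARGIN = 3, FUN, ...) {
>   stopifnot(length(dim(A)) == 3, MARGIN == 3)
>   extras <- list(...)   # each extra: list(<object>, MARGIN = 1 or 2)
>   slice <- function(e, i) if (e$MARGIN == 1) e[[1]][i] else e[[1]][, i]
>   sapply(seq_len(dim(A)[MARGIN]), function(i)
>     do.call(FUN, c(list(A[, , i]), lapply(extras, slice, i = i))))
> }
> result <- generalized.apply(A, 3, f1, list(x, MARGIN = 2), list(b, MARGIN = 1))
>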
> Does such a generalized.apply already exist somewhere? While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.
I would have thought that this would achieve the same result:
result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )
Or:
result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , b[i] )} )
(I doubt it will be any faster, but if 'i' is large, parallelism might help. The inner function appears to be fairly efficient.)
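For example, a parallel version might look like this (a sketch using the parallel package that ships with R; mclapply relies on forking, so it works on Unix-alikes only, and mc.cores = 2 is arbitrary):

library(parallel)
result <- unlist(mclapply(seq.int(dim(A)[3]),
                          function(i) f1(A[, , i], x[, i], b[i]),
                          mc.cores = 2))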
--
David Winsemius
Alameda, CA, USA
====================================================================================
Thanks for the response. I gave a toy example with 8 iterations to illustrate the point, so I bumped it up to make my point about speed. But to my surprise, using a 'for' loop is FASTER than using 'sapply' as David suggests, or even 'apply' on a slightly simpler problem. Here is the example:
n <- 800000; m <- 10; k <- 10
A <- array( 1:(m*n*k), dim=c(m,k,n) )
y <- matrix( 1:(k*n), nrow=k, ncol=n )
b <- 1:n
f1 <- function( A, y, b ) { sum( A %*% y ) + b }
# use a for loop
time1 <- system.time( {
result <- rep(0.0,n)
for (i in 1:n) {
result[i] <- f1( A[,,i], y[,i] , b[i] )
}
result } )
# use sapply
time2 <- system.time( result2 <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], y[,i] , b[i] )} ))
# fix y and b, and use standard apply
time3 <- system.time( result3 <- apply( A, MARGIN=3, FUN=f1, y=y[,1], b=b[1] ) )
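# for completeness, a vapply variant (a sketch; vapply avoids sapply's
# result-type guessing, though I would not expect a big difference here)
timeV <- system.time( resultV <- vapply( seq.int( dim(A)[3] ),
                        function(i) { f1( A[,,i], y[,i], b[i] ) }, numeric(1) ) )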
# user times, then ratios of user times
c( time1[1], time2[1],time3[1]); c( time2[1]/time1[1], time3[1]/time1[1] )
# 4.84 5.22 5.32
# 1.078512 1.099174
So using a for loop is roughly 8-10% faster than sapply or apply!? Years ago I experimented and found I could speed things up noticeably by replacing loops with apply, but that is no longer the case, at least in this simple experiment. Is this a result of the byte-code compiler? Can someone tell us when a for loop will be slower than apply? Perhaps a more complicated loop that computes multiple quantities?
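One way to probe the byte-compiler question is with the compiler package (a sketch; enableJIT(3) forces byte-compilation of loops and closures, enableJIT(0) turns the JIT off, so re-running the loop under each setting should isolate its effect):

library(compiler)
enableJIT(0)        # JIT off: re-run the for loop above and note the time
enableJIT(3)        # JIT at maximum: re-run and compare
f1c <- cmpfun(f1)   # or byte-compile just the inner function explicitly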
John