[R] speed up this algorithm (apply-fuction / 4D array)

Wed Oct 5 20:24:03 CEST 2011

I corrected your code a bit and put it into a function, f0, to
make testing easier.  I also made a small dataset to make
testing easier.  Then I made a new function f1 which does
what f0 does in a vectorized manner:

  x <- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim=c(50, 50, 50, 91))
  xsmall <- array(log(seq_len(2 * 2 * 2 * 91)), dim=c(2, 2, 2, 91))

  f0 <- function(x) {
      data_reduced <- array(0, dim=c(dim(x)[1:3], trunc(dim(x)[4]/3)))
      reduce <- seq(1, dim(x)[4]-1, by=3)
      for( i in 1:length(reduce) ) {
          data_reduced[ , , , i] <- apply(x[ , , , reduce[i] : (reduce[i]+2) ], 1:3, mean)
     }
     data_reduced
  }

  f1 <- function(x) {
     reduce <- seq(1, dim(x)[4]-1, by=3)
     data_reduced <- (x[, , , reduce] + x[, , , reduce+1] + x[, , , reduce+2]) / 3
     data_reduced
  }

The results were:

  > system.time(v1 <- f1(x))
     user  system elapsed
    0.280   0.040   0.323
  > system.time(v0 <- f0(x))
     user  system elapsed
   73.760   0.060  73.867
  > all.equal(v0, v1)
  [1] TRUE

>> "I thought apply would already vectorize, rather than loop over every coordinate."
No, you have that backwards.  Use *apply functions when you cannot figure
out how to vectorize.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Batholdy
> Sent: Wednesday, October 05, 2011 10:40 AM
> To: R Help
> Subject: [R] speed up this algorithm (apply-fuction / 4D array)
> 
> Hi,
> 
> 
> I have this sample-code (see above) and I was wondering wether it is possible to speed things up.
> 
> 
> 
> What this code does is the following:
> 
> x is 4D array (you can imagine it as x, y, z-coordinates and a time-coordinate).
> 
> So x contains 50x50x50 data-arrays for 91 time-points.
> 
> Now I want to reduce the 91 time-points.
> I want to merge three consecutive time points to one time-points by calculating the mean of this three
> time-points for every x,y,z coordinate.
> 
> The reduce-sequence defines which time-points should get merged.
> And the apply-function in the for-loop calculates the mean of the three 3D-Arrays and puts them into a
> new 4D array (data_reduced).
> 
> 
> 
> The problem is that even in this example it takes really long.
> I thought apply would already vectorize, rather than loop over every coordinate.
> 
> But for my actual data-set it takes a really long time ... So I would be really grateful for any
> suggestions how to speed this up.
> 
> 
> 
> 
> x <- array(rnorm(50 * 50 * 50 * 90, 0, 2), dim=c(50, 50, 50, 91))
> 
> 
> 
> data_reduced <- array(0, dim=c(50, 50, 50, 90/3))
> 
> reduce <- seq(1,90, 3)
> 
> 
> 
> for( i in 1:length(reduce) ) {
> 
> 	data_reduced[ , , , i]    <-    apply(x[ , , , reduce[i] : (reduce[i]+3) ], 1:3, mean)
> }
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.