[R] loop is going to take 26 hours - needs to be quicker!

Marc Schwartz marc_schwartz at comcast.net
Thu Dec 14 14:33:41 CET 2006


On Thu, 2006-12-14 at 12:56 +0000, Jenny Barnes wrote:
> Dear R-help,
> 
> I have a loop, which is set to take about 26 hours to run at the rate it's going 
> - this is ridiculous and I really need your help to find a more efficient way of 
> loading up my array gpcc.array:
> 
> #My data is stored in a table format with all the data in one long column 
> #running through every longitude, for every latitude, for every year. The 
> #original data is stored as gpcc.data2 where dim(gpcc.data2) = [476928,5] where 
> #the 5th column is the data:
> 
> #make the array in the format I need [longitude,latitude,years]
> 
> gpcc.array <- array(NA, c(144,72,46)) 
> 
> n=0
> for(k in 1:46){
> for(j in 1:72){
> for(i in 1:144){
> n <- n+1
> gpcc.array[i,j,k] <- gpcc.data2[n,5]
> print(j)
> }
> }
> }
> 
> So it runs through all the longs for every lat for every year - which is the 
> order the data is running down the column in gpcc.data2 so n increases by 1 each 
> time and each data point is pulled off....
> 
> It needs to be a lot quicker, I'd appreciate any ideas!
> 
> Many thanks for taking time to read this,
> 
> Jenny Barnes

Take a "whole object" approach to this problem rather than filling the
array one element at a time. You are also wasting a lot of time by
printing the value of 'j' on every pass through the innermost loop.
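
As a small illustration of why the whole-object approach fits here:
array() fills its result with the first index varying fastest, then the
second, then the third, which is the same order the nested loop walks
through. The toy dimensions below (3 x 2 x 2) are made up for the
example and are not the real data.

```r
# Toy stand-in: 3 "longitudes", 2 "latitudes", 2 "years" (hypothetical sizes)
x <- 1:12
a <- array(x, dim = c(3, 2, 2))

# The first index varies fastest, so the first three values
# land in the first column of the first slice
a[, 1, 1]    # 1 2 3

# The second index advances after every 3 values
a[1, 2, 1]   # 4

# The third index advances after every 3 * 2 = 6 values
a[1, 1, 2]   # 7
```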


> gpcc.data2 <- matrix(rnorm(476928 * 5), ncol = 5)

> dim(gpcc.data2)
[1] 476928      5
> str(gpcc.data2)
 num [1:476928, 1:5]  2.7385 -0.0438 -0.1084  0.8768 -1.0024 ...


> system.time(gpcc.array <- array(gpcc.data2[, 5], 
                                  dim = c(144, 72, 46)))
[1] 0.024 0.026 0.078 0.000 0.000

You should verify the order of the values and adjust the indices
accordingly, if the above results in an out of order array.
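
One way to do that check is to run both versions on a small stand-in and
compare them. This is only a sketch: 'small', 'nlon', 'nlat' and 'nyr'
are names made up for the example, and the last step assumes a
hypothetical layout in which latitude (not longitude) varies fastest
down the column, just to show how aperm() would be used in that case.

```r
# Hypothetical small stand-in with the same 5-column layout as gpcc.data2
nlon <- 4; nlat <- 3; nyr <- 2
small <- matrix(rnorm(nlon * nlat * nyr * 5), ncol = 5)

# Reference result: the original triple loop, minus the print() call
loop.array <- array(NA, c(nlon, nlat, nyr))
n <- 0
for (k in 1:nyr) {
  for (j in 1:nlat) {
    for (i in 1:nlon) {
      n <- n + 1
      loop.array[i, j, k] <- small[n, 5]
    }
  }
}

# Whole-object version: reshape the 5th column in one call
fast.array <- array(small[, 5], dim = c(nlon, nlat, nyr))

# TRUE when both approaches place every value in the same cell
identical(loop.array, fast.array)

# If latitude varied fastest instead (an assumption, not the case above),
# fill in the stored order and then swap the first two dimensions
alt.array <- aperm(array(small[, 5], dim = c(nlat, nlon, nyr)), c(2, 1, 3))
dim(alt.array)   # longitude, latitude, year
```

If identical() returns TRUE on the stand-in, the same array() call can
be applied to the full gpcc.data2[, 5] column.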

HTH,

Marc Schwartz


