[R] Pre-allocation of matrices is LESS efficient?
Duncan Murdoch
murdoch.duncan at gmail.com
Thu Feb 17 19:33:47 CET 2011
On 17/02/2011 11:02 AM, Alex F. Bokov wrote:
> Motivation: during each iteration, my code needs to collect tabular data (and use it only during that iteration), but the number of rows may vary. I thought I would speed it up by preinitializing the matrix that collects the data with zeros, sized to what I know to be the maximum number of rows. I was surprised by what I found...
>
> # set up (not the puzzling part)
> x<-matrix(runif(20),nrow=4); y<-matrix(0,nrow=12,ncol=5); foo<-c();
>
> # this is what surprises me... what the?
> > system.time(for(i in 1:100000){n<-sample(1:4,1);y[1:n,]<-x[1:n,];});
> user system elapsed
> 1.510 0.000 1.514
> > system.time(for(i in 1:100000){n<-sample(1:4,1);foo<-x[1:n,];});
> user system elapsed
> 1.090 0.000 1.085
>
> These results are very repeatable. So, if I'm interpreting them correctly, dynamically allocating 'foo' each time to whatever the current output size is runs faster than writing to a subset of a preallocated 'y'? How is that possible?
The expression

  y[1:n,]<-x[1:n,]

creates a new temporary variable to hold the result of the expression
x[1:n,], then copies the elements of it to y[1:n,].

The expression

  foo <- x[1:n,]

creates the same temporary, and then binds foo to it without doing any
copying. Much less work.
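
A minimal sketch of the two cases side by side, in case it helps. The set.seed() call, the fixed n, and the reduced iteration count are my assumptions for illustration and are not in the original post; timings depend on the machine and R version, so none are quoted here.

```r
# Illustrative sketch only; the seed, fixed n, and loop length are assumptions.
set.seed(1)
x <- matrix(runif(20), nrow = 4)
y <- matrix(0, nrow = 12, ncol = 5)
n <- 3

# Subassignment: x[1:n, ] is evaluated into a temporary,
# then its elements are copied into rows 1..n of y.
y[1:n, ] <- x[1:n, ]

# Plain assignment: the same temporary is created,
# and foo is simply bound to it.
foo <- x[1:n, ]

# Both hold the same values; only the amount of work differs.
stopifnot(identical(y[1:n, ], foo))

# Time each form separately.
t_sub  <- system.time(for (i in 1:10000) { n <- sample(1:4, 1); y[1:n, ] <- x[1:n, ] })
t_bind <- system.time(for (i in 1:10000) { n <- sample(1:4, 1); foo <- x[1:n, ] })
print(t_sub)
print(t_bind)
```

If the copy into y is the extra cost, the second timing should usually come out lower, as in the numbers quoted above.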
> And, more generally, I'm sure other people have encountered this type of situation. Am I reinventing the wheel? Is there a best practice for storing temporary loop-specific data?
Storing the value of an expression in a new variable will always be
faster than copying it into part of an existing variable.
Duncan Murdoch
P.S. You might be aware of this, but there's one other thing that might
be a surprise to you: x[1:1,] will be a vector, while x[1:n,] will be
a matrix for n>1. Use the "drop=FALSE" argument if you always want a
matrix result.
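
A short sketch of that behaviour, using a matrix shaped like the x above (the names v and m are just for illustration):

```r
x <- matrix(runif(20), nrow = 4)

v <- x[1:1, ]                 # dimensions are dropped: a plain numeric vector
m <- x[1:1, , drop = FALSE]   # stays a matrix with 1 row and 5 columns

is.matrix(v)   # FALSE
is.matrix(m)   # TRUE
dim(m)         # 1 5
```

With drop=FALSE the result has the same number of columns for every n, so later code that expects a matrix does not need a special case for n == 1.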
> Thanks.
>
> PS: By the way, though I cannot write to foo[,] because the size is different each time, I tried writing to foo[] and the runtime was worse than either of the above examples.