[Rd] Assigning NULL to large variables is much faster than rm() - any reason why I should still use rm()?
Henrik Bengtsson
hb at biostat.ucsf.edu
Sat May 25 21:48:33 CEST 2013
Hi,
in my packages/functions/code I tend to remove large temporary
variables as soon as possible, e.g. large intermediate vectors used in
iterations. I sometimes also have the habit of doing this to make it
explicit in the source code when a temporary object is no longer
needed. However, I did notice that this can add a noticeable overhead
when the rest of the iteration step does not take that much time.
Trying to speed this up, I first noticed that rm(list="a") is much
faster than rm(a). While at it, I realized that for the purpose of
keeping the memory footprint small, I can just as well assign a small
object to the variable (e.g. a <- NULL), which is significantly faster
than using rm().
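For reference, here is a minimal sketch of what each of the three idioms leaves behind in the calling environment. It is not part of the benchmark; the function name cleanup_demo and the variable d are made up for illustration only. It only checks whether the name is still bound afterwards and says nothing about timing or about when the garbage collector actually reclaims the memory.

```r
# Hypothetical helper (illustration only): apply each cleanup idiom to
# a temporary vector and report whether the name is still bound.
cleanup_demo <- function() {
  a <- numeric(1e6)
  b <- numeric(1e6)
  d <- numeric(1e6)

  rm(a)            # name given as an unquoted symbol
  rm(list = "b")   # name given as a character string
  d <- NULL        # binding is kept, but now refers to NULL

  env <- environment()
  c(a = exists("a", envir = env, inherits = FALSE),
    b = exists("b", envir = env, inherits = FALSE),
    d = exists("d", envir = env, inherits = FALSE))
}

res <- cleanup_demo()
print(res)
```

The names removed with rm() are gone from the environment, whereas the variable assigned NULL still exists and will still be listed by ls(), which may or may not matter depending on what the surrounding code does with it.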
SOME BENCHMARKS:
A toy example imitating an iterative algorithm with "large" temporary objects.
x <- matrix(rnorm(100e6), ncol=10e3)

t1 <- system.time(for (k in 1:ncol(x)) {
  a <- x[,k]
  colSum <- sum(a)
  rm(a) # Not needed anymore
  b <- x[k,]
  rowSum <- sum(b)
  rm(b) # Not needed anymore
})

t2 <- system.time(for (k in 1:ncol(x)) {
  a <- x[,k]
  colSum <- sum(a)
  rm(list="a") # Not needed anymore
  b <- x[k,]
  rowSum <- sum(b)
  rm(list="b") # Not needed anymore
})

t3 <- system.time(for (k in 1:ncol(x)) {
  a <- x[,k]
  colSum <- sum(a)
  a <- NULL # Not needed anymore
  b <- x[k,]
  rowSum <- sum(b)
  b <- NULL # Not needed anymore
})

> t1
   user  system elapsed
   8.03    0.00    8.08
> t1/t2
    user   system  elapsed
1.322900 0.000000 1.320261
> t1/t3
    user   system  elapsed
1.715812 0.000000 1.662551

Is there a reason why I shouldn't assign NULL instead of using rm()?
As far as I understand it, the garbage collector will be equally
efficient at cleaning up the previous object whether I use rm(a) or
a <- NULL. Is there anything else I'm overlooking? Am I adding overhead
somewhere else?
/Henrik
PS. With the above toy example one can obviously be a bit smarter by using:
t4 <- system.time({
  for (k in 1:ncol(x)) {
    a <- x[,k]
    colSum <- sum(a)
    a <- x[k,]
    rowSum <- sum(a)
  }
  rm(list="a")
})
but that's not my point.