[Rd] Assigning NULL to large variables is much faster than rm() - any reason why I should still use rm()?

Sat May 25 21:48:33 CEST 2013

Hi,

in my packages/functions/code I tend to remove large temporary
variables as soon as possible, e.g. large intermediate vectors used in
iterations.  I also sometimes do this to make it explicit in the
source code when a temporary object is no longer needed.  However, I
noticed that this can add considerable overhead when the rest of the
iteration step does not take much time.

Trying to speed this up, I first noticed that rm(list="a") is much
faster than rm(a).  While at it, I realized that for the purpose of
keeping the memory footprint small, I can equally well assign a small
object to the variable (e.g. a <- NULL), which is significantly faster
than using rm().
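
One difference between the two approaches is worth spelling out, since
it may matter to later code: rm() removes the binding itself, whereas
assigning NULL keeps the name and only swaps its value.  A minimal
sketch, using a throwaway vector (the size and the result variable
names are arbitrary choices for illustration):

```r
# rm() removes the binding, so the name no longer exists afterwards
a <- rnorm(1e6)
rm(list = "a")
a_exists_after_rm <- exists("a", inherits = FALSE)

# Assigning NULL keeps the binding; only the large vector is released
a <- rnorm(1e6)
a <- NULL
a_exists_after_null <- exists("a", inherits = FALSE)

print(c(rm = a_exists_after_rm, null = a_exists_after_null))
print(is.null(a))
```

So code that later calls exists("a") or ls() would see a difference
between the two, even though the large vector is unreferenced in both
cases.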

SOME BENCHMARKS:
A toy example imitating an iterative algorithm with "large" temporary objects.

x <- matrix(rnorm(100e6), ncol=10e3)

t1 <- system.time(for (k in 1:ncol(x)) {
  a <- x[,k]
  colSum <- sum(a)
  rm(a) # Not needed anymore
  b <- x[k,]
  rowSum <- sum(b)
  rm(b) # Not needed anymore
})

t2 <- system.time(for (k in 1:ncol(x)) {
  a <- x[,k]
  colSum <- sum(a)
  rm(list="a") # Not needed anymore
  b <- x[k,]
  rowSum <- sum(b)
  rm(list="b") # Not needed anymore
})

t3 <- system.time(for (k in 1:ncol(x)) {
  a <- x[,k]
  colSum <- sum(a)
  a <- NULL # Not needed anymore
  b <- x[k,]
  rowSum <- sum(b)
  b <- NULL # Not needed anymore
})

> t1
   user  system elapsed
   8.03    0.00    8.08
> t1/t2
    user   system  elapsed
1.322900 0.000000 1.320261
> t1/t3
    user   system  elapsed
1.715812 0.000000 1.662551
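
For a sense of scale, the absolute timings can be backed out of the
ratios above.  This is only arithmetic on the printed numbers; the
variable names are made up here, and the count of removals assumes two
per iteration over the 10e3 columns, as in the loops above:

```r
# Elapsed time of the rm(a) variant, as printed above
t1_elapsed <- 8.08

# Recover the other two elapsed times from the printed ratios
t2_elapsed <- t1_elapsed / 1.320261  # rm(list="a") variant
t3_elapsed <- t1_elapsed / 1.662551  # a <- NULL variant

# Two removals per iteration, 10e3 iterations
n_removals <- 2 * 10e3

# Approximate extra cost per removal relative to a <- NULL, in milliseconds
extra_ms <- 1000 * (c(rm_a = t1_elapsed, rm_list = t2_elapsed) - t3_elapsed) / n_removals

print(round(c(t2 = t2_elapsed, t3 = t3_elapsed), 2))
print(round(extra_ms, 3))
```

That puts the rm(list="a") and a <- NULL variants at roughly 6.12 and
4.86 seconds elapsed, so the difference per removal is a fraction of a
millisecond and only matters when the rest of the iteration is cheap.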

Is there a reason why I shouldn't assign NULL instead of using rm()?
As far as I understand it, the garbage collector will be equally
efficient cleaning out the previous object when using rm(a) or
a <- NULL.  Is there anything else I'm overlooking?  Am I adding
overhead somewhere else?
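
One way to probe the garbage-collection assumption is to attach
finalizers and see whether they fire after each kind of "removal".
This is a rough sketch, not a definitive test: reg.finalizer() only
accepts environments and external pointers, so small environments
stand in for the large vectors, and the names seen, on_collect and
make_tracked are hypothetical helpers invented for this illustration.

```r
# Hypothetical helper names; reg.finalizer(), rm() and gc() are base R.
seen <- new.env()

# Finalizer: records the label of the environment being collected
on_collect <- function(env) assign(env$label, TRUE, envir = seen)

# Create a small environment standing in for a "large" temporary object
make_tracked <- function(label) {
  e <- new.env()
  e$label <- label
  reg.finalizer(e, on_collect)
  e
}

a <- make_tracked("via_rm")
b <- make_tracked("via_null")

rm(list = "a")  # drop the binding
b <- NULL       # rebind the name to a small object

invisible(gc()) # request a collection so pending finalizers can run
print(sort(ls(seen)))
```

If both labels are listed, both environments were reclaimed by the
same collection, which would be consistent with the two approaches
being equivalent as far as the garbage collector is concerned.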

/Henrik

PS. With the above toy example one can obviously be a bit smarter by using:

t4 <- system.time({for (k in 1:ncol(x)) {
  a <- x[,k]
  colSum <- sum(a)
  a <- x[k,]
  rowSum <- sum(a)
}
rm(list="a")
})

but that's not my point.