[Rd] Assigning NULL to large variables is much faster than rm() - any reason why I should still use rm()?

William Dunlap wdunlap at tibco.com
Sat May 25 23:00:16 CEST 2013


Another way to avoid using rm() in loops is to use throw-away
functions.  E.g., 
> t3 <- system.time(for (k in 1:ncol(x)) { # your last, fastest, example
+   a <- x[,k]
+   colSum <- sum(a)
+   a <- NULL # Not needed anymore
+   b <- x[k,]
+   rowSum <- sum(b)
+   b <- NULL # Not needed anymore
+ })
> t4 <- system.time({ # use some throw-away functions
+     colKSum <- function(k) { a <- x[,k] ; sum(a) }
+     rowKSum <- function(k) { b <- x[k,] ; sum(b) }
+     for(k in 1:ncol(x)) {
+         colSum <- colKSum(k)
+         rowSum <- rowKSum(k)
+     }
+ })
> t3
   user  system elapsed 
   7.89    0.02    7.93 
> t4
   user  system elapsed 
   7.88    0.02    7.93
I think the code is clearer.  It might make the compiler's job easier.
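
A related alternative is local(), which evaluates a block in a
throw-away environment, so the temporaries become unreachable as soon
as local() returns.  An untimed sketch along the same lines:

t5 <- system.time(for (k in 1:ncol(x)) {
  colSum <- local({ a <- x[,k]; sum(a) })  # 'a' exists only inside local()
  rowSum <- local({ b <- x[k,]; sum(b) })  # likewise for 'b'
})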

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-devel-bounces at r-project.org On Behalf Of Henrik Bengtsson
> Sent: Saturday, May 25, 2013 12:49 PM
> To: R-devel
> Subject: [Rd] Assigning NULL to large variables is much faster than rm() - any reason why
> I should still use rm()?
> 
> Hi,
> 
> in my packages/functions/code I tend to remove large temporary
> variables as soon as possible, e.g. large intermediate vectors used
> in iterations.  Partly I do this to make it explicit in the source
> code when a temporary object is no longer needed.  However, I have
> noticed that this can add noticeable overhead when the rest of the
> iteration step does not take much time.
> 
> Trying to speed this up, I first noticed that rm(list="a") is much
> faster than rm(a).  While at it, I realized that, for the purpose of
> keeping the memory footprint small, I can equally well assign the
> variable a small value (e.g. a <- NULL), which is significantly
> faster than using rm().
> 
> SOME BENCHMARKS:
> A toy example imitating an iterative algorithm with "large" temporary objects.
> 
> x <- matrix(rnorm(100e6), ncol=10e3)
> 
> t1 <- system.time(for (k in 1:ncol(x)) {
>   a <- x[,k]
>   colSum <- sum(a)
>   rm(a) # Not needed anymore
>   b <- x[k,]
>   rowSum <- sum(b)
>   rm(b) # Not needed anymore
> })
> 
> t2 <- system.time(for (k in 1:ncol(x)) {
>   a <- x[,k]
>   colSum <- sum(a)
>   rm(list="a") # Not needed anymore
>   b <- x[k,]
>   rowSum <- sum(b)
>   rm(list="b") # Not needed anymore
> })
> 
> t3 <- system.time(for (k in 1:ncol(x)) {
>   a <- x[,k]
>   colSum <- sum(a)
>   a <- NULL # Not needed anymore
>   b <- x[k,]
>   rowSum <- sum(b)
>   b <- NULL # Not needed anymore
> })
> 
> > t1
>    user  system elapsed
>    8.03    0.00    8.08
> > t1/t2
>     user   system  elapsed
> 1.322900 0.000000 1.320261
> > t1/t3
>     user   system  elapsed
> 1.715812 0.000000 1.662551
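> 
> To isolate the per-call overhead itself, the calls can be timed with
> no large objects involved (a rough sketch; absolute numbers will
> vary by machine):
> 
> system.time(for (i in 1:1e6) { a <- 1; rm(a) })         # deparses the symbol
> system.time(for (i in 1:1e6) { a <- 1; rm(list="a") })  # name given directly
> system.time(for (i in 1:1e6) { a <- 1; a <- NULL })     # plain assignment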
> 
> 
> Is there a reason why I shouldn't assign NULL instead of using rm()?
> As far as I understand it, the garbage collector will be equally
> efficient at cleaning out the previous object whether I use rm(a) or
> a <- NULL.  Is there anything else I'm overlooking?  Am I adding
> overhead somewhere else?
> 
> /Henrik
> 
> 
> PS. With the above toy example one can obviously be a bit smarter by using:
> 
> t4 <- system.time({
>   for (k in 1:ncol(x)) {
>     a <- x[,k]
>     colSum <- sum(a)
>     a <- x[k,]
>     rowSum <- sum(a)
>   }
>   rm(list="a")
> })
> 
> but that's not my point.
> 


