[Rd] Assigning NULL to large variables is much faster than rm() - any reason why I should still use rm()?
William Dunlap
wdunlap at tibco.com
Sat May 25 23:00:16 CEST 2013
Another way to avoid using rm() in loops is to use throw-away
functions. E.g.,
> t3 <- system.time(for (k in 1:ncol(x)) { # your last, fastest, example
+ a <- x[,k]
+ colSum <- sum(a)
+ a <- NULL # Not needed anymore
+ b <- x[k,]
+ rowSum <- sum(b)
+ b <- NULL # Not needed anymore
+ })
> t4 <- system.time({ # use some throw-away functions
+ colKSum <- function(k) { a <- x[,k] ; sum(a) }
+ rowKSum <- function(k) { b <- x[k,] ; sum(b) }
+ for(k in 1:ncol(x)) {
+ colSum <- colKSum(k)
+ rowSum <- rowKSum(k)
+ }})
> t3
user system elapsed
7.89 0.02 7.93
> t4
user system elapsed
7.88 0.02 7.93
The timings are essentially identical, and I think the version with
throw-away functions is clearer. It might also make the compiler's job easier.
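A minimal sketch of why the throw-away functions make the explicit cleanup unnecessary: a variable created inside a function lives only in that call's environment, so it becomes unreachable (and eligible for garbage collection) once the call returns. The small matrix `m` and the helper name `colKSumDemo` are hypothetical stand-ins for illustration, not part of the benchmark above.

```r
# Hypothetical small stand-in for the large matrix used in the benchmark.
m <- matrix(1:6, nrow = 2)

# 'a' is local to each call of colKSumDemo(); no rm() or NULL assignment needed.
colKSumDemo <- function(k) {
  a <- m[, k]
  sum(a)
}

res <- colKSumDemo(2)

# The temporary never appears in the calling environment.
leaked <- exists("a", inherits = FALSE)
```

Here `leaked` is FALSE, since `a` only ever existed inside the function call.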
Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf
> Of Henrik Bengtsson
> Sent: Saturday, May 25, 2013 12:49 PM
> To: R-devel
> Subject: [Rd] Assigning NULL to large variables is much faster than rm() - any reason why
> I should still use rm()?
>
> Hi,
>
> in my packages/functions/code I tend to remove large temporary
> variables as soon as possible, e.g. large intermediate vectors used in
> iterations. I sometimes also have the habit of doing this to make it
> explicit in the source code when a temporary object is no longer
> needed. However, I did notice that this can add a noticeable overhead
> when the rest of the iteration step does not take that much time.
>
> Trying to speed this up, I first noticed that rm(list="a") is much
> faster than rm(a). While at it, I realized that for the purpose of
> keeping the memory footprint small, I can equally well assign a
> small object to the variable (e.g. a <- NULL), which is
> significantly faster than using rm().
>
> SOME BENCHMARKS:
> A toy example imitating an iterative algorithm with "large" temporary objects.
>
> x <- matrix(rnorm(100e6), ncol=10e3)
>
> t1 <- system.time(for (k in 1:ncol(x)) {
> a <- x[,k]
> colSum <- sum(a)
> rm(a) # Not needed anymore
> b <- x[k,]
> rowSum <- sum(b)
> rm(b) # Not needed anymore
> })
>
> t2 <- system.time(for (k in 1:ncol(x)) {
> a <- x[,k]
> colSum <- sum(a)
> rm(list="a") # Not needed anymore
> b <- x[k,]
> rowSum <- sum(b)
> rm(list="b") # Not needed anymore
> })
>
> t3 <- system.time(for (k in 1:ncol(x)) {
> a <- x[,k]
> colSum <- sum(a)
> a <- NULL # Not needed anymore
> b <- x[k,]
> rowSum <- sum(b)
> b <- NULL # Not needed anymore
> })
>
> > t1
> user system elapsed
> 8.03 0.00 8.08
> > t1/t2
> user system elapsed
> 1.322900 0.000000 1.320261
> > t1/t3
> user system elapsed
> 1.715812 0.000000 1.662551
>
>
> Is there a reason why I shouldn't assign NULL instead of using rm()?
> As far as I understand it, the garbage collector will be equally
> efficient cleaning out the previous object when using rm(a) or a <-
> NULL. Is there anything else I'm overlooking? Am I adding overhead
> somewhere else?
>
> /Henrik
>
>
> PS. With the above toy example one can obviously be a bit smarter by using:
>
> t4 <- system.time({for (k in 1:ncol(x)) {
> a <- x[,k]
> colSum <- sum(a)
> a <- x[k,]
> rowSum <- sum(a)
> }
> rm(list="a")
> })
>
> but that's not my point.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
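
One difference between the two approaches that is independent of timing: rm() removes the binding itself, whereas assigning NULL leaves a variable of that name behind, now holding NULL. A minimal sketch, run inside a hypothetical scratch environment `e` so that it does not touch the workspace:

```r
# Hypothetical scratch environment, used only for this illustration.
e <- new.env()

# Case 1: remove the binding with rm().
assign("a", rnorm(10), envir = e)
rm(list = "a", envir = e)
afterRm <- exists("a", envir = e, inherits = FALSE)    # binding is gone

# Case 2: overwrite the large value with NULL.
assign("a", rnorm(10), envir = e)
assign("a", NULL, envir = e)
afterNull <- exists("a", envir = e, inherits = FALSE)  # binding is still there
valueIsNull <- is.null(get("a", envir = e))
```

Code that later calls exists("a") or ls() will therefore see the two cases differently, which may matter if the presence of the variable is used as a flag.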