[Rd] Best practices for writing R functions (really copying)

Mon Jul 25 20:11:59 CEST 2011

Use tracemem() instead of system.time() to see when copies are made, e.g.

> A <- matrix(c(1.0,1.1), nrow=5, ncol=10);
> tracemem(A);
[1] "<0x00000000047ab170>"
> A[1,1] <- 7;
> B <- sqrt(A);
tracemem[0x00000000047ab170 -> 0x000000000552f338]:
> A[1,1] <- 7;
> B <- t(A);
> A[1,1] <- 7;
tracemem[0x00000000047ab170 -> 0x00000000057ba588]:
> A[1,1] <- 7;
> A[1,1] <- 7;

It looks like sqrt() creates the copy internally, which explains the difference.
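
As a side check on the primitive vs. non-primitive distinction raised in the quoted message below, is.primitive() reports which kind of function each one is. This is only a small sketch; it says nothing by itself about where a copy is made.

```r
# sqrt is implemented as a primitive; t is an ordinary closure
# that dispatches with UseMethod("t").
is.primitive(sqrt)   # TRUE
is.primitive(t)      # FALSE
```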

However, it is true that even if a new copy is not needed/created
inside a function call, a function "touching" the object would trigger
downstream copies, e.g.

# Not touching the object:
> foo <- function(X) { 0 }
> B <- foo(A);
> A[1,1] <- 7;
> A[1,1] <- 7;

# Touching the object:
> bar <- function(X) { Y <- X; 0 }
> B <- bar(A);
> A[1,1] <- 7;
tracemem[0x00000000039b5538 -> 0x000000000402c448]:
> A[1,1] <- 7;
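
The foo()/bar() comparison above can be collected into one script, sketched here under some assumptions. tracemem() only works if R was built with memory profiling, so the sketch guards the calls with capabilities("profmem"). The variable trace_ok is a name introduced here for illustration. Which assignments report a copy may differ between R versions, so no tracemem output is shown.

```r
A <- matrix(c(1.0, 1.1), nrow = 5, ncol = 10)

foo <- function(X) { 0 }           # never evaluates X
bar <- function(X) { Y <- X; 0 }   # binds X to a second name

# trace_ok is a hypothetical helper flag, not part of the original post
trace_ok <- capabilities("profmem")
if (trace_ok) invisible(tracemem(A))

B <- foo(A)
A[1, 1] <- 7   # any copy here is reported by tracemem

B <- bar(A)
A[1, 1] <- 7   # compare with the assignment after foo()

if (trace_ok) untracemem(A)
```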

Then again, try doing the same with a vector instead of a matrix,
e.g. A <- 1:10, and/or an assignment such as A[1] <- 7, and you get
different behavior.  The source code should explain why.  I leave it
at this.
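
One likely contributor to the vector case (an assumption on my part, not something established in this thread) is type coercion: 1:10 is an integer vector and 7 is a double, so the assignment has to produce a new double vector no matter how many references exist. A minimal sketch; V is an illustrative name.

```r
A <- 1:10
typeof(A)    # "integer"
A[1] <- 7    # 7 is a double constant, so A is coerced
typeof(A)    # "double"

# With an integer literal the type is preserved
V <- 1:10
V[1] <- 7L
typeof(V)    # "integer"
```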

My $.02

/Henrik

On Mon, Jul 25, 2011 at 8:53 AM, Radford Neal <radford at cs.toronto.edu> wrote:
> Gabriel Becker writes:
>
>  AFAIK R does not automatically copy function arguments. R actually tries
>  very hard to avoid copying while maintaining "pass by value" functionality.
>
>  ... R only copies data when you modify an object, not
>  when you simply pass it to a function.
>
> This is a bit misleading.  R tries to avoid copying by maintaining a
> count of how many references there are to an object, so that x[i] <- 9
> can be done without a copy if x is the only reference to the vector.
> However, it never decrements such counts.  As a result, simply passing
> x to a function that accesses but does not change it will result in x
> being copied if x[i] is changed after that function returns.  An
> exception is that this usually isn't the case if x is passed to a
> primitive function.  But note that not all standard functions are
> technically "primitive".
>
> The end result is that it's rather difficult to tell when copying will
> be done.  Try the following test, for example:
>
>  cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
>  cat("b: "); print(system.time( { A[1,1]<-7; 0 } ))
>  cat("c: "); print(system.time( { B <- sqrt(A); 0 } ))
>  cat("d: "); print(system.time( { A[1,1]<-7; 0 } ))
>  cat("e: "); print(system.time( { B <- t(A); 0 } ))
>  cat("f: "); print(system.time( { A[1,1]<-7; 0 } ))
>  cat("g: "); print(system.time( { A[1,1]<-7; 0 } ))
>
> You'll find that the time printed after b:, d:, and g: is near zero,
> but that there is non-negligible time for f:.  This is because sqrt
> is primitive but t is not, so the modification to A after the call
> t(A) requires that a copy be made.
>
>   Radford Neal