[Rd] Best practices for writing R functions (really copying)
Henrik Bengtsson
hb at biostat.ucsf.edu
Mon Jul 25 20:11:59 CEST 2011
Use tracemem() instead, i.e.
> A <- matrix(c(1.0,1.1), nrow=5, ncol=10);
> tracemem(A);
[1] "<0x00000000047ab170>"
> A[1,1] <- 7;
> B <- sqrt(A);
tracemem[0x00000000047ab170 -> 0x000000000552f338]:
> A[1,1] <- 7;
> B <- t(A);
> A[1,1] <- 7;
tracemem[0x00000000047ab170 -> 0x00000000057ba588]:
> A[1,1] <- 7;
> A[1,1] <- 7;
It looks like sqrt() creates the copy internally, which explains the difference.
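The primitive/non-primitive distinction drawn in the quoted message below can be checked directly with is.primitive(). This is only a quick sanity check; whether a function is primitive does not by itself settle when a copy is made.

```r
# sqrt() is implemented as a primitive; t() is an ordinary R closure
# (an S3 generic that dispatches via UseMethod).
is.primitive(sqrt)  # TRUE
is.primitive(t)     # FALSE
```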
However, it is true that even if a new copy is not needed/created
inside a function call, a function "touching" the object would trigger
downstream copies, e.g.
# Not touching the object:
> foo <- function(X) { 0 }
> B <- foo(A);
> A[1,1] <- 7;
> A[1,1] <- 7;
# Touching the object:
> bar <- function(X) { Y <- X; 0 }
> B <- bar(A);
> A[1,1] <- 7;
tracemem[0x00000000039b5538 -> 0x000000000402c448]:
> A[1,1] <- 7;
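For running the foo()/bar() comparison as a script rather than at the prompt, here is a minimal sketch. It assumes R was built with memory profiling (checked via capabilities("profmem")) and relies on tracemem() returning the object's current address, so a changed address means A was duplicated. The variable names before, after_foo and after_bar are made up for illustration, and the outcome may well differ between R versions, so no expected result is shown.

```r
A <- matrix(c(1.0, 1.1), nrow = 5, ncol = 10)

foo <- function(X) { 0 }           # never evaluates X
bar <- function(X) { Y <- X; 0 }   # evaluates X and binds it to a second name

if (capabilities("profmem")) {
  before <- tracemem(A)            # start tracing; returns the address string

  B <- foo(A)
  A[1, 1] <- 7
  after_foo <- tracemem(A)         # address of A after the modification

  B <- bar(A)
  A[1, 1] <- 7
  after_bar <- tracemem(A)

  untracemem(A)

  # A changed address means the assignment worked on a fresh copy.
  cat("copied after foo():", !identical(before, after_foo), "\n")
  cat("copied after bar():", !identical(after_foo, after_bar), "\n")
} else {
  # Without memory profiling, tracemem() is unavailable; just run the calls.
  B <- foo(A); A[1, 1] <- 7
  B <- bar(A); A[1, 1] <- 7
}
```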
Then again, try doing the same with a vector instead of a matrix,
e.g. A <- 1:10, and/or an assignment such as A[1] <- 7, and you get
different behavior. The source code should explain why. I leave it
at this.
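One thing that is certain to differ in the vector case, though it may not be the whole story: 1:10 is an integer vector while 7 is a double, so A[1] <- 7 has to coerce A to double, which produces a new vector regardless of reference counts. A small sketch (the second vector B is only there for contrast):

```r
A <- 1:10
typeof(A)    # "integer"
A[1] <- 7    # 7 is a double, so A is coerced to double
typeof(A)    # "double"

B <- 1:10
B[1] <- 7L   # 7L is an integer, so no type change is needed
typeof(B)    # "integer"
```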
My $.02
/Henrik
On Mon, Jul 25, 2011 at 8:53 AM, Radford Neal <radford at cs.toronto.edu> wrote:
> Gabriel Becker writes:
>
> AFAIK R does not automatically copy function arguments. R actually tries
> very hard to avoid copying while maintaining "pass by value" functionality.
>
> ... R only copies data when you modify an object, not
> when you simply pass it to a function.
>
> This is a bit misleading. R tries to avoid copying by maintaining a
> count of how many references there are to an object, so that x[i] <- 9
> can be done without a copy if x is the only reference to the vector.
> However, it never decrements such counts. As a result, simply passing
> x to a function that accesses but does not change it will result in x
> being copied if x[i] is changed after that function returns. An
> exception is that this usually isn't the case if x is passed to a
> primitive function. But note that not all standard functions are
> technically "primitive".
>
> The end result is that it's rather difficult to tell when copying will
> be done. Try the following test, for example:
>
> cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
> cat("b: "); print(system.time( { A[1,1]<-7; 0 } ))
> cat("c: "); print(system.time( { B <- sqrt(A); 0 } ))
> cat("d: "); print(system.time( { A[1,1]<-7; 0 } ))
> cat("e: "); print(system.time( { B <- t(A); 0 } ))
> cat("f: "); print(system.time( { A[1,1]<-7; 0 } ))
> cat("g: "); print(system.time( { A[1,1]<-7; 0 } ))
>
> You'll find that the time printed after b:, d:, and g: is near zero,
> but that there is non-negligible time for f:. This is because sqrt
> is primitive but t is not, so the modification to A after the call
> t(A) requires that a copy be made.
>
> Radford Neal