[Rd] Best practices for writing R functions (really copying)

Radford Neal radford at cs.toronto.edu
Mon Jul 25 17:53:23 CEST 2011


Gabriel Becker writes:

  AFAIK R does not automatically copy function arguments. R actually tries
  very hard to avoid copying while maintaining "pass by value" functionality.

  ... R only copies data when you modify an object, not
  when you simply pass it to a function.

This is a bit misleading.  R tries to avoid copying by maintaining a
count of how many references there are to an object, so that x[i] <- 9
can be done without a copy if x is the only reference to the vector.
However, it never decrements such counts.  As a result, simply passing
x to a function that accesses but does not change it will result in x
being copied if x[i] is changed after that function returns.  An
exception is that this usually isn't the case if x is passed to a
primitive function.  But note that not all standard functions are 
technically "primitive".
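
One way to watch for such a copy directly, rather than inferring it from
timings, is tracemem.  The following is only a sketch: the function name
"reader" is made up for illustration, tracemem works only if R was built
with memory profiling (hence the capabilities check), and whether a copy is
reported may depend on the R version and build.

```r
# A large vector, so that any duplication would be worth noticing.
x <- numeric(1e6)

# Hypothetical non-primitive function (a closure) that reads its
# argument but never modifies it.
reader <- function(v) sum(v)

# tracemem is only available when R was built with memory profiling.
traced <- capabilities("profmem")
if (traced) invisible(tracemem(x))

total <- reader(x)   # x is only accessed here, not changed

# If this assignment has to duplicate x, tracemem prints a message
# of the form "tracemem[<address> -> <address>]:".
x[1] <- 9

if (traced) untracemem(x)
```

If a tracemem line appears at the assignment, the earlier call to reader is
what forced the copy.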

The end result is that it's rather difficult to tell when copying will
be done.  Try the following test, for example:

  cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
  cat("b: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("c: "); print(system.time( { B <- sqrt(A); 0 } ))
  cat("d: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("e: "); print(system.time( { B <- t(A); 0 } ))
  cat("f: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("g: "); print(system.time( { A[1,1]<-7; 0 } ))

You'll find that the times printed after b:, d:, and g: are near zero,
but that the time for f: is non-negligible.  This is because sqrt
is primitive but t is not, so the modification to A after the call
t(A) requires that a copy be made.
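
Whether a given function is primitive can be checked with is.primitive.
A small sketch follows; the choice of functions is mine (sqrt and t are the
ones used in the test above, while length and matrix are added only for
comparison).

```r
# Collect a few standard functions and ask which are primitive.
fns <- list(sqrt = sqrt, t = t, length = length, matrix = matrix)
prim <- vapply(fns, is.primitive, logical(1))
print(prim)
```

This reports TRUE for sqrt and length and FALSE for t and matrix, which are
ordinary R closures.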

   Radford Neal


