[Rd] Why does duplicate() make deep copies?

Simon Urbanek simon.urbanek at r-project.org
Thu Sep 5 09:48:15 CEST 2013


On Sep 5, 2013, at 12:31 AM, Peter Meilstrup wrote:

> Some experimentation with the function below should convince you that the
> runtime of the code inside system.time() is proportional to
> size*number*times. I think it should be proportional only to number*times:
> the function merely shuffles a list of references to vectors and never
> modifies the vectors themselves.
> 
> overcopying <- function(size, number, times) {
>   # Make a list of NUMBER vectors of length SIZE, then reorder the list.
>   # The vectors themselves are never touched; only their references are
>   # moved around. If R implements copy-on-write correctly, the elapsed
>   # time should be ~number*times.
>   L <- replicate(number, list(vector("numeric", size)), simplify = FALSE)
>   system.time(for (i in 1:times) {
>     L[sample(number)] <- L
>   })
> }
> 
> I see that duplicate.c makes a recursive copy of each element when it
> encounters a VECSXP or a LISTSXP, which seems unnecessary: it should be
> sufficient to make a shallow copy and ensure NAMED is set on the
> elements.
> 
> Why is R making these apparently unnecessary deep copies?
> 
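
(A quick usage sketch of the quoted overcopying(); the argument values are
arbitrary. Only size differs between the two calls, so with genuine
copy-on-write both timings should be comparable, while with deep copies the
second should take roughly ten times as long.)

# Same number of reference shuffles, ten times the vector size.
overcopying(size = 1e4, number = 100, times = 100)
overcopying(size = 1e5, number = 100, times = 100)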

Because reference counting is not recursive. Changing that would break quite a few packages that abuse the fact that copies are currently deep, and the main problem is that such breakage is pretty much impossible to detect ahead of time. That said, there are efforts underway by Luke Tierney and Michael Lawrence to see how far we can go without breaking everything.
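
One way to watch this from R code is tracemem(), which prints a message
whenever the marked object is duplicated. A minimal sketch; what it prints
depends on how duplicate() treats lists in the running version of R:

x <- list(numeric(1e6), numeric(1e6))
tracemem(x[[1]])      # mark the first element's vector
y <- x                # no copy yet: the list is merely shared
y[[2]] <- numeric(1)  # the shared list must be duplicated here
# With recursive duplication, tracemem reports a copy of x[[1]] even though
# only the second element was replaced; with a shallow copy plus reference
# bookkeeping (NAMED) on the elements, it stays silent.
untracemem(x[[1]])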

Cheers,
Simon


