[Rd] Modifying a list: what gets copied?

Thu Jul 12 22:54:39 CEST 2012

Hi all,

In my continued effort to understand when and what R copies, I've
designed a small experiment to try and figure out what goes on when a
list gets copied - is it a shallow copy or a deep copy. I believe the
following experiment isolates the difference:

options(digits = 2)
powers <- 4:6
n <- setNames(10 ^ powers, paste0("e", powers))
xs <- lapply(n, seq_len)
zs <- lapply(xs, list)

vector_u <- function(x) x[length(x) + 1L] <- 1L
list_u <- function(x) z$a <- 1L

library(microbenchmark)
microbenchmark(
  vector_u(xs$e4),
  vector_u(xs$e5),
  vector_u(xs$e6),
  list_u(zs$e4),
  list_u(zs$e5),
  list_u(zs$e6)
)

On my computer, I get

Unit: microseconds
             expr    min     lq median     uq    max
1   list_u(zs$e4)  122.0  147.8  283.0  456.5    814
2   list_u(zs$e5)  118.1  158.0  287.6  453.3  12257
3   list_u(zs$e6)  119.8  153.3  299.8  458.8  13701
4 vector_u(xs$e4)   33.5   40.5   52.2   65.4    101
5 vector_u(xs$e5)  253.9  407.7  584.6  955.6   8347
6 vector_u(xs$e6) 4579.5 5511.5 7923.6 9956.1 109075

So the time it takes to modify that vector grows with the size of the
vector (approximately linearly), while the time to modify the list is
constant.  This suggests that the list doesn't make a deep copy, but
the overhead for modifying a list is considerably higher than
modifying a list (roughly equivalent to copying a integer vector of
50,000 elements)

To make sure it wasn't just the way I was modifying the list, I tried
a few alternatives:

list_u2 <- function(x) z[[2]] <- 1L
list_u3 <- function(x) z[["a"]] <- 1L
list_u4 <- function(x) z[1] <- list(1L)

microbenchmark(
  list_u(zs$e4),
  list_u2(zs$e4),
  list_u3(zs$e4),
  list_u4(zs$e4)
)

Unit: microseconds
            expr min  lq median  uq   max
1  list_u(zs$e4) 125 430    447 471   677
2 list_u2(zs$e4) 120 424    447 466 14182
3 list_u3(zs$e4) 118 426    443 459 14075
4 list_u4(zs$e4) 123 435    457 482 13937

i.e. they're basically equivalent.

Are lists really that slow to modify?  Another experiment (not shown)
implies that the time to modify a list is independent of the number of
elements in that list (at least between 1 and 1e6).

What have I missed?

Any feedback would be much appreciated.  Thanks!

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/