[Rd] Modifying a list: what gets copied?
Hadley Wickham
hadley at rice.edu
Thu Jul 12 22:54:39 CEST 2012
Hi all,
In my continued effort to understand when and what R copies, I've
designed a small experiment to try and figure out what goes on when a
list gets copied - is it a shallow copy or a deep copy. I believe the
following experiment isolates the difference:
options(digits = 2)
powers <- 4:6
n <- setNames(10 ^ powers, paste0("e", powers))
xs <- lapply(n, seq_len)
zs <- lapply(xs, list)
vector_u <- function(x) x[length(x) + 1L] <- 1L
list_u <- function(x) z$a <- 1L
library(microbenchmark)
microbenchmark(
vector_u(xs$e4),
vector_u(xs$e5),
vector_u(xs$e6),
list_u(zs$e4),
list_u(zs$e5),
list_u(zs$e6)
)
On my computer, I get
Unit: microseconds
expr min lq median uq max
1 list_u(zs$e4) 122.0 147.8 283.0 456.5 814
2 list_u(zs$e5) 118.1 158.0 287.6 453.3 12257
3 list_u(zs$e6) 119.8 153.3 299.8 458.8 13701
4 vector_u(xs$e4) 33.5 40.5 52.2 65.4 101
5 vector_u(xs$e5) 253.9 407.7 584.6 955.6 8347
6 vector_u(xs$e6) 4579.5 5511.5 7923.6 9956.1 109075
So the time it takes to modify that vector grows with the size of the
vector (approximately linearly), while the time to modify the list is
constant. This suggests that the list doesn't make a deep copy, but
the overhead for modifying a list is considerably higher than
modifying a list (roughly equivalent to copying a integer vector of
50,000 elements)
To make sure it wasn't just the way I was modifying the list, I tried
a few alternatives:
list_u2 <- function(x) z[[2]] <- 1L
list_u3 <- function(x) z[["a"]] <- 1L
list_u4 <- function(x) z[1] <- list(1L)
microbenchmark(
list_u(zs$e4),
list_u2(zs$e4),
list_u3(zs$e4),
list_u4(zs$e4)
)
Unit: microseconds
expr min lq median uq max
1 list_u(zs$e4) 125 430 447 471 677
2 list_u2(zs$e4) 120 424 447 466 14182
3 list_u3(zs$e4) 118 426 443 459 14075
4 list_u4(zs$e4) 123 435 457 482 13937
i.e. they're basically equivalent.
Are lists really that slow to modify? Another experiment (not shown)
implies that the time to modify a list is independent of the number of
elements in that list (at least between 1 and 1e6).
What have I missed?
Any feedback would be much appreciated. Thanks!
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
More information about the R-devel
mailing list