[Rd] R 3.0.0 memory use
Tim Hesterberg
timhesterberg at gmail.com
Mon Apr 15 00:22:00 CEST 2013
I did some benchmarking of data frame code, and
R 3.0.0 appears to be far worse than earlier versions of R
in how many large objects it allocates space for in data frame
operations: creation, subscripting, and subscript replacement.
For a data frame with n rows, it makes either 2 or 4 extra copies of
each of:
  8n bytes (e.g. a double-precision vector)
  24n bytes
  32n bytes
For example, for as.data.frame(numeric vector), instead of allocations
totalling about 8n bytes, it allocates 33 times that much.
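One way to tally allocations like these from within R is Rprofmem(), which logs every allocation above a byte threshold as a line starting with the byte count. The helper below is a minimal sketch, not the author's memory.R script; the name count_big_allocs and the 10000-byte threshold are hypothetical choices, and it assumes an R built with --enable-memory-profiling.

```r
# Hypothetical helper (not from memory.R): tally allocations larger than
# `threshold` bytes made while evaluating `expr`.
# Rprofmem() requires R built with --enable-memory-profiling.
count_big_allocs <- function(expr, threshold = 10000) {
  stopifnot(capabilities("profmem"))
  logfile <- tempfile(fileext = ".Rprofmem")
  on.exit({ Rprofmem(NULL); unlink(logfile) }, add = TRUE)  # clean up on error too
  Rprofmem(logfile, threshold = threshold)
  force(expr)        # evaluate the lazily passed expression while logging
  Rprofmem(NULL)     # stop logging before reading the file
  log <- readLines(logfile)
  # Large-vector entries look like '80048 :"as.data.frame" ...';
  # "new page:" entries for small objects are skipped.
  big <- grep("^[0-9]+ *:", log, value = TRUE)
  table(bytes = as.numeric(sub(" *:.*", "", big)))
}

n <- 10000
y <- rnorm(n)
if (capabilities("profmem")) print(count_big_allocs(as.data.frame(y)))
```

Each row of the resulting table is an allocation size in bytes (including a small vector header), and its count is how many such blocks were allocated.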
In the summary below, compare columns 3 and 5 (R-2.15.3 and R-3.0.0,
both without the dataframe package); columns 2 and 4 are with the
dataframe package.
# Summary
#                              R-2.14.2        R-2.15.3        R-3.0.0
#                              w/o     with    w/o     with    w/o
# as.data.frame(y)             3       1       1       1       5;4;4
# data.frame(y)                7       3       4       2       6;2;2
# data.frame(y, z)             7 each  3 each  4       2       8;4;4
# as.data.frame(l)             8       3       5       2       9;4;4
# data.frame(l)                13      5       8       3       12;4;4
# d$z <- z                     3,2     1,1     3,1     2,1     7;4;4,1
# d[["z"]] <- z                4,3     1,1     3,1     2,1     7;4;4,1
# d[, "z"] <- z                6,4,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
# d["z"] <- z                  6,5,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
# d["z"] <- list(z=z)          6,3,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
# d["z"] <- Z (Z = list(z=z))  6,2,2   2,1,1   4,1,2   3,1,1   8;4;4,1,2
# a <- d["y"]                  2       1       2       1       6;4;4
# a <- d[, "y", drop=F]        2       1       2       1       6;4;4
# Where two numbers are given, they refer to:
#   (copies of the old data frame),
#   (copies of the new column).
# A third number refers to the number of
#   (copies made of an integer vector of row names).
# For R 3.0.0, I'm getting astounding results: many more copies,
# and also some copies of larger objects. In addition to the data
# vectors of size 80K and 160K, there are also allocations of 240K and 320K.
# Where three numbers are given in the form a;c;d, they refer to
#   (copies of 80K; 240K; 320K).
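As a consistency check on the 33x figure: the sizes 80K, 240K and 320K match 8n, 24n and 32n bytes if n = 10,000 (an inference from those sizes, not a value stated in the post). Plugging in the R 3.0.0 entry 5;4;4 for as.data.frame(y):

```r
# Assumed n = 10,000 rows (inferred from 8n bytes = 80K)
n      <- 10000
copies <- c(5, 4, 4)           # R 3.0.0 entry for as.data.frame(y): 80K;240K;320K
sizes  <- c(8, 24, 32) * n     # bytes per copy: 8n, 24n, 32n
total  <- sum(copies * sizes)  # 400000 + 960000 + 1280000 = 2640000 bytes
total / (8 * n)                # 33, relative to one 8n-byte column
```

That is, 5 * 8 + 4 * 24 + 4 * 32 = 264 = 33 * 8, which reproduces the "33 times" claim exactly.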
The benchmarks are at
http://www.timhesterberg.net/r-packages/memory.R
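For the "copies of the old data frame" counts specifically, tracemem() gives a lighter-weight check than an allocation log: it prints a tracemem[...] line each time a marked object is duplicated, and the duplicate stays marked. The helper below is a hypothetical sketch, not part of the linked benchmarks; it also assumes memory profiling is enabled, and the number it returns is expected to vary across R versions.

```r
# Hypothetical helper: count duplications of data frame `d` during `d$z <- z`.
# tracemem() requires R built with --enable-memory-profiling.
count_df_copies <- function(d, z) {
  stopifnot(capabilities("profmem"))
  tracemem(d)                       # mark d; each duplicate prints one line
  out <- capture.output(d$z <- z)   # tracemem messages go to standard output
  untracemem(d)
  sum(grepl("^tracemem\\[", out))   # one "tracemem[old -> new]:" line per copy
}

if (capabilities("profmem")) {
  d <- data.frame(y = rnorm(10000))
  print(count_df_copies(d, rnorm(10000)))
}
```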
I'm using versions of R that I installed from source on a Linux box, built with e.g.:
./configure --prefix=(my path) --enable-memory-profiling --with-readline=no
make
make install
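After building, one way to confirm that the running R actually has memory profiling (which Rprofmem() and tracemem() both rely on) is the "profmem" entry of capabilities(). This check is a suggestion, not part of the original build steps.

```r
# Report the R version and whether --enable-memory-profiling took effect.
has_profmem <- unname(capabilities("profmem"))
cat(R.version.string, "| memory profiling:",
    if (has_profmem) "enabled" else "disabled", "\n")
```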