[Rd] R 3.0.0 memory use
Martin Morgan
mtmorgan at fhcrc.org
Mon Apr 15 04:15:45 CEST 2013
On 04/14/2013 07:11 PM, luke-tierney at uiowa.edu wrote:
> There were a couple of bug fixes to somewhat obscure compound
> assignment related bugs that required bumping up internal reference
> counts. It's possible that one or more of these are responsible. If so
> it is unavoidable for now, but it's worth finding out for sure. With
> some stripped down test examples it should be possible to identify
> when things changed. I won't have time to look for some time, but if
> someone else wanted to nail this down that would be useful.
I can't quite tell from Tim's script what he's documenting. In R-2.15.3 I have
> Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
character(0)
(or sometimes [1] "new page:new page:\"Rprofmem\" ")
whereas in R-3.0.0
> Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
[1] "320040 :80040 :240048 :320040 :80040 :240048 :"
I think these are the allocations Tim is seeing. They're from the parser (see
below) rather than as.data.frame. For Tim's example
y <- 1:10^4 + 0.0
Rprofmem(); d <- as.data.frame(y); Rprofmem(NULL); readLines("Rprofmem.out")
[1] "320040 :80040 :240048 :320040 :80040 :240048 :80040 :\"as.data.frame.numeric\" \"as.data.frame\" "
[2] "320040 :80040 :240048 :320040 :80040 :240048 :"
Only the final 80040-byte allocation is from as.data.frame, as the call stack printed after it shows.
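A minimal sketch of how output like this could be split into one record per allocation. It assumes each allocation is written as `<bytes> :` and that a call stack, when present, follows the last allocation on a line, which is how the output quoted here looks. `parse_rprofmem` is a hypothetical helper, not part of R.

```r
## Hypothetical helper (not part of R): split Rprofmem output lines into
## one record per allocation. Assumes each allocation is written as
## "<bytes> :" and that any call stack follows the last allocation on a line.
parse_rprofmem <- function(lines) {
    records <- lapply(lines, function(line) {
        sizes <- regmatches(line, gregexpr("[0-9]+ :", line))[[1]]
        if (length(sizes) == 0L) return(NULL)   # e.g. only "new page:" entries
        bytes <- as.numeric(sub(" :", "", sizes, fixed = TRUE))
        ## text after the final ":" is the stack for the last allocation
        stack <- gsub("\"", "", sub(".*:", "", line), fixed = TRUE)
        stack <- gsub("^ +| +$", "", stack)
        data.frame(bytes = bytes,
                   stack = c(rep("", length(bytes) - 1L), stack),
                   stringsAsFactors = FALSE)
    })
    do.call(rbind, records)
}

## The two lines reported for as.data.frame(y) under R 3.0.0
out <- c("320040 :80040 :240048 :320040 :80040 :240048 :80040 :\"as.data.frame.numeric\" \"as.data.frame\" ",
         "320040 :80040 :240048 :320040 :80040 :240048 :")
alloc <- parse_rprofmem(out)
alloc[alloc$stack != "", ]   # allocations that carry a call stack
```

Filtering on a non-empty stack keeps only the allocations made while a function call was active. Allocations with no stack were made with no function on the call stack.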
Under R -d gdb
(gdb) b R_OutputStackTrace
(gdb) r
> Rprofmem(); Rprofmem(NULL)
Breakpoint 1, R_OutputStackTrace (file=0xbd43f0) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
3434 {
(gdb) bt
#0 R_OutputStackTrace (file=0xbd43f0) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
#1 0x00007ffff792ff83 in R_ReportAllocation (size=320040) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3456
#2 Rf_allocVector (type=13, length=80000) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:2478
#3 0x00007ffff790bedf in growData () at gram.y:3391
and the memory allocations are from these lines in the parser gram.y
PROTECT( bigger = allocVector( INTSXP, data_size * DATA_ROWS ) ) ;
PROTECT( biggertext = allocVector( STRSXP, data_size ) );
I'm not sure why these show up under R 3.0.0, though.
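The reported sizes can be checked against the backtrace with a little arithmetic. The sketch below assumes 4-byte integers and 8-byte doubles, and infers a 40-byte per-vector overhead from frames #1 and #2 (size=320040 for an INTSXP of length 80000). That overhead is an inference from this trace, not a documented constant, and `vec_bytes` is a hypothetical helper.

```r
## Overhead implied by the gdb frames: size=320040 for 80000 integers
overhead <- 320040 - 4 * 80000

## Hypothetical helper: bytes reported for a vector of n elements,
## assuming a fixed per-vector overhead (inferred above, not documented)
vec_bytes <- function(n, elt_size, overhead = 40) n * elt_size + overhead

overhead              # 40
vec_bytes(80000, 4)   # 320040, the INTSXP allocated in growData()
vec_bytes(10^4, 8)    # 80040, a double vector the length of y
```

On this reading, the 80040-byte entry that carries a call stack corresponds to one copy of y, while the 320040-byte entries match the integer vector allocated by the parser.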
$ R-2-15-branch/bin/R --version
R version 2.15.3 Patched (2013-03-13 r62579) -- "Security Blanket"
Copyright (C) 2013 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)
R-3-0-branch$ bin/R --version
R version 3.0.0 Patched (2013-04-14 r62579) -- "Masked Marvel"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
Martin
>
> Best,
>
> luke
>
> On Sun, 14 Apr 2013, Tim Hesterberg wrote:
>
>> I did some benchmarking of data frame code, and
>> it appears that R 3.0.0 is far worse than earlier versions of R
>> in terms of how many large objects it allocates space for,
>> for data frame operations - creation, subscripting, subscript replacement.
>> For a data frame with n rows, it makes either 2 or 4 extra copies of
>> all of:
>> 8n bytes (e.g. double precision)
>> 24n bytes
>> 32n bytes
>> E.g., for as.data.frame(numeric vector), instead of allocations
>> totalling ~8n bytes, it allocates 33 times that much.
>>
>> Here, compare columns 3 and 5
>> (columns 2 and 4 are with the dataframe package).
>>
>> # Summary
>> #                         R-2.14.2        R-2.15.3        R-3.0.0
>> #                         w/o     with    w/o     with    w/o
>> # as.data.frame(y)        3       1       1       1       5;4;4
>> # data.frame(y)           7       3       4       2       6;2;2
>> # data.frame(y, z)        7 each  3 each  4       2       8;4;4
>> # as.data.frame(l)        8       3       5       2       9;4;4
>> # data.frame(l)           13      5       8       3       12;4;4
>> # d$z <- z                3,2     1,1     3,1     2,1     7;4;4,1
>> # d[["z"]] <- z           4,3     1,1     3,1     2,1     7;4;4,1
>> # d[, "z"] <- z           6,4,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>> # d["z"] <- z             6,5,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>> # d["z"] <- list(z=z)     6,3,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>> # d["z"] <- Z #list(z=z)  6,2,2   2,1,1   4,1,2   3,1,1   8;4;4,1,2
>> # a <- d["y"]             2       1       2       1       6;4;4
>> # a <- d[, "y", drop=F]   2       1       2       1       6;4;4
>>
>> # Where two numbers are given, they refer to:
>> # (copies of the old data frame),
>> # (copies of the new column)
>> # A third number refers to numbers of
>> # (copies made of an integer vector of row names)
>>
>> # For R 3.0.0, I'm getting astounding results - many more copies,
>> # and also some copies of larger objects; in addition to the data
>> # vectors of size 80K and 160K, also 240K and 320K.
>> # Where three numbers are given in form a;c;d, they refer to
>> # (copies of 80K; 240K; 320K)
>>
>> The benchmarks are at
>> http://www.timhesterberg.net/r-packages/memory.R
>>
>> I'm using versions of R I installed from source on a Linux box, using e.g.
>> ./configure --prefix=(my path) --enable-memory-profiling --with-readline=no
>> make
>> make install
>>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793