[Rd] R 3.0.0 memory use

Martin Morgan mtmorgan at fhcrc.org
Mon Apr 15 04:15:45 CEST 2013


On 04/14/2013 07:11 PM, luke-tierney at uiowa.edu wrote:
> There were a couple of bug fixes to somewhat obscure compound
> assignment related bugs that required bumping up internal reference
> counts. It's possible that one or more of these are responsible. If so
> it is unavoidable for now, but it's worth finding out for sure. With
> some stripped down test examples it should be possible to identify
> when things changed. I won't have time to look for some time, but if
> someone else wanted to nail this down that would be useful.

I can't quite tell from Tim's script what he's documenting. In R-2.15.3 I have

 > Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
character(0)

(or sometimes [1] "new page:new page:\"Rprofmem\" ")

whereas in R-3.0.0

 > Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
[1] "320040 :80040 :240048 :320040 :80040 :240048 :"

I think these are the allocations Tim is seeing. They're from the parser (see 
below) rather than as.data.frame. For Tim's example

   y <- 1:10^4 + 0.0
   Rprofmem(); d <- as.data.frame(y); Rprofmem(NULL); readLines("Rprofmem.out")

[1] "320040 :80040 :240048 :320040 :80040 :240048 :80040 :\"as.data.frame.numeric\" \"as.data.frame\" "
[2] "320040 :80040 :240048 :320040 :80040 :240048 :"

Only the final 80040-byte allocation is from as.data.frame, as the call stack
printed after it shows; the six sizes before it carry no call stack.
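
To separate the two kinds of allocation mechanically, here is a minimal sketch
in R (not part of the original exchange). parse_rprofmem is a hypothetical
helper name. It assumes the format seen above, where allocations made with no
R function on the stack run together on one line and only the last size on a
line is followed by a quoted call stack. It skips "new page" entries and would
mis-split a function name that contains ":".

```r
# Hypothetical helper: turn Rprofmem output lines into one row per allocation.
# Uses trimws(), so it needs R >= 3.2.0.
parse_rprofmem <- function(lines) {
  rows <- lapply(lines, function(line) {
    # Fields are separated by ":"; drop empty fields and "new page" markers.
    fields <- trimws(strsplit(line, ":", fixed = TRUE)[[1]])
    fields <- fields[nzchar(fields) & fields != "new page"]
    is_size <- grepl("^[0-9]+$", fields)
    sizes <- as.numeric(fields[is_size])
    if (length(sizes) == 0L) return(NULL)
    # Whatever is left is the quoted call stack (empty if there was none).
    stack <- paste(fields[!is_size], collapse = " ")
    # Assumption: the stack belongs to the last size on the line only.
    data.frame(bytes = sizes,
               stack = c(rep("", length(sizes) - 1L), stack),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# The two lines of output quoted above.
out <- c(
  '320040 :80040 :240048 :320040 :80040 :240048 :80040 :"as.data.frame.numeric" "as.data.frame" ',
  "320040 :80040 :240048 :320040 :80040 :240048 :"
)
alloc <- parse_rprofmem(out)
with_stack <- alloc[nzchar(alloc$stack), ]
```

with_stack then holds only the allocations attributed to an R call, which for
this output is the single 80040-byte vector from as.data.frame.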

Under R -d gdb

   (gdb) b R_OutputStackTrace
   (gdb) r
   > Rprofmem(); Rprofmem(NULL)

   Breakpoint 1, R_OutputStackTrace (file=0xbd43f0) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
   3434	{
   (gdb) bt
   #0  R_OutputStackTrace (file=0xbd43f0) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
   #1  0x00007ffff792ff83 in R_ReportAllocation (size=320040) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3456
   #2  Rf_allocVector (type=13, length=80000) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:2478
   #3  0x00007ffff790bedf in growData () at gram.y:3391

and the memory allocations are from these lines of growData() in the parser, gram.y

	PROTECT( bigger = allocVector( INTSXP, data_size * DATA_ROWS ) ) ;
	PROTECT( biggertext = allocVector( STRSXP, data_size ) );

I'm not sure why these show up under R 3.0.0, though.
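
As a quick consistency check on the reported sizes (a sketch, not from the
original message): frame #2 of the backtrace requests an INTSXP (type 13) of
length 80000, and the as.data.frame allocation is a double vector of length
10^4. Subtracting the payload implied by type and length from the size that
Rprofmem reports leaves the same remainder in both cases. Reading that
remainder as per-vector header overhead on this 64-bit build is an assumption.

```r
# Reported bytes minus the payload implied by element size and length.
overhead_int <- 320040 - 4 * 80000   # INTSXP, length 80000 (gdb frame #2)
overhead_dbl <- 80040 - 8 * 10^4     # double vector y of length 10^4
c(int = overhead_int, dbl = overhead_dbl)   # both come to 40
```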

$ R-2-15-branch/bin/R --version
R version 2.15.3 Patched (2013-03-13 r62579) -- "Security Blanket"
Copyright (C) 2013 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R-3-0-branch$ bin/R --version
R version 3.0.0 Patched (2013-04-14 r62579) -- "Masked Marvel"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

Martin



>
> Best,
>
> luke
>
> On Sun, 14 Apr 2013, Tim Hesterberg wrote:
>
>> I did some benchmarking of data frame code, and
>> it appears that R 3.0.0 is far worse than earlier versions of R
>> in terms of how many large objects it allocates space for,
>> for data frame operations - creation, subscripting, subscript replacement.
>> For a data frame with n rows, it makes either 2 or 4 extra copies of
>> all of:
>>        8n bytes (e.g. double precision)
>>        24n bytes
>>        32n bytes
>> E.g., for as.data.frame(numeric vector), instead of allocations
>> totalling ~8n bytes, it allocates 33 times that much.
>>
>> Here, compare columns 3 and 5
>> (columns 2 and 4 are with the dataframe package).
>>
>> # Summary
>> #                               R-2.14.2        R-2.15.3        R-3.0.0
>> #                               w/o     with    w/o     with    w/o
>> #       as.data.frame(y)        3       1       1       1       5;4;4
>> #       data.frame(y)           7       3       4       2       6;2;2
>> #       data.frame(y, z)        7 each  3 each  4       2       8;4;4
>> #       as.data.frame(l)        8       3       5       2       9;4;4
>> #       data.frame(l)           13      5       8       3       12;4;4
>> #       d$z <- z                3,2     1,1     3,1     2,1     7;4;4,1
>> #       d[["z"]] <- z           4,3     1,1     3,1     2,1     7;4;4,1
>> #       d[, "z"] <- z           6,4,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>> #       d["z"] <- z             6,5,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>> #       d["z"] <- list(z=z)     6,3,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>> #       d["z"] <- Z #list(z=z)  6,2,2   2,1,1   4,1,2   3,1,1   8;4;4,1,2
>> #       a <- d["y"]             2       1       2       1       6;4;4
>> #       a <- d[, "y", drop=F]   2       1       2       1       6;4;4
>>
>> # Where two numbers are given, they refer to:
>> #   (copies of the old data frame),
>> #   (copies of the new column)
>> # A third number refers to numbers of
>> #   (copies made of an integer vector of row names)
>>
>> # For R 3.0.0, I'm getting astounding results - many more copies,
>> # and also some copies of larger objects; in addition to the data
>> # vectors of size 80K and 160K, also 240K and 320K.
>> # Where three numbers are given in form a;c;d, they refer to
>> #   (copies of 80K; 240K; 320K)
>>
>> The benchmarks are at
>> http://www.timhesterberg.net/r-packages/memory.R
>>
>> I'm using versions of R I installed from source on a Linux box, using e.g.
>> ./configure --prefix=(my path) --enable-memory-profiling --with-readline=no
>> make
>> make install
>>
>>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


