[Rd] R 3.0.0 memory use
Tim Hesterberg
timhesterberg at gmail.com
Mon Apr 15 06:07:05 CEST 2013
When I change the data set size, the "extra allocations" do
not change in size. This supports Luke's and Martin's diagnosis.
The extra allocations come as either 2 or 4 allocations of each
of these sizes (in bytes):
  80040
  240048
  320040
Details (you may skip):
(Fresh session of R 3.0.0)
> y <- 1:10^4 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
320040 :80040 :240048 :320040 :80040 :240048 :80040 :"as.data.frame.numeric" "as.data.frame"
320040 :80040 :240048 :320040 :80040 :240048 :>
> # Try increasing size by a factor of 10
> y <- 1:10^5 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
320040 :80040 :240048 :320040 :80040 :240048 :800040 :"as.data.frame.numeric" "as.data.frame"
320040 :80040 :240048 :320040 :80040 :240048 :>
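The allocations in these transcripts can be tallied mechanically. A minimal sketch, assuming the line format shown above, where each allocation is written as a byte count followed by " :". The name count_allocs is a hypothetical helper, not part of R or of the benchmark script.

```r
# Hypothetical helper: tally allocation sizes in lines written by Rprofmem().
# Assumes each allocation appears as "<bytes> :" as in the output above.
count_allocs <- function(lines) {
  hits <- regmatches(lines, gregexpr("[0-9]+(?= :)", lines, perl = TRUE))
  table(unlist(hits))
}

# The two lines recorded for n = 10^4 under R 3.0.0 (copied from above,
# without the trailing prompt character that cat left on the last line).
out <- c(
  '320040 :80040 :240048 :320040 :80040 :240048 :80040 :"as.data.frame.numeric" "as.data.frame" ',
  '320040 :80040 :240048 :320040 :80040 :240048 :'
)
counts <- count_allocs(out)
print(counts)
```

The table is keyed by the size as a character string, so individual counts can be read with, e.g., counts[["80040"]].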
The number of allocations shown, by size in bytes
(first = run with n of 10^4, second = run with n of 10^5):
          3.0.0   3.0.0   2.15.3  2.15.3
          first   second  first   second
  240048  4       4       0       0
  320040  4       4       0       0
  80040   5       4       1       0
  800040  0       1       0       1
So it looks like both R 2.15.3 and R 3.0.0 are making
one copy of the data, and R 3.0.0 additionally makes the
extra allocations, whose sizes do not depend on n.
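Because the extra allocations are fixed in size, their relative cost shrinks as n grows. A rough worked calculation, using only the sizes and counts from the R 3.0.0 columns of the table above. The 40-byte header is an assumption inferred from the reported sizes (80040 = 8 * 10^4 + 40), and the names extra_bytes, data_bytes and ratio are made up for illustration.

```r
# Extra allocations under R 3.0.0: 4 each of three fixed sizes (bytes).
extra_sizes  <- c(80040, 240048, 320040)
extra_counts <- c(4, 4, 4)
extra_bytes  <- sum(extra_sizes * extra_counts)

# The single copy of the data: 8 bytes per double plus an assumed
# 40-byte header, which reproduces the reported 80040 and 800040.
data_bytes <- function(n) 8 * n + 40

# Total bytes allocated relative to the bytes in one copy of the data.
ratio <- function(n) (extra_bytes + data_bytes(n)) / data_bytes(n)

print(ratio(10^4))  # close to the "33 times" in my earlier message
print(ratio(10^5))  # much smaller, since the extra bytes stay fixed
```

So the factor of 33 reported earlier is specific to n = 10^4. With n = 10^5 the same extra allocations add up to roughly 4.2 times the size of the data.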
(Fresh session of R 2.15.3)
> y <- 1:10^4 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
80040 :"as.data.frame.numeric" "as.data.frame"
> # Increase size by factor of 10
> y <- 1:10^5 + 0.0
> Rprofmem("temp.out", threshold = 10^4)
> d <- as.data.frame(y)
> Rprofmem(NULL); system("cat temp.out")
800040 :"as.data.frame.numeric" "as.data.frame"
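Martin's reply, quoted below, treats only the allocation that is followed by a call stack as coming from as.data.frame. A sketch of that reading, assuming the stack (the quoted function names) belongs to the last size printed before it on the same line. The name split_allocs is a hypothetical helper, and the assumption about the file layout comes from the output above, not from the R sources.

```r
# Hypothetical helper: for each Rprofmem line, separate the allocation that
# carries a call stack from the allocations that carry none.
split_allocs <- function(lines) {
  lapply(lines, function(line) {
    hits <- regmatches(line, gregexpr("[0-9]+(?= :)", line, perl = TRUE))[[1]]
    sizes <- as.numeric(hits)
    n <- length(sizes)
    # A double quote marks the start of a call stack on this line.
    has_stack <- grepl('"', line, fixed = TRUE) && n > 0
    list(
      with_stack = if (has_stack) sizes[n] else numeric(0),
      no_stack   = if (has_stack) sizes[-n] else sizes
    )
  })
}

out <- c(
  '320040 :80040 :240048 :320040 :80040 :240048 :80040 :"as.data.frame.numeric" "as.data.frame" ',
  '320040 :80040 :240048 :320040 :80040 :240048 :',
  '80040 :"as.data.frame.numeric" "as.data.frame" '
)
res <- split_allocs(out)
str(res)
```

The first two elements correspond to the R 3.0.0 output for n = 10^4 and the third to the R 2.15.3 output. In both versions the only allocation with a call stack is the 80040-byte copy of the data.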
On Sun, 14 Apr 2013 19:15:45 -0700 Martin Morgan <mtmorgan at fhcrc.org> wrote:
>On 04/14/2013 07:11 PM, luke-tierney at uiowa.edu wrote:
>> There were a couple of bug fixes to somewhat obscure compound
>> assignment related bugs that required bumping up internal reference
>> counts. It's possible that one or more of these are responsible. If so
>> it is unavoidable for now, but it's worth finding out for sure. With
>> some stripped down test examples it should be possible to identify
>> when things changed. I won't have time to look for some time, but if
>> someone else wanted to nail this down that would be useful.
>
>I can't quite tell from Tim's script what he's documenting. In R-2.15.3 I have
>
> > Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
>character(0)
>
>(or sometimes [1] "new page:new page:\"Rprofmem\" ")
>
>whereas in R-3.0.0
>
> > Rprofmem(); Rprofmem(NULL); readLines("Rprofmem.out", warn=FALSE)
>[1] "320040 :80040 :240048 :320040 :80040 :240048 :"
>
>I think these are the allocations Tim is seeing. They're from the parser (see
>below) rather than as.data.frame. For Tim's example
>
> y <- 1:10^4 + 0.0
> Rprofmem(); d <- as.data.frame(y); Rprofmem(NULL); readLines("Rprofmem.out")
>
>[1] "320040 :80040 :240048 :320040 :80040 :240048 :80040 :\"as.data.frame.numeric\" \"as.data.frame\" "
>[2] "320040 :80040 :240048 :320040 :80040 :240048 :"
>
>only the allocation 80040 is from as.data.frame (from the call stack output).
>
>Under R -d gdb
>
> (gdb) b R_OutputStackTrace
> (gdb) r
> > Rprofmem(); Rprofmem(NULL)
>
> Breakpoint 1, R_OutputStackTrace (file=0xbd43f0) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
> 3434 {
> (gdb) bt
> #0 R_OutputStackTrace (file=0xbd43f0) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3434
> #1 0x00007ffff792ff83 in R_ReportAllocation (size=320040) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:3456
> #2 Rf_allocVector (type=13, length=80000) at /home/mtmorgan/src/R-3-0-branch/src/main/memory.c:2478
> #3 0x00007ffff790bedf in growData () at gram.y:3391
>
>and the memory allocations are from these lines in the parser gram.y
>
> PROTECT( bigger = allocVector( INTSXP, data_size * DATA_ROWS ) ) ;
> PROTECT( biggertext = allocVector( STRSXP, data_size ) );
>
>I'm not sure why these show up under R 3.0.0, though.
>
>$ R-2-15-branch/bin/R --version
>R version 2.15.3 Patched (2013-03-13 r62579) -- "Security Blanket"
>Copyright (C) 2013 The R Foundation for Statistical Computing
>ISBN 3-900051-07-0
>Platform: x86_64-unknown-linux-gnu (64-bit)
>
>R-3-0-branch$ bin/R --version
>R version 3.0.0 Patched (2013-04-14 r62579) -- "Masked Marvel"
>Copyright (C) 2013 The R Foundation for Statistical Computing
>Platform: x86_64-unknown-linux-gnu (64-bit)
>
>Martin
>
>
>
>>
>> Best,
>>
>> luke
>>
>> On Sun, 14 Apr 2013, Tim Hesterberg wrote:
>>
>>> I did some benchmarking of data frame code, and
>>> it appears that R 3.0.0 is far worse than earlier versions of R
>>> in terms of how many large objects it allocates space for,
>>> for data frame operations - creation, subscripting, subscript replacement.
>>> For a data frame with n rows, it makes either 2 or 4 extra copies of
>>> all of:
>>> 8n bytes (e.g. double precision)
>>> 24n bytes
>>> 32n bytes
>>> E.g., for as.data.frame(numeric vector), instead of allocations
>>> totalling ~8n bytes, it allocates 33 times that much.
>>>
>>> Here, compare columns 3 and 5
>>> (columns 2 and 4 are with the dataframe package).
>>>
>>> # Summary
>>> #                          R-2.14.2        R-2.15.3        R-3.0.0
>>> #                          w/o     with    w/o     with    w/o
>>> # as.data.frame(y)         3       1       1       1       5;4;4
>>> # data.frame(y)            7       3       4       2       6;2;2
>>> # data.frame(y, z)         7 each  3 each  4       2       8;4;4
>>> # as.data.frame(l)         8       3       5       2       9;4;4
>>> # data.frame(l)            13      5       8       3       12;4;4
>>> # d$z <- z                 3,2     1,1     3,1     2,1     7;4;4,1
>>> # d[["z"]] <- z            4,3     1,1     3,1     2,1     7;4;4,1
>>> # d[, "z"] <- z            6,4,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>>> # d["z"] <- z              6,5,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>>> # d["z"] <- list(z=z)      6,3,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
>>> # d["z"] <- Z #list(z=z)   6,2,2   2,1,1   4,1,2   3,1,1   8;4;4,1,2
>>> # a <- d["y"]              2       1       2       1       6;4;4
>>> # a <- d[, "y", drop=F]    2       1       2       1       6;4;4
>>>
>>> # Where two numbers are given, they refer to:
>>> # (copies of the old data frame),
>>> # (copies of the new column)
>>> # A third number refers to numbers of
>>> # (copies made of an integer vector of row names)
>>>
>>> # For R 3.0.0, I'm getting astounding results - many more copies,
>>> # and also some copies of larger objects; in addition to the data
>>> # vectors of size 80K and 160K, also 240K and 320K.
>>> # Where three numbers are given in form a;c;d, they refer to
>>> # (copies of 80K; 240K; 320K)
>>>
>>> The benchmarks are at
>>> http://www.timhesterberg.net/r-packages/memory.R
>>>
>>> I'm using versions of R I installed from source on a Linux box, using e.g.
>>> ./configure --prefix=(my path) --enable-memory-profiling --with-readline=no
>>> make
>>> make install
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>
>
>--
>Computational Biology / Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N.
>PO Box 19024 Seattle, WA 98109
>
>Location: Arnold Building M1 B861
>Phone: (206) 667-2793