[R] memory use of copies
Ross Boylan
ross at biostat.ucsf.edu
Fri Jan 24 02:53:13 CET 2014
[Apologies if a duplicate; we are having mail problems.]
I am trying to understand the circumstances under which R makes a copy
of an object, as opposed to simply referring to it. I'm talking about
what goes on under the hood, not the user semantics. I'm doing things
that take a lot of memory, and am trying to minimize my use.
I thought that R was clever so that copies were created lazily. For
example, if a is a matrix, then
b <- a
b and a would refer to the same object underneath, so that a complete
duplicate (deep copy) wasn't made until it was necessary, e.g.,
b[3, 1] <- 4
would duplicate the contents of a to b, and then overwrite them.
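A minimal sketch of how this expectation could be checked at top level, assuming an R build with memory profiling enabled (tracemem() is only usable in such builds, hence the capabilities("profmem") guard). The variable orig is introduced here for illustration only. If copies are lazy, tracemem() should stay silent at the assignment and report a duplication only at the modification.

```r
set.seed(1)
a <- matrix(rnorm(200000), ncol = 2000)
orig <- a[3, 1]                      # remember the original value (illustrative helper)

# Mark 'a' so that R reports whenever its data is duplicated.
# Only available when R was built with memory profiling.
if (capabilities("profmem")) tracemem(a)

b <- a          # expected: no duplication message here
b[3, 1] <- 4    # expected: a duplication message, since b must now diverge from a

if (capabilities("profmem")) untracemem(a)

# Regardless of when the copy happens, 'a' must be unchanged.
stopifnot(identical(a[3, 1], orig), b[3, 1] == 4)
```

This only shows the behaviour for plain assignment at top level; it says nothing about what happens inside class generators or function calls, which is the open question in the log.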
The following log, from R 3.0.1, does not seem to act that way; I get
the same amount of memory used whether I copy the same object repeatedly
or create new objects of the same size.
Can anyone explain what is going on? Am I just wrong that copies are
initially shallow? Or perhaps that behavior only applies for function
arguments? Or doesn't apply for class slots or reference class
variables?
> foo <- setRefClass("foo", fields=list(x="ANY"))
> bar <- setClass("bar", slots=c("x"))
> mycoef <- list(a=matrix(rnorm(200000), ncol=2000), b=array(rnorm(200000), dim=c(4, 5, 10000)))
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   2650747  141.6    4170209   222.8    4170209   222.8
Vcells 799751724 6101.7 1711485496 13057.6 1711485493 13057.6
> a <- lapply(1:100, function(i) bar(x=mycoef)) # create 100 objects that contain copies
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   2652156  141.7    4170209   222.8    4170209   222.8
Vcells 839752640 6406.9 1711485496 13057.6 1711485493 13057.6
# +305 Mb
> b <- lapply(1:100, function(i) foo(x=mycoef)) # same with a reference class
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   2654761  141.8    4170209   222.8    4170209   222.8
Vcells 879756752 6712.1 1711485496 13057.6 1711485493 13057.6
# also + 305 Mb
> rm("a", "b")
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   2650660  141.6    4170209   222.8    4170209   222.8
Vcells 799751664 6101.7 1711485496 13057.6 1711485493 13057.6
# write to "copy" to see if it uses more memory
> a <- lapply(1:100, function(i) {r <- bar(x=mycoef); r@x$a[5, 10] <- 33; r} )
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   2652174  141.7    4170209   222.8    4170209   222.8
Vcells 839752684 6406.9 1711485496 13057.6 1711485493 13057.6
# also + 305 Mb
> rm("a", "b")
Warning message:
In rm("a", "b") : object 'b' not found
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   2650680  141.6    4170209   222.8    4170209   222.8
Vcells 799751684 6101.7 1711485496 13057.6 1711485493 13057.6
# now create completely distinct objects
> a <- lapply(1:100, function(i) {acoef <- list(a=matrix(rnorm(200000), ncol=2000), b=array(rnorm(200000), dim=c(4, 5, 10000)))
+ bar(x=acoef)})
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells   2652191  141.7    4170209   222.8    4170209   222.8
Vcells 839752699 6406.9 1711485496 13057.6 1711485493 13057.6
# + 305 Mb
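As a sanity check on the "+305 Mb" figures, the arithmetic can be sketched as follows. It assumes doubles (and Vcells) are 8 bytes each and that gc() reports Mb in units of 1024^2 bytes; the variable names are illustrative. The numbers are taken from the first two gc() outputs in the log.

```r
# Each mycoef holds two numeric objects of 200000 doubles each.
doubles_per_object <- 200000 + 200000
bytes_per_object   <- doubles_per_object * 8      # 8 bytes per double

# Memory needed if all 100 objects hold their own full copy.
mb_for_100 <- 100 * bytes_per_object / 1024^2

# Growth in Vcells between the first two gc() calls in the log.
vcells_delta <- 839752640 - 799751724

print(round(mb_for_100, 1))   # 305.2
print(vcells_delta)           # 40000916
```

Both figures match 100 full copies of the data (100 * 400000 = 40000000 doubles), which is consistent with the observation that the "copies" cost as much as distinct objects.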
Thanks.
Ross Boylan
P.S. I also tried posting this from a google-managed email account, and
have got back two messages like this:
Mail Delivery Subsystem mailer-daemon at googlemail.com
5:22 PM (28 minutes ago)
to me
This is an automatically generated Delivery Status Notification
THIS IS A WARNING MESSAGE ONLY.
YOU DO NOT NEED TO RESEND YOUR MESSAGE.
Delivery to the following recipient has been delayed:
r-help at r.project.org
Message will be retried for 1 more day(s)
Technical details of temporary failure:
The recipient server did not accept our requests to connect. Learn more
at http://support.google.com/mail/bin/answer.py?answer=7720
[(0) r.project.org. [206.188.192.100]:25: Connection refused]