[R] memory use of copies

Ross Boylan ross at biostat.ucsf.edu
Fri Jan 24 02:53:13 CET 2014


[Apologies if a duplicate; we are having mail problems.]

I am trying to understand the circumstances under which R makes a copy
of an object, as opposed to simply referring to it.  I'm talking about
what goes on under the hood, not the user semantics.  I'm doing things
that take a lot of memory, and am trying to minimize my use.

I thought that R was clever, so that copies were created lazily.  For
example, if a is a matrix, then after
b <- a
b and a would refer to the same object underneath, so that a complete
duplicate (deep copy) wasn't made until it was necessary, e.g.,
b[3, 1] <- 4
would duplicate the contents of a to b, and then overwrite them.
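
One way to watch the lazy copy described above is base R's tracemem(), which prints a message whenever the traced object is duplicated. This is a minimal sketch, assuming an R build with memory profiling enabled (capabilities("profmem") is TRUE); the tiny matrix is only a stand-in for a large one.

```r
a <- matrix(rnorm(6), ncol = 2)

# tracemem() is only usable when R was built with memory profiling.
traced <- capabilities("profmem")
if (traced) tracemem(a)      # start reporting duplications of this object

b <- a                       # no tracemem message expected: data is shared
b[3, 1] <- 4                 # a tracemem[...] message is expected here

if (traced) untracemem(a)
```

If the message appears only at the subassignment, the plain `b <- a` did not copy the data.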

The following log, from R 3.0.1, does not seem to act that way; I get
the same amount of memory used whether I copy the same object repeatedly
or create new objects of the same size.

Can anyone explain what is going on?  Am I just wrong that copies are
initially shallow?  Or perhaps that behavior only applies for function
arguments?  Or doesn't apply for class slots or reference class
variables?
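
The same tracemem() idea might be used to probe the slot and reference-class cases directly, rather than inferring them from gc() totals. The sketch below assumes memory profiling is available; `small`, `s4` and `rc` are hypothetical names, and whether a message is printed may depend on the R version, so no output is claimed here.

```r
library(methods)

# Same class definitions as in the log that follows.
foo <- setRefClass("foo", fields = list(x = "ANY"))
bar <- setClass("bar", slots = c("x"))

# Hypothetical small stand-in for the large coefficient list.
small <- list(a = matrix(rnorm(20), ncol = 4))

traced <- capabilities("profmem")
if (traced) tracemem(small$a)   # report any duplication of the matrix

s4 <- bar(x = small)   # a tracemem message here would mean the slot got a copy
rc <- foo(x = small)   # likewise for the reference class field

if (traced) untracemem(small$a)
```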

  > foo <- setRefClass("foo", fields=list(x="ANY"))
  > bar <- setClass("bar", slots=c("x"))
  > mycoef <- list(a=matrix(rnorm(200000), ncol=2000), b=array(rnorm(200000), dim=c(4, 5, 10000)))
  > gc()
              used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells   2650747  141.6    4170209   222.8    4170209   222.8
  Vcells 799751724 6101.7 1711485496 13057.6 1711485493 13057.6
  > a <- lapply(1:100, function(i) bar(x=mycoef))   # create 100 objects that contain copies
  > gc()
              used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells   2652156  141.7    4170209   222.8    4170209   222.8
  Vcells 839752640 6406.9 1711485496 13057.6 1711485493 13057.6
# +305 Mb
  > b <- lapply(1:100, function(i) foo(x=mycoef))   # same with a reference class
  > gc()
              used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells   2654761  141.8    4170209   222.8    4170209   222.8
  Vcells 879756752 6712.1 1711485496 13057.6 1711485493 13057.6
# also + 305 Mb
  > rm("a", "b")
  > gc()
              used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells   2650660  141.6    4170209   222.8    4170209   222.8
  Vcells 799751664 6101.7 1711485496 13057.6 1711485493 13057.6
# write to "copy" to see if it uses more memory
  > a <- lapply(1:100, function(i) {r <- bar(x=mycoef); r@x$a[5, 10] <- 33; r} )
  > gc()
              used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells   2652174  141.7    4170209   222.8    4170209   222.8
  Vcells 839752684 6406.9 1711485496 13057.6 1711485493 13057.6
# also + 305 Mb
  > rm("a", "b")
  Warning message:
  In rm("a", "b") : object 'b' not found
  > gc()
              used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells   2650680  141.6    4170209   222.8    4170209   222.8
  Vcells 799751684 6101.7 1711485496 13057.6 1711485493 13057.6
# now create completely distinct objects
  > a <- lapply(1:100, function(i) {acoef <- list(a=matrix(rnorm(200000), ncol=2000), b=array(rnorm(200000), dim=c(4, 5, 10000)))
  +                                 bar(x=acoef)})
  > gc()
              used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells   2652191  141.7    4170209   222.8    4170209   222.8
  Vcells 839752699 6406.9 1711485496 13057.6 1711485493 13057.6
# + 305 Mb
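
As a sanity check on the figures in the log, the arithmetic below compares the size of 100 full copies of mycoef with the Vcells increase between the first two gc() calls. It assumes 8 bytes per double and 8 bytes per Vcell; the variable names are illustrative.

```r
n_doubles <- 200000 + 200000     # elements in mycoef$a plus mycoef$b
n_objects <- 100

# Size of 100 complete copies, in the Mb units that gc() reports.
expected_mb <- n_doubles * n_objects * 8 / 2^20

# Vcells "used" after and before the first lapply() in the log.
observed_mb <- (839752640 - 799751724) * 8 / 2^20

round(c(expected = expected_mb, observed = observed_mb), 1)
# both values round to 305.2
```

The match suggests that, as far as gc() reports, each of the 100 objects in this session holds its own full copy of the data.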

Thanks.
Ross Boylan

P.S. I also tried posting this from a google-managed email account, and 
have got back two messages like this:
Mail Delivery Subsystem mailer-daemon at googlemail.com

This is an automatically generated Delivery Status Notification

THIS IS A WARNING MESSAGE ONLY.

YOU DO NOT NEED TO RESEND YOUR MESSAGE.

Delivery to the following recipient has been delayed:

r-help at r.project.org

Message will be retried for 1 more day(s)

Technical details of temporary failure:
The recipient server did not accept our requests to connect. Learn more
at http://support.google.com/mail/bin/answer.py?answer=7720
[(0) r.project.org. [206.188.192.100]:25: Connection refused]



