[Rd] Need for garbage collection after creating object
Henrik Bengtsson
hb at stat.berkeley.edu
Tue Feb 5 19:45:34 CET 2008
On Feb 5, 2008 10:12 AM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> On Feb 5, 2008 8:01 AM, Iago Mosqueira <iago.mosqueira at gmail.com> wrote:
> > Hello,
> >
> > After experiencing some difficulties with large arrays, I was surprised
> > to see the apparent need for calls to gc() after creating fairly large
> > arrays. For example, calling
> >
> > a<-array(2, dim=c(10,10,10,10,10,100))
> >
> > makes the memory usage of a fresh session of R jump from 13.8 Mb to
> > 166.4 Mb. A call to gc() brought it down to 90.8 Mb,
> >
> > > gc()
> >            used (Mb) gc trigger  (Mb) max used  (Mb)
> > Ncells   132619  3.6     350000   9.4   350000   9.4
> > Vcells 10086440 77.0   21335887 162.8 20086792 153.3
> >
> > as expected by
> >
> > > object.size(a)
> >
> > [1] 80000136
>
> I think the reason for this is that array() has to "expand" the input
> data to the right length internally;
>
> data <- rep(data, length.out = vl)
>
> That is a so-called "NAMED" object internally, and when the following call to
>
> dim(data) <- dim
>
> occurs, the safest thing R can do is to create a copy. [Anyone,
> correct me if I'm wrong].
>
> If you expand the input data yourself, you won't see that extra copy, e.g.
>
> data <- 2
> dim <- c(10,10,10,10,10,100)
> data <- rep(data, length.out=prod(dim))
> a <- array(data, dim=dim)
My bad here; that does indeed create an extra copy; rep() is the
problem, and you can see that if you call gc() after rep(). It seems
to be hard to allocate an array with values without creating an extra
copy, e.g.
dim <- c(10,10,10,10,10,100)
data <- numeric(prod(dim))
dim(data) <- dim
will not create an extra copy, but as soon as you assign a value a
copy will be made, e.g.
data[1,2,3,4,5,6] <- 2
Again, I believe this has to do with R taking the safest path possible
and not risking overwriting an existing object in memory (R uses
copy-by-value semantics). Note that when you do a second assignment,
that "safety copy" has already been created, so no further copies are
made, e.g. calling
data[1,2,3,4,5,7] <- 3
after the above will not create an extra copy.
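
To see where the copy happens, here is a minimal sketch of the above
experiment; watch the Vcells "used" column reported by gc() after each
step (exact numbers will vary with platform and R version):

dim <- c(10,10,10,10,10,100)
gc()                        # baseline
data <- numeric(prod(dim))  # a single ~80 MB allocation of doubles
gc()
dim(data) <- dim            # setting the dim attribute; no extra copy here
gc()
data[1,2,3,4,5,6] <- 2      # first assignment triggers the "safety copy"
gc()
data[1,2,3,4,5,7] <- 3      # second assignment; no further copy
gc()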
/Henrik
>
> >
> > Do I need to call gc() after creating every large array, or can I set up
> > the system to do this more often or efficiently?
>
> The R garbage collector will free/deallocate that memory when
> "needed". However, calling gc() explicitly should minimize the risk
> of over-fragmented memory. Basically, if there are several blocks of
> garbage memory hanging around, you might end up with a situation where
> you have a lot of *total* memory available but can only allocate
> small chunks of memory at any one time. Even calling gc() in that
> situation will not help; there is no mechanism that defragments
> memory in R. So calling gc() after large allocations adds some
> protection against that.
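>
> As a rough sketch of that advice, applied to the example that started
> this thread:
>
> a <- array(2, dim=c(10,10,10,10,10,100))  # allocation plus a temporary copy
> gc()  # collect the ~80 MB temporary right away rather than leaving it as garbage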
>
> /Henrik
>
>
> >
> > Thanks very much,
> >
> >
> > Iago
> >
> >
> > $platform
> > [1] "i686-pc-linux-gnu"
> > $version.string
> > [1] "R version 2.6.1 (2007-11-26)"
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>