[R] Re: gc() and memory efficiency

Henrik Bengtsson hb at stat.berkeley.edu
Thu Feb 7 04:25:36 CET 2008


Open suggestion/question:

If, in each step of a K-step iteration, you load/allocate a large
object of a different size each time, followed by smaller memory
allocations (due to your analysis), you might be better off if you
could order the iterations so that the largest object comes in the
first iteration, the 2nd largest in the 2nd, and so on.
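
A minimal sketch of that ordering in R. The helper name largest_first()
and the file names are hypothetical, and the size on disk is only a
rough proxy for the size of the object once loaded:

```r
# Return the paths ordered so that the largest file comes first.
# 'sizes' defaults to the on-disk size in bytes, which is only an
# approximation of the in-memory size of the loaded object.
largest_first <- function(paths, sizes = file.info(paths)$size) {
  paths[order(sizes, decreasing = TRUE)]
}

# Hypothetical file names with made-up sizes (bytes)
paths <- c("a.csv", "b.csv", "c.csv")
ord <- largest_first(paths, sizes = c(4e8, 6e8, 5e8))

for (f in ord) {
  # Placeholder for: read the data, analyse it, then rm() it and gc()
  message("would process ", f)
}
```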

Example: If done in the incorrect order, you might end up with
fragmented memory allocations as follows:

Suboptimal:
1. Allocate '40% object': [40% object][10% misc][50% free] (blocks of memory image)
2. Free 'object': [40% free][10% misc][50% free]
3. Allocate '50% object': [40% free][10% misc][50% object]
4. Free 'object': [40% free][10% misc][50% free]
5. Allocate '60% object': Failure to allocate that amount of memory!
   (The largest contiguous free block is only 50%.)

Optimal:
1. Allocate '60% object': [60% object][10% misc][30% free]
2. Free 'object': [60% free][10% misc][30% free]
3. Allocate '50% object': [50% object][10% free][10% misc][30% free]
4. Free 'object': [60% free][10% misc][30% free]
5. Allocate '40% object': [40% object][20% free][10% misc][30% free]
6. Free 'object': [60% free][10% misc][30% free]
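
The two sequences above can be replayed with a toy first-fit model.
This is only an illustration under simplifying assumptions (100 memory
cells, one 10% 'misc' block allocated after the first object and never
freed); it is not how R or the OS actually manages memory, and the
names first_fit() and simulate() are made up for this sketch:

```r
# Start index of the first run of at least n free cells, or NA if none.
first_fit <- function(mem, n) {
  runs <- rle(mem == "free")
  starts <- cumsum(c(1, head(runs$lengths, -1)))
  ok <- which(runs$values & runs$lengths >= n)
  if (length(ok) == 0) NA_integer_ else as.integer(starts[ok[1]])
}

# Allocate and free each object in turn; report which allocations succeed.
simulate <- function(sizes, misc = 10, total = 100) {
  mem <- rep("free", total)
  success <- logical(length(sizes))
  for (i in seq_along(sizes)) {
    at <- first_fit(mem, sizes[i])
    success[i] <- !is.na(at)
    if (is.na(at)) break
    mem[at:(at + sizes[i] - 1)] <- "object"
    if (i == 1) {
      # The small allocation that stays around for the whole run
      m <- first_fit(mem, misc)
      if (!is.na(m)) mem[m:(m + misc - 1)] <- "misc"
    }
    mem[mem == "object"] <- "free"  # free the large object again
  }
  success
}

suboptimal <- simulate(c(40, 50, 60))  # TRUE TRUE FALSE
optimal    <- simulate(c(60, 50, 40))  # TRUE TRUE TRUE
```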

/Henrik

On Feb 6, 2008 5:35 PM, Milton Cezar Ribeiro <milton_ruser at yahoo.com.br> wrote:
> Dear Harold,
>
> I had the same problem some time ago. I noticed that after I ran a set of commands (cleaning up all unneeded variables) five times, the system broke down. I solved it by building several scriptsNN.R files and calling them from a DOS .BAT file. It worked fine, at least in my case, and the computer ran for several days without stopping :-)
>
> If you need more info on this solution, feel free to write me again.
>
> On the other hand, if you find a better solution (I also unfortunately run Windows), share it with us.
>
> Kind regards
>
> Miltinho
> Brazil
>
>
>
>
> ----- Original message ----
> From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
> To: "Doran, Harold" <HDoran at air.org>
> Cc: r-help at r-project.org
> Sent: Tuesday, 5 February 2008 3:06:51
> Subject: Re: [R] gc() and memory efficiency
>
>
> 1) See ?"Memory-limits": it is almost certainly memory fragmentation.
> You don't need to give the memory back to the OS (and few OSes actually do
> so).
>
> 2) I've never seen this running a 64-bit version of R.
>
> 3) You can easily write a script to do this.  Indeed, you could write an R
> script to run multiple R scripts in separate processes in turn (via
> system("Rscript fileN.R") ).  For example, Uwe Ligges uses R to script
> building and testing of packages on Windows.
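
Point 3 above could look roughly like the sketch below. The helper
run_scripts() and the file names are hypothetical; the only calls
assumed are system() and shQuote() from base R, plus an Rscript
executable that can be found on the PATH:

```r
# Run each script in its own fresh R process, one after the other.
# All memory used by a script is released when its process exits.
# Returns the exit status of each script (0 means success).
run_scripts <- function(scripts, rscript = "Rscript") {
  vapply(scripts, function(s) {
    system(paste(shQuote(rscript), shQuote(s)))
  }, integer(1))
}

# Hypothetical usage:
# status <- run_scripts(sprintf("file%d.R", 1:3))
# if (any(status != 0)) warning("some scripts failed")
```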
>
> On Mon, 4 Feb 2008, Doran, Harold wrote:
>
> > I have a program which reads in a very large data set, performs some
> > analyses, and then repeats this process with another data set. As soon
> > as the first set of analyses is complete, I remove the very large
> > object and clean up to try and make memory available in order to run the
> > second set of analyses. The process looks something like this:
> >
> > 1) read in data set 1 and perform analyses
> > rm(list=ls())
> > gc()
> > 2) read in data set 2 and perform analyses
> > rm(list=ls())
> > gc()
> > ...
> >
> > But, it appears that I am not making the memory that was consumed in
> > step 1 available back to the OS as R complains that it cannot allocate a
> > vector of size X as the process tries to repeat in step 2.
> >
> > So, I close and reopen R and then drop in the code to run the second
> > analysis. When this is done, I close and reopen R and run the third
> > analysis.
> >
> > This is terribly inefficient. Instead I would rather just source in the
> > R code and let the analyses run overnight.
> >
> > Is there a way that I can use gc() or some other function more
> > efficiently rather than having to close and reopen R at each iteration?
> >
> > I'm using Windows XP and R 2.6.1
> >
> > Harold
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,            Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                    +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
>
