[R] Resources for utilizing multiple processors

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jun 9 08:56:53 CEST 2011


On Wed, 8 Jun 2011, Robin Jeffries wrote:

> Hello,
>
> I know of some various methods out there to utilize multiple processors but
> am not sure what the best solution would be. First some things to note:
> I'm running dependent simulations, so direct parallel coding is out
> (multicore, doSnow, etc).
> I'm on Windows, and don't know C. I don't plan on learning C or any of the
> *nix languages.

By restricting yourself to one of the least capable OSes R runs on, you 
are making this harder for yourself.

> My main concern deals with Multiple analyses on large data sets. By large I
> mean that when I'm done running 2 simulations R is using ~3G of RAM, the
> remaining ~3G is chewed up when I try to create the Gelman-Rubin statistic
> to compare the two resulting samples, grinding the process to a halt. I'd
> like to have separate cores simultaneously run each analysis. That will save
> on time and I'll have to ponder the BGR calculation problem another way. Can
> R temporarily use HD space to write calculations to instead of RAM?

By using virtual memory (R does not in fact use RAM, it always uses 
virtual memory).  With a 64bit R you can use up to terabytes of VM. 
Because Windows' disc access is so slow, you will need to set a 
max-memory-size larger than your RAM size to enable this.
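
For illustration only, a minimal sketch of raising that limit from within 
an R session. It assumes a 64-bit Windows build of R from this era; 
memory.limit() is Windows-only, the 8192 MB figure is an arbitrary example 
value, and later R releases may no longer honour the call. The same limit 
can also be set at startup with the --max-mem-size command-line option.

```r
# Windows-only sketch: inspect and raise R's memory limit (values in MB).
# 8192 is an illustrative figure, chosen to exceed ~6 GB of physical RAM.
memory.limit()              # report the current limit
memory.limit(size = 8192)   # request a larger limit so paging to disc can occur
```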

> The second concern boils down to whether or not there is a way to split up
> dependent simulations. For example at iteration (t) I feed a(t-2) into FUN1
> to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate
> b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2,

As stated, that is pointless.  The core running FUN2 would be waiting 
for the results of FUN1.  However, at time t FUN1 could generate 
a(t+1) from a(t-1) whilst FUN2 generates b(t) and c(t).
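
A minimal sketch of that reordering, run sequentially so that only the 
dependency structure is shown. FUN1 and FUN2 here are hypothetical 
stand-ins with made-up arithmetic, and the starting values are arbitrary; 
the point is that inside each pass of the reordered loop the two calls 
read nothing the other writes, so they could be handed to separate 
workers.

```r
# Hypothetical stand-ins for the poster's FUN1 and FUN2.
FUN1 <- function(a_lag2) 0.5 * a_lag2 + 1
FUN2 <- function(a_t, b_lag1, c_lag1) {
  list(b = a_t + 0.1 * b_lag1, c = a_t - 0.1 * c_lag1)
}

# Original ordering: FUN2 must wait for FUN1 within every iteration.
run_naive <- function(n, a0, b0, c0) {
  a <- numeric(n); b <- numeric(n); cc <- numeric(n)
  a[1:2] <- a0; b[2] <- b0; cc[2] <- c0
  for (t in 3:n) {
    a[t] <- FUN1(a[t - 2])
    out  <- FUN2(a[t], b[t - 1], cc[t - 1])
    b[t] <- out$b; cc[t] <- out$c
  }
  list(a = a, b = b, c = cc)
}

# Reordered: a(t) is already known when step t starts, so FUN1 works one
# step ahead on a(t+1) while FUN2 produces b(t) and c(t).
run_pipelined <- function(n, a0, b0, c0) {
  a <- numeric(n); b <- numeric(n); cc <- numeric(n)
  a[1:2] <- a0; b[2] <- b0; cc[2] <- c0
  a[3] <- FUN1(a[1])                       # prime the pipeline
  for (t in 3:n) {
    # These two calls are independent of each other.
    next_a <- if (t < n) FUN1(a[t - 1]) else NA_real_
    out    <- FUN2(a[t], b[t - 1], cc[t - 1])
    if (t < n) a[t + 1] <- next_a
    b[t] <- out$b; cc[t] <- out$c
  }
  list(a = a, b = b, c = cc)
}

naive     <- run_naive(10, c(1, 2), 0, 0)
pipelined <- run_pipelined(10, c(1, 2), 0, 0)
```

Both orderings apply the same operations to the same inputs, so the two 
result lists should agree.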

> and better yet, a third to run all the pre-and post- processing tidbits!

Look into package snow (with socket clusters).  The overhead of what 
you ask may be too high (POSIX OSes can use package multicore, which 
has a much lower overhead), but if the calculations are slow enough it 
may be worthwhile.  There are Windows-oriented examples in package 
RSiena.
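
A minimal sketch of sending two different calls to two socket workers at 
once. It uses package parallel rather than snow itself, which is a 
substitution on my part: parallel's socket clusters are derived from snow 
and share the makeCluster/clusterApply/stopCluster names. FUN1, FUN2 and 
the argument values are hypothetical placeholders.

```r
library(parallel)

# Hypothetical stand-ins for the poster's FUN1 and FUN2.
FUN1 <- function(a_lag2) 0.5 * a_lag2 + 1
FUN2 <- function(a_t, b_lag1, c_lag1) {
  list(b = a_t + 0.1 * b_lag1, c = a_t - 0.1 * c_lag1)
}

# Two local worker processes connected over sockets (works on Windows).
cl <- makeCluster(2)

# One task per worker: each task carries its function and its arguments.
tasks <- list(
  list(fun = FUN1, args = list(3)),        # would compute a(t+1) from a(t-1)
  list(fun = FUN2, args = list(2, 0, 0))   # would compute b(t), c(t)
)

# clusterApply() sends task 1 to worker 1 and task 2 to worker 2, then
# waits for both results and returns them in task order.
res <- clusterApply(cl, tasks, function(task) do.call(task$fun, task$args))

stopCluster(cl)
```

Each call serialises the task and its result over a socket, so this only 
pays off when FUN1 and FUN2 take far longer than that round trip.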

>
>
> So if anyone has any suggestions as to a direction I can look into, it would
> be appreciated.
>
>
> Robin Jeffries
> MS, DrPH Candidate
> Department of Biostatistics
> UCLA
> 530-633-STAT(7828)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


