[R] Preparing for multi-core CPUs and parallel processing applications
Martin Morgan
mtmorgan at fhcrc.org
Fri Jul 31 16:00:19 CEST 2009
Hi Steve --
Steve_Friedman at nps.gov wrote:
> Hello
>
> I am fortunate (or in really big trouble) in that the research group I work
> with will soon be receiving several high end dual quad core machines. We
> will use the Ubuntu OS on these. We intend to use this cluster for some
> extensive modeling applications. Our programming guru has demonstrated the
> ability to link much simpler machines to share CPUs and we purchased the
> new ones to take advantage of this option. We have also begun exploration
> of the R CUDA and J CUDA functionality to push the processes to the
> graphics CPU which greatly speeds up the numerical processing.
>
> My question(s) to this group:
Last question first, the R-sig-hpc group might be more appropriate for
an extended discussion.
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
see also the HighPerformanceComputing task view
http://cran.fhcrc.org/web/views/HighPerformanceComputing.html
> 1) Which packages are suitable for parallel processing applications in R
> ?
> 2) Are these packages ready for prime time applications or are they
> developmental at this time?
I use Rmpi for all my parallel computing, but if I had more time I'd
explore multicore for more efficient use of several CPUs on a single
machine, and the new offerings from Revolution Computing. If there were
significant portions of C code I'd look into using OpenMP (as done in
the pnmath package). A parallel BLAS / LAPACK library is also worth
using if that is where significant computation occurs.
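To make the first option concrete, here is a minimal sketch using the
multicore package from CRAN (assumed installed); mclapply() is a
drop-in parallel version of lapply() that forks one worker per core.
The toy task and the core count are illustrative, not a recommendation.

```r
## Sketch: run 8 independent simulations across the cores of one machine
library(multicore)

res <- mclapply(1:8,
                function(i) sum(rnorm(1e6)),  # stand-in for a real model run
                mc.cores = 8)                 # one worker per core
length(res)  # a list with one result per task, in order
```

Rmpi covers the same ground across several machines (e.g. via
mpi.parSapply() after spawning slaves), at the cost of more setup.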
> 3) Are we better off working in Java or C++ for the majority of this
> simulation work and linking to R for statistical analysis?
> 4) What are the pit falls, if any, that I need to be aware of ?
With multiple cores, it's important to remember that large memory is
divided amongst the CPUs, so that huge-sounding 32 GB, 8-core machine
has 'only' 4 GB per CPU when an independent R process is allocated to
each core (as is the style with Rmpi).
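The arithmetic is worth doing explicitly before sizing jobs; a
back-of-envelope check in R, using the figures above:

```r
## Memory available to each independent R process, one process per core
total_ram_gb <- 32
cores        <- 8
total_ram_gb / cores   # 4 GB per process, before OS and other overhead
```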
> 5) Can we take advantage of sharing the graphics CPU, via R CUDA, in a
> parallel distributed shared cluster of dedicated machines ?
>
> 6) Our statistical analysis and modeling applications address very large
> geographic issues. We generally work with 30-40 year daily time step data
> in a gridded format. The grid is approximately 250 x 400 cells in extent,
> each representing approximately 500 meters x 500 meters. To this we add a
> very large suite of ancillary information, both spatial and non-spatial, to
> simulate a variety of ecological state conditions. My question is - is
> this too large for R , given its use of memory?
Depending on the application, large data sets can often be managed
effectively on disk, e.g., by using the ncdf package (for large numeric
data) or a database (e.g., SQLite via the RSQLite package), and
analyzing independent 'slices'. This fits well with common parallel
computing paradigms.
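For a grid of that size, the slicing approach might look like the
following sketch with the ncdf package. The file name "climate.nc" and
variable name "temp" are hypothetical; the point is that each worker
reads only one day's 250 x 400 layer rather than 30-40 years of data
at once.

```r
## Sketch, assuming a NetCDF file with a variable 'temp' dimensioned
## x (250) by y (400) by time (one layer per day)
library(ncdf)                              # CRAN package 'ncdf'

nc <- open.ncdf("climate.nc")
day1 <- get.var.ncdf(nc, "temp",
                     start = c(1, 1, 1),   # x, y, time offsets
                     count = c(250, 400, 1))  # one daily slice
close.ncdf(nc)
```

Each slice is only 250 * 400 * 8 bytes (under 1 MB) in memory, so many
such slices can be processed in parallel without exhausting RAM.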
>
> 7) I currently have a laptop with Ubuntu with R Version 2.6.2
> (2008-02-08). What is the most recent R version for Ubuntu and what is the
> installation procedure ?
>
> These are just the initial questions that I'm sure to have. If these are
> being directed to the wrong help pages, I'm sorry to have taken your time.
> If you would be so kind as to direct me to the more appropriate help site
> I'd appreciate your assistance.
>
> Thanks in advance,
> Steve
>
>
> Steve Friedman Ph. D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> Steve_Friedman at nps.gov
> Office (305) 224 - 4282
> Fax (305) 224 - 4147
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.