[R-sig-hpc] Advice on HPC/R needs for advanced computing oversight committee

George Ostrouchov ostrouchovg at ornl.gov
Fri Jan 18 22:03:08 CET 2013


Dear David,

We have several R packages that target the "single program, 
multiple data" (SPMD) style of batch parallel programming on distributed 
machines (see r-pbd.org). SPMD has been how most things are done on 
supercomputers for the last 20 years. While our target machines are 
large, the packages also work on smaller machines, down to multicore laptops.
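To give a flavor of SPMD, here is a minimal sketch using the pbdMPI 
package from the pbdR collection. Every rank (process) runs the same 
script, and behavior differs only by rank; the file name hello.r and the 
launch command are illustrative assumptions, not part of the packages 
themselves:

```r
## Minimal SPMD sketch with pbdMPI (assumes pbdMPI is installed and an
## MPI library is available). Launch with something like:
##   mpirun -np 4 Rscript hello.r
library(pbdMPI)
init()                     # initialize the MPI communicator

## Same code on every rank; only the rank number differs
msg <- paste("Hello from rank", comm.rank(), "of", comm.size())
comm.print(msg, all.rank = TRUE)   # print from every rank in order

finalize()                 # shut down MPI cleanly
```

Note there is no master process handing out tasks: each of the four 
ranks executes the whole script, which is what distinguishes SPMD from 
the manager/worker style of snow or doSNOW.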

There are examples in the packages already released, and we are close to 
releasing a package, pbdDEMO, which provides a more thorough introduction.

Also, let me mention that NSF's XSEDE resources are available to any 
researcher or educator at a US-based institution. The resources include 
clusters, scalable-parallel systems, and shared-memory systems with 
various CPU, memory, communications, and storage configurations. To get 
started, see https://www.xsede.org/using-xsede. Let me add that we use 
the Kraken and Nautilus resources to develop the pbdR set of packages, 
so these resources will have the most R support.

George

On 1/11/13 12:52 PM, David J. Vanness wrote:
> Dear SIG members,
>
>   
>
> I am a health economist at the University of Wisconsin-Madison.  In part of
> my work, I use R and JAGS to conduct MCMC analyses of clinical and economic
> data for medical decision-making.  Over the years, I have slowly waded into
> high throughput techniques for conducting “pleasantly parallel” aspects of
> my work.  I have dreams of using HPC to enable more efficient Bayesian
> analysis of large health care claims datasets – but at this point they are
> just dreams.
>
>   
>
> Recently, I have been asked to join a new committee that is charged with
> oversight in developing a university-wide advanced computing shared resource
> (consisting of both hardware and professional consultation).  The committee
> has good representation from traditional HPC disciplines – genomics,
> astronomy, nuclear physics – most of whom do not rely upon HPC-enabled R for
> their work.  Somehow, despite my relative naiveté (having just recently
> moved from manually forking processes and using my own “collector” scripts
> to actually using doSNOW), I have found myself a de-facto representative of
> the R-users on campus.  One of the charges of the committee is to enable
> users to make at least one “step increase” in their computing power.
> Perhaps, since I’m basically at step 0, I also represent a large latent
> class of R users who could benefit from high throughput/high performance,
> but don’t know where to begin.
>
>   
>
> So, I am wondering if any of you have suggestions for me in my committee
> role to make sure the needs of the HPC R community are well represented in a
> nascent advanced computing initiative.  What hardware issues do you see as
> most critical?  Software?  What kinds of consulting resources would you want
> to have available to facilitate your R-related work?  Any recommendations
> for introductory material on high performance computing in R that I should
> read to get up to speed?  Thanks in advance for any comments/advice.
>
>   
>
> Best regards,
>
> Dave
>
>   
>
> _______________________________________
>
> David J. Vanness, PhD, Associate Professor
>
> Department of Population Health Sciences
>
> UW School of Medicine and Public Health
>
> 610 Walnut Street #785
>
> Madison, WI 53726
>
> Office: 608/265-8600
>
> dvanness at wisc.edu
>
>
>
>
>


-- 
George Ostrouchov, Ph.D.
Scientific Data Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory
(865) 574-3137  http://www.csm.ornl.gov/~ost

and

Remote Data Analysis and Visualization Center
National Institute for Computational Sciences
University of Tennessee
