[R-sig-hpc] Advice on HPC/R needs for advanced computing oversight committee
George Ostrouchov
ostrouchovg at ornl.gov
Fri Jan 18 22:03:08 CET 2013
Dear David,
We have several R packages that target the "single program, multiple
data" (SPMD) style of batch parallel programming on distributed
machines (see r-pbd.org). SPMD is how most work has been done on
supercomputers for the last 20 years. While our target machines are
large, the packages also work on smaller machines, down to multicore
laptops. There are examples in the packages already released, and we
are close to releasing a package, pbdDEMO, which provides a more
thorough introduction.
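To give a flavor of the SPMD style, here is a minimal sketch using our
pbdMPI package: every rank runs the same script, does its own piece of
the work, and collectives combine the results. (This assumes a working
MPI installation; launch with something like `mpirun -np 4 Rscript spmd.r`,
where the exact launcher depends on your MPI stack.)

```r
## Minimal SPMD sketch with pbdMPI: all ranks execute this same script.
library(pbdMPI)
init()

my.rank <- comm.rank()   # this rank's id: 0, 1, ..., size-1
n.ranks <- comm.size()   # total number of ranks

## Each rank computes on its own local piece of the problem ...
local.value <- my.rank + 1

## ... and a collective reduction combines results across all ranks.
total <- allreduce(local.value, op = "sum")

## comm.print() prints from rank 0 only, avoiding duplicated output.
comm.print(total)

finalize()
```

Note there is no master process handing out tasks, as with doSNOW;
every rank is a peer, which is what lets SPMD codes scale to large
machines.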
Also, let me mention that NSF's XSEDE resources are available to any
researcher or educator at a US-based institution. The resources include
clusters, scalable-parallel systems, and shared-memory systems with
various CPU, memory, communications, and storage configurations. To get
started, see https://www.xsede.org/using-xsede. Let me add that we use
the Kraken and Nautilus resources to develop the pbdR set of packages,
so these resources will have the most R support.
George
On 1/11/13 12:52 PM, David J. Vanness wrote:
> Dear SIG members,
>
>
>
> I am a health economist at the University of Wisconsin Madison. In part of
> my work, I use R and JAGS to conduct MCMC analysis of clinical and economic
> data for medical decision-making. Over the years, I have slowly waded into
> high throughput techniques for conducting “pleasantly parallel” aspects of
> my work. I have dreams of using HPC to enable more efficient Bayesian
> analysis of large health care claims datasets – but at this point they are
> just dreams.
>
>
>
> Recently, I have been asked to join a new committee that is charged with
> oversight in developing a university-wide advanced computing shared resource
> (consisting of both hardware and professional consultation). The committee
> has good representation from traditional HPC disciplines – genomics,
> astronomy, nuclear physics – most of whom do not rely upon HPC-enabled R for
> their work. Somehow, despite my relative naiveté (having just recently
> moved from manually forking processes and using my own “collector” scripts
> to actually using doSNOW), I have found myself a de facto representative of
> the R-users on campus. One of the charges of the committee is to enable
> users to make at least one “step increase” in their computing power.
> Perhaps, since I’m basically at step 0, I also represent a large latent
> class of R users who could benefit from high throughput/high performance,
> but don’t know where to begin.
>
>
>
> So, I am wondering if any of you have suggestions for me in my committee
> role to make sure the needs of the HPC R community are well represented in a
> nascent advanced computing initiative. What hardware issues do you see as
> most critical? Software? What kinds of consulting resources would you want
> to have available to facilitate your R-related work? Any recommendations
> for introductory material on high performance computing in R that I should
> read to get up to speed? Thanks in advance for any comments/advice.
>
>
>
> Best regards,
>
> Dave
>
>
>
> _______________________________________
>
> David J. Vanness, PhD, Associate Professor
>
> Department of Population Health Sciences
>
> UW School of Medicine and Public Health
>
> 610 Walnut Street #785
>
> Madison, WI 53726
>
> Office: 608/265-8600
>
> <mailto:dvanness at wisc.edu> dvanness at wisc.edu
>
>
>
>
>
--
George Ostrouchov, Ph.D.
Scientific Data Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory
(865) 574-3137 http://www.csm.ornl.gov/~ost
and
Remote Data Analysis and Visualization Center
National Institute for Computational Sciences
University of Tennessee