[R-sig-hpc] Parallel File System support in R (e.g. GPFS)
George Ostrouchov
ostrouchovg at ornl.gov
Fri Feb 17 19:46:30 CET 2012
Hi Jonathan,
We are developing parallel file system readers and writers for R.
They are intended to be used in a Single Program Multiple Data (SPMD)
programming style with Rmpi: each processor reads its own chunk of the
data and hands it off to other SPMD R code for the analysis. We are
close to having a parallel version of ncdf, a package for collective
NetCDF reads and writes.
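
To make the chunk-per-process idea concrete, here is a minimal sketch of the row partitioning and byte-offset arithmetic for a flat, row-major binary image. It is an illustration under stated assumptions, not the API of any package mentioned here. It is written in Python only to keep it self-contained, the ranks are simulated with a loop rather than real MPI processes, and the names chunk_bounds, write_rows, process_and_write and demo are hypothetical. In a real SPMD run, the rank and the number of ranks would come from the MPI communicator.

```python
import os
import tempfile
from array import array


def chunk_bounds(n_rows, n_ranks, rank):
    """Return the half-open row range [start, stop) owned by `rank`.

    Rows are split as evenly as possible; the first `extra` ranks
    each take one additional row.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def write_rows(path, start_row, rows, n_cols):
    """Write consecutive rows at their byte offset in a flat row-major file."""
    with open(path, "r+b") as f:
        f.seek(start_row * n_cols * rows.itemsize)
        f.write(rows.tobytes())


def process_and_write(path, n_rows, n_cols, n_ranks, rank):
    """Stand-in for one SPMD process: compute its chunk, then write it in place."""
    start, stop = chunk_bounds(n_rows, n_ranks, rank)
    # Placeholder "analysis": each pixel is set to its flat index.
    rows = array("d", (float(i) for i in range(start * n_cols, stop * n_cols)))
    write_rows(path, start, rows, n_cols)


def demo(n_rows=10, n_cols=4, n_ranks=3):
    """Simulate all ranks writing into one preallocated file; return its contents."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        # Preallocate the full output file so every offset already exists.
        with open(path, "wb") as f:
            f.truncate(n_rows * n_cols * array("d").itemsize)
        # Ranks run in reverse order to show that write order does not matter
        # when the byte ranges are disjoint.
        for rank in reversed(range(n_ranks)):
            process_and_write(path, n_rows, n_cols, n_ranks, rank)
        result = array("d")
        with open(path, "rb") as f:
            result.frombytes(f.read())
        return list(result)
    finally:
        os.remove(path)
```

Because each rank writes a disjoint byte range of a preallocated file, no separate mosaic pass is needed. The same offset arithmetic should carry over to R using seek() and writeBin() on a binary connection. Whether truly concurrent writes of this kind perform well on a given parallel file system is not something this sketch demonstrates.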
George
On 2/17/12 10:20 AM, Jonathan Greenberg wrote:
> R-sig-hpc'ers:
>
> I've started running R on a large cluster at my university, which uses the
> IBM GPFS parallel file system. I'm wondering if there is any support
> within R for parallel writes to a single file or if there are any
> suggestions on how to implement, say, writing to a large binary file
> representing an image. The parallelization I'm thinking of is:
>
> given an image of x by y columns and rows represented by a flat binary
> file, process chunks of this image on different cpus/nodes, then write the
> results to a single file. The alternative is to write each chunk out
> separately then "mosaic" them back together, but this would involve
> reading/writing the data twice, and this process is going to be an I/O
> intensive one. Thoughts?
>
> --j
>
--
George Ostrouchov, Ph.D.
Scientific Data Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory
and
Remote Data Analysis and Visualization Center
National Institute for Computational Sciences
The University of Tennessee
http://www.csm.ornl.gov/~ost