[R-sig-hpc] Parallel File System support in R (e.g. GPFS)

George Ostrouchov ostrouchovg at ornl.gov
Fri Feb 17 19:46:30 CET 2012


Hi Jonathan,

We are developing some parallel file system readers and writers for R. 
They are intended to be used in a Single Program Multiple Data (SPMD) 
programming mode with Rmpi. Each process reads its own chunk of data 
and hands it off to other SPMD R code for the analysis. We are close to 
having a parallel version of ncdf, a NetCDF 
collective read/write package.
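
A minimal sketch of the per-process chunking idea, written in Python 
rather than R. It is not the readers and writers mentioned in this 
message. The names row_range, write_chunk, and demo are hypothetical. 
It assumes a row-major flat binary file of 4-byte little-endian floats, 
and the ranks are looped over serially for illustration; under MPI each 
rank would make a single write_chunk call with its own rank number.

```python
import os
import struct
import tempfile


def row_range(rank, size, n_rows):
    """Return the half-open [start, stop) block of rows owned by `rank`."""
    base, extra = divmod(n_rows, size)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def write_chunk(path, rank, size, n_rows, n_cols, process_row):
    """Process this rank's rows and write them at their final byte offset."""
    itemsize = struct.calcsize("<f")  # assumed pixel type: 4-byte float
    start, stop = row_range(rank, size, n_rows)
    with open(path, "r+b") as f:
        # Seek to where this rank's first row lives in the shared file.
        f.seek(start * n_cols * itemsize)
        for r in range(start, stop):
            values = process_row(r, n_cols)
            f.write(struct.pack(f"<{n_cols}f", *values))


def demo(n_rows=7, n_cols=5, size=3):
    """Simulate `size` ranks writing disjoint row blocks of one file."""
    path = os.path.join(tempfile.mkdtemp(), "image.bin")
    # Preallocate the full output so every rank can seek into it.
    with open(path, "wb") as f:
        f.truncate(n_rows * n_cols * 4)

    def fill(r, n):
        # Stand-in for the real per-row computation.
        return [float(r * n + c) for c in range(n)]

    for rank in range(size):
        write_chunk(path, rank, size, n_rows, n_cols, fill)

    with open(path, "rb") as f:
        return list(struct.unpack(f"<{n_rows * n_cols}f", f.read()))
```

Because each rank writes only its own byte range, no second "mosaic" 
pass is needed. Whether concurrent writes of this kind perform well on 
a given parallel file system is not something this sketch shows.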

George

On 2/17/12 10:20 AM, Jonathan Greenberg wrote:
> R-sig-hpc'ers:
>
> I've started running R on a large cluster at my university, which uses the
> IBM GPFS parallel file system.  I'm wondering if there is any support
> within R for parallel writes to a single file or if there are any
> suggestions on how to implement, say, writing to a large binary file
> representing an image.  The parallelization I'm thinking of is:
>
> given an image of x by y columns and rows represented by a flat binary
> file, process chunks of this image on different cpus/nodes, then write the
> results to a single file.  The alternative is to write each chunk out
> separately then "mosaic" them back together, but this would involve
> reading/writing the data twice, and this process is going to be an I/O
> intensive one.  Thoughts?
>
> --j
>

-- 
George Ostrouchov, Ph.D.
Scientific Data Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory

and

Remote Data Analysis and Visualization Center
National Institute for Computational Sciences
The University of Tennessee

http://www.csm.ornl.gov/~ost
