[R-sig-hpc] Rmpi and ncdf4

clement clement.tisseuil at gmail.com
Wed Sep 15 09:49:22 CEST 2010

Dear Paul,

I agree with your suggestion, which I have now tested successfully. 
Namely, the NetCDF file is opened only once, in the master process. The 
subset of data required for each calculation is also extracted by the 
master process before being sent to an available slave. When the 
calculation is done, the results are sent back to the master process, 
which writes them to the NetCDF file. The "task pull" method (described 
for the Rmpi package at http://math.acadiau.ca/ACMMaC/Rmpi/task_pull.R) 
is particularly well suited to this problem.
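
A minimal sketch of the pattern described above, not the code actually 
used. It assumes a hypothetical file "gcm.nc" holding a variable "tas" 
with dimensions (lon, lat, time) and an existing output variable 
"tas_mean" with dimensions (lon, lat). The time mean stands in for the 
real calculation, and the message tags follow the convention of the 
task_pull.R script (slave to master: 1 = ready, 2 = result, 3 = exiting; 
master to slave: 1 = task, 2 = no more tasks).

```r
# Split the latitude index 1..n_lat into contiguous blocks of at most `size` rows.
make_tasks <- function(n_lat, size) {
  starts <- seq(1, n_lat, by = size)
  lapply(starts, function(s) list(start = s, count = min(size, n_lat - s + 1)))
}

# Runs on every slave: ask for work, compute, send the result back, repeat.
# Slaves never touch the NetCDF file.
worker_loop <- function() {
  repeat {
    Rmpi::mpi.send.Robj(0, dest = 0, tag = 1)            # ready for a task
    task <- Rmpi::mpi.recv.Robj(source = 0, tag = Rmpi::mpi.any.tag())
    if (Rmpi::mpi.get.sourcetag()[2] == 2) break         # no more tasks
    # task$data is a lon x lat-block x time array; average over time.
    res <- list(start = task$start, count = task$count,
                values = apply(task$data, c(1, 2), mean, na.rm = TRUE))
    Rmpi::mpi.send.Robj(res, dest = 0, tag = 2)          # result
  }
  Rmpi::mpi.send.Robj(0, dest = 0, tag = 3)              # exiting
}

# Runs on the master: the only process that opens, reads and writes the file.
# File, variable and dimension names are assumptions made for illustration.
run_master <- function(path = "gcm.nc", n_slaves = 4, block = 10) {
  nc <- ncdf4::nc_open(path, write = TRUE)
  on.exit(ncdf4::nc_close(nc))
  tasks <- make_tasks(nc$dim$lat$len, block)

  Rmpi::mpi.spawn.Rslaves(nslaves = n_slaves)
  Rmpi::mpi.bcast.Robj2slave(worker_loop)
  Rmpi::mpi.bcast.cmd(worker_loop())

  closed <- 0
  while (closed < n_slaves) {
    msg <- Rmpi::mpi.recv.Robj(Rmpi::mpi.any.source(), Rmpi::mpi.any.tag())
    st <- Rmpi::mpi.get.sourcetag()                      # c(source, tag)
    if (st[2] == 1) {
      if (length(tasks) > 0) {
        job <- tasks[[1]]
        tasks[[1]] <- NULL
        # Extract only the block this slave needs.
        job$data <- ncdf4::ncvar_get(nc, "tas",
                                     start = c(1, job$start, 1),
                                     count = c(-1, job$count, -1),
                                     collapse_degen = FALSE)
        Rmpi::mpi.send.Robj(job, dest = st[1], tag = 1)
      } else {
        Rmpi::mpi.send.Robj(0, dest = st[1], tag = 2)
      }
    } else if (st[2] == 2) {
      ncdf4::ncvar_put(nc, "tas_mean", msg$values,
                       start = c(1, msg$start), count = c(-1, msg$count))
    } else if (st[2] == 3) {
      closed <- closed + 1
    }
  }
  Rmpi::mpi.close.Rslaves()
}
```

Nothing runs until run_master() is called from an R session with Rmpi 
and ncdf4 installed. Because slaves pull a new block only when they are 
idle, faster nodes simply process more blocks, and the file is never 
accessed by more than one process.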

Thanks for your assistance.



On 9/14/2010 8:30 PM, Paul Johnson wrote:
> On Fri, Sep 10, 2010 at 11:12 AM, clement<clement.tisseuil at gmail.com>  wrote:
>> Dear members,
>>> I am using the ncdf4 package to work on General Circulation Model (GCM)
>>> data (NetCDF file format) and I would like to parallelize some calculations
>>> using Rmpi. Does anyone have experience with, or advice on, using the
>>> Rmpi and ncdf4 packages together?
> Dear Clement:
> Thanks for posting your code. It really helps me to learn when I can
> read through what other people try.
> I found myself wondering "how much" of the whole data set is used by
> each slave.  Supposing the slave needs only a smaller piece, I think
> your problem would work more efficiently if you have the master load
> the data one time and have it send the separate pieces to the slaves
> for the work.  Well, that's what I would do because I've had very bad
> experience when lots of nodes try to access the same file on NFS.  (It
> causes something like a traffic jam as the processes fight over each
> other).
> Instead of doing system(rm ...), I'd suggest you clean files with the
> file.remove function (see ?files).  That will work across platforms,
> so even people who use Windows might someday be able to run your code.
> pj
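
The file.remove suggestion above could look like the following minimal 
sketch. The scratch directory and the "chunk_NN.nc" file names are 
hypothetical stand-ins for whatever temporary files a run leaves behind.

```r
# Hypothetical scratch files standing in for the temporary output of a run.
tmp_dir <- tempfile("scratch_")
dir.create(tmp_dir)
tmp_files <- file.path(tmp_dir, sprintf("chunk_%02d.nc", 1:3))
file.create(tmp_files)

# Portable replacement for system("rm ..."): list the files, then delete them.
leftovers <- list.files(tmp_dir, pattern = "\\.nc$", full.names = TRUE)
removed <- file.remove(leftovers)   # logical vector, one entry per file

# Warn rather than fail silently if something could not be deleted.
if (!all(removed)) warning("could not remove: ",
                           paste(leftovers[!removed], collapse = ", "))
```

file.remove returns one TRUE or FALSE per file, so a failed deletion can 
be detected from R itself instead of from a shell exit status.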

Clément Tisseuil
