[R-sig-hpc] itertools 0.1-1 and doMPI 0.1-4

Stephen Weston stephen.b.weston at gmail.com
Thu Jan 14 22:46:44 CET 2010


I'd like to announce the new "itertools" package, and version 0.1-4 of
the "doMPI" package, which have been uploaded to CRAN under the GPL-2
license.  Both of these packages are intended to be used with the
"foreach" and "iterators" packages.

The "itertools" package provides a variety of functions used to create
iterators, and was inspired by the Python itertools module.  It lets
you create iterators for splitting, chunking, repeating, recycling,
zipping, and filtering your data.  It also includes the "ireadBin"
function, which creates an iterator that reads binary data from a
connection object.  ireadBin supports seeking before reading, which can
be used in a parallel computing context to allow the workers to read
different portions of a large input file, for example.  I am planning to
write an example that demonstrates that technique in the near future.
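
In the meantime, here is a minimal sketch of the chunking, splitting,
and seek-before-read ideas described above.  The file contents, the
four-way split, and the "read_portion" helper are made up for
illustration and are not part of itertools.  The workers are simulated
with lapply, and the sketch assumes 8-byte doubles and a file that
every worker can see.

```r
library(iterators)
library(itertools)

# Chunking: ichunk() groups the values of an iterable into lists of at
# most 4 elements each.
chunks <- as.list(ichunk(1:10, 4))

# Splitting: isplitVector() returns pieces of a vector; here we ask for
# three pieces of roughly equal size.
pieces <- as.list(isplitVector(1:10, chunks = 3))

# Seeking before reading: write 100 doubles to a temporary file, then
# let each (simulated) worker read only its own quarter of the file.
path <- tempfile(fileext = ".bin")
writeBin(as.double(1:100), path)

# Hypothetical helper, not part of itertools.  Assumes 8-byte doubles
# and that n_total is evenly divisible by n_workers.
read_portion <- function(path, worker, n_workers, n_total) {
  n_per <- as.integer(n_total / n_workers)
  offset <- (worker - 1) * n_per * 8      # byte offset of this portion
  con <- file(path, open = "rb")
  on.exit(close(con))
  # ipos supplies the position to seek to before the read
  it <- ireadBin(con, what = double(), n = n_per, ipos = offset)
  nextElem(it)
}

portions <- lapply(1:4, read_portion, path = path, n_workers = 4,
                   n_total = 100)
```

In a foreach loop, the call to read_portion would presumably become the
loop body, with the worker index as the loop variable.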

The new version of "doMPI" fixes a few problems in version 0.1-3, and
also adds the new backend-specific "initEnvirMaster" option.  This
allows you to specify an R function that is executed in the master
process at the beginning of the foreach loop.  It is intended to be used
in conjunction with the "initEnvir" option, which specifies an R
function that is executed by the cluster workers.  By using both
together, MPI collective communication functions can be executed to
initialize the cluster workers.  This is particularly useful for sending
large matrices to each of the workers, since it allows data to be
broadcast using the mpi.bcast function without calling the R "serialize"
function.  An example of this technique is in the new "rforest.R" file,
a parallel random forest benchmark which is included in the "benchmark"
directory of the doMPI distribution.

Both of these projects are being developed on R-Forge, and I welcome any
comments or suggestions on how to improve them.  I'd particularly like
suggestions for better programming examples.  Since I am not a
statistician, coming up with useful, realistic examples is a big
problem for me.
Perhaps a suggestion from one of you could prevent me from writing a
third example script for computing a poor approximation of pi.

- Steve Weston


