[Rd] Distributed computing

gte810u at mail.gatech.edu
Tue Mar 23 22:51:16 CET 2004


> My inclination would be to, whenever possible, replace the core scalar
> libraries with compatible parallel versions (lapack -> scalapack),
> rather than make it an add-on package. If the R client code is general
> enough, and the make file can automatically find the parallel version,
> then it's a simple matter of compiling with the parallel libs. (Don't
> know if this is possible at run-time.) No rewriting (high level) R code
> at all. I tried to contact the plapack folks here at UT about
> integrating with R, but it appears the project is no longer active.

Unfortunately, there is a major complication with this approach:  the
distribution of data.  ScaLAPACK and PLAPACK both require the data to be
distributed across the processes in a specific layout (a two-dimensional
block-cyclic layout, in ScaLAPACK's case) before any computational routine can
be called.  Given a generic R matrix, we have to distribute the data before we
can call ScaLAPACK functions on it, and then collect the result before we can
return it to R.  Small problems cannot amortize this overhead, so replacing all
LAPACK calls with ScaLAPACK calls is not advisable.  Future versions of our
package [1] may include some form of automatic benchmarking to decide when a
problem is large enough to be worth sending to ScaLAPACK.


David Bauer

[1] http://www.aspect-sdm.org/Parallel-R/


