[Rd] Distributed computing
Fei Chen
feic at stats.ox.ac.uk
Wed Mar 24 17:46:08 CET 2004
Thanks Brian for pointing this out...
Yes indeed my thesis involved distributed computing and R. It consisted of
two parts: a distributed scoping feature for limiting data movement, and
a parallel computing interface for speeding up computations. The former
used CORBA and the latter PVM (plus embedded R-s and ScaLAPACK).
There are three documents available describing this in more detail:
http://www.stats.ox.ac.uk/~feic/Rs/thesis.pdf
my thesis
http://www.stats.ox.ac.uk/~feic/Rs/shorter.pdf
a shorter summary
http://www.stats.ox.ac.uk/~feic/Rs/DSC2003.pdf
the DSC document Brian pointed out.
I haven't publicized this mainly because the distributed scoping piece
involved modifying internal R code, most notably the R_eval() function,
which is a bit non-portable... But if there's interest in how I did things
I can certainly clean up my code and make it available. The parallel
engine part uses standard R so it should be easier to set up.
Cheers,
fei
On Wed, 24 Mar 2004, Prof Brian Ripley wrote:
> Fei Chen implemented distribution of data and ScaLAPACK as part of his
> DPhil thesis, with a high-level R interface. Moving data around is often
> the major limiting factor on large-scale model fitting (he was
> experimenting with glm's).
>
> There are two brief papers at
>
> http://www.isi-2003.de/guest/3427.pdf?MItabObj=pcoabstract&MIcolObj=uploadpaper&MInamObj=id&MIvalObj=3427&MItypeObj=application/pdf
>
> and in the DSC2003 proceedings (but the ci.tuwien server is currently not
> available, at least from here).
>
> Now Fei's process is complete, perhaps he will make the thesis available
> on line.
>
>
> On Tue, 23 Mar 2004 gte810u at mail.gatech.edu wrote:
>
> Quoting someone unnamed! --
>
> > > My inclination would be to, whenever possible, replace the core scalar
> > > libraries with compatible parallel versions (lapack -> scalapack),
> > > rather than make it an add-on package. If the R client code is general
> > > enough, and the make file can automatically find the parallel version,
> > > then it's a simple matter of compiling with the parallel libs. (Don't
> > > know if this is possible at run-time.) No rewriting (high level) R code
> > > at all. I tried to contact the plapack folks here at UT about
> > > integrating with R, but it appears the project is no longer active.
> >
> > Unfortunately, there is a major complication to this approach: the distribution
> > of data. ScaLAPACK (and PLAPACK) requires the data to be distributed in a
> > special way before calculation functions can be called. Given a generic R
> > matrix, we have to distribute the data before we can call ScaLAPACK functions on
> > it. We then have to collect the answer before we can return it to R. Because
> > of this serious overhead, replacing all LAPACK calls with ScaLAPACK calls would
> > not be recommended. Future versions of our package [1] may include some type of
> > automatic benchmarking to decide when problems are large enough to be worth
> > sending to ScaLAPACK.
> >
> >
> > David Bauer
> >
> > [1] http://www.aspect-sdm.org/Parallel-R/
> >
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
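
The data-distribution overhead described in the quoted message can be made
concrete with a small sketch. ScaLAPACK's dense routines expect a matrix laid
out block-cyclically over a process grid, so a generic matrix has to be
scattered before the call and gathered afterwards. The Python below is an
illustration only, not code from any of the packages mentioned in this thread:
the function names, the in-memory "process" dictionaries, and the cost model
with its rate parameters are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Grid = Dict[Tuple[int, int], Dict[Tuple[int, int], float]]


def owner_and_local(i: int, nb: int, nprocs: int) -> Tuple[int, int]:
    """Map a 0-based global index to (process coordinate, local index)
    for a block-cyclic layout with block size nb, starting at process 0."""
    block = i // nb
    return block % nprocs, (block // nprocs) * nb + i % nb


def distribute(a: Matrix, nb: int, prow: int, pcol: int) -> Grid:
    """Scatter a into per-process stores keyed by grid coordinate.
    Each store stands in for the local memory of one process (hypothetical)."""
    local: Grid = {(r, c): {} for r in range(prow) for c in range(pcol)}
    for i, row in enumerate(a):
        pr, li = owner_and_local(i, nb, prow)
        for j, value in enumerate(row):
            pc, lj = owner_and_local(j, nb, pcol)
            local[(pr, pc)][(li, lj)] = value
    return local


def collect(local: Grid, nrows: int, ncols: int,
            nb: int, prow: int, pcol: int) -> Matrix:
    """Gather the distributed pieces back into one ordinary matrix."""
    out = [[0.0] * ncols for _ in range(nrows)]
    for i in range(nrows):
        pr, li = owner_and_local(i, nb, prow)
        for j in range(ncols):
            pc, lj = owner_and_local(j, nb, pcol)
            out[i][j] = local[(pr, pc)][(li, lj)]
    return out


def worth_distributing(n: int, nprocs: int,
                       flops_per_sec: float, bytes_per_sec: float) -> bool:
    """Crude, assumed cost model for an n x n problem costing about n**3 flops:
    parallel time is perfectly scaled compute plus one scatter and one
    gather of 8-byte doubles. Real benchmarking would measure, not guess."""
    flops = float(n) ** 3
    serial = flops / flops_per_sec
    transfer = 2 * 8 * n * n / bytes_per_sec
    parallel = flops / (flops_per_sec * nprocs) + transfer
    return parallel < serial
```

With these assumed rates, `worth_distributing(10, 4, 1e9, 1e8)` is false and
`worth_distributing(5000, 4, 1e9, 1e8)` is true, which is the kind of
size-based decision the automatic benchmarking idea above would make from
measured rather than guessed numbers.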