[R-sig-hpc] Distributed computing

Sean Davis sdavis2 at mail.nih.gov
Sun Oct 16 14:30:49 CEST 2011


On Sat, Oct 15, 2011 at 8:44 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> On 15 October 2011 at 23:00, Akshay Jain wrote:
> | Hi everyone
> |
> | I have about 5-6 old laptops which are lying idle. I want to do some large
> | data analysis using neural network algorithms. What is the best way to
> | connect them into a grid in order to pool their CPU resources?
> |
> | There is a package "gridR", and there is also RHIPE. Which is the best
> | package for my needs? Which one requires the least technical knowledge? I
> | am not from an IT/computer science background, so I am not familiar with
> | Java etc.
>
> We wrote a survey paper on the 'state of the art in parallel computing with
> R' (see http://www.jstatsoft.org/v31/i01). We found Rmpi and snow to be
> dominant in most use cases -- and I still find their setup easier than
> Hadoop's, but others may differ.
>
> You can put these laptops to use in a quickly built cluster simply by
> dropping Ubuntu or Debian onto them, but it helps if you know some
> Unix/Linux tricks and know, e.g., how to propagate ssh keys.
>
> Distributed computing with minimal IT knowledge is unfortunately a little bit
> of a contradiction in terms.  Your easiest bet may be to donate the laptops
> and buy a cheap four- or six-core box and rely on the multicore package---or
> on the parallel package in R 2.14.0, due out in two weeks.
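
To make the snow-over-ssh suggestion above concrete, here is a minimal
sketch of starting a socket cluster across the laptops, assuming R and the
snow package are installed on every node and passwordless ssh from the
master to each node already works.  The hostnames (laptop1, laptop2, ...)
are placeholders for the machines' real names or IP addresses:

    library(snow)

    ## placeholder hostnames -- substitute the laptops' actual names or IPs
    hosts <- c("laptop1", "laptop2", "laptop3")

    ## start one worker per hostname over ssh
    cl <- makeSOCKcluster(hosts)

    ## sanity check: ask each worker which machine it is running on
    clusterCall(cl, function() Sys.info()[["nodename"]])

    ## spread a toy workload across the nodes
    res <- parLapply(cl, 1:100, function(i) sqrt(i))

    stopCluster(cl)

The parallel package in R 2.14.0 offers the same interface through
makePSOCKcluster(), so such code should carry over largely unchanged.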

And if buying hardware is not in the cards, there are a few Amazon Machine
Images (AMIs) with R preinstalled that could be launched on EC2 to give you
a relatively large SMP machine that would serve the same purpose.
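
For completeness, a rough sketch of what using all the cores of such a
machine looks like; fit.one() below is a made-up stand-in for the real
per-task work (e.g. training one neural network configuration):

    library(parallel)  # R >= 2.14.0; the multicore package offers the same mclapply()

    ## fit.one() is a hypothetical placeholder for the real per-task computation
    fit.one <- function(i) {
      set.seed(i)
      mean(rnorm(1e6))  # dummy work
    }

    ## mclapply() forks one worker per requested core (fork-based, so it
    ## does not run in parallel on Windows)
    results <- mclapply(1:20, fit.one, mc.cores = detectCores())

No cluster setup, hostnames, or ssh keys are needed in this case, which is
what makes the single-box route the low-maintenance option.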

Sean


