[R-sig-hpc] Distributed computing

Dirk Eddelbuettel edd at debian.org
Sun Oct 16 02:44:09 CEST 2011


On 15 October 2011 at 23:00, Akshay Jain wrote:
| Hi everyone
| 
| I have about 5-6 old laptops which are lying unused. I want to do some large
| data analysis using neural network algorithms. What is the best way to
| connect them to a grid in order to pool their CPU resources?
| 
| There are packages such as "gridR" or RHIPE. Which is the best package for my
| needs? What is the one which requires minimum technical knowledge as I am
| not from an IT/computer science background, so not familiar with java etc.

We wrote a survey paper on the 'state of the art in parallel computing with
R' (see http://www.jstatsoft.org/v31/i01). We found Rmpi and snow to be
dominant in most use cases -- and I still find their setup easier than Hadoop
but others may differ.

You can turn these laptops into a quickly built cluster simply by dropping
Ubuntu or Debian onto them, but it helps if you know some Unix/Linux tricks
and know, e.g., how to propagate ssh keys.
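A minimal sketch of what the snow-style approach looks like once the machines are set up, assuming R >= 2.14.0 (with its bundled parallel package) on every machine and passwordless ssh from the master to each worker. The host list is a placeholder: "localhost" entries start workers on the local machine without ssh, so the snippet runs as-is; for the laptops you would list their host names instead (the names in the comment are hypothetical).

```r
## Sketch, assuming R >= 2.14.0 (package 'parallel') on every machine and
## passwordless ssh from this machine to each worker host.
library(parallel)

## "localhost" entries start workers locally (no ssh), so this runs as-is;
## for the laptops, list their host names instead, e.g. the hypothetical
## c("laptop1", "laptop2", "laptop3")
hosts <- c("localhost", "localhost")
cl <- makePSOCKcluster(hosts)

## Farm out independent tasks; each element is evaluated on some worker
res <- parLapply(cl, 1:8, function(i) i^2)

## Always shut the workers down again
stopCluster(cl)
print(unlist(res))
```

The same parLapply() call works unchanged whether the workers are local or spread over the laptops; only the host list differs.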

Distributed computing with minimal IT knowledge is unfortunately a little bit
of a contradiction in terms.  Your easiest bet may be to donate the laptops
and buy a cheap four- or six-core box and rely on the multicore package, or
on the parallel package in R 2.14.0, due out in two weeks.
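For the single-box route, a minimal sketch using mclapply() from the parallel package, assuming R >= 2.14.0. mclapply() forks worker processes, which is not supported on Windows, so the sketch falls back to one core there; slow_square() is a hypothetical stand-in for an expensive, independent task such as one training run.

```r
## Sketch for a single multicore machine, assuming R >= 2.14.0 (package
## 'parallel'). mclapply() forks, so on Windows it is limited to 1 core.
library(parallel)

## detectCores() may return NA on some platforms, so fall back to 1
n_cores <- if (.Platform$OS.type == "windows") 1L else
  max(1L, detectCores(), na.rm = TRUE)

## Hypothetical stand-in for an expensive, independent task
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

res <- mclapply(1:8, slow_square, mc.cores = n_cores)
print(unlist(res))
```

No cluster setup or ssh keys are needed here, which is the point of the single-box suggestion.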

Dirk

-- 
"Outside of a dog, a book is a man's best friend. Inside of a dog, it is too
dark to read." -- Groucho Marx


