[R-sig-hpc] Distributed computing
edd at debian.org
Sun Oct 16 02:44:09 CEST 2011
On 15 October 2011 at 23:00, Akshay Jain wrote:
| Hi everyone
| I have about 5-6 old laptops which are lying around unused. I want to do some
| large data analysis using neural network algorithms. What is the best way to
| connect them in a grid in order to pool their CPU resources?
| There are packages such as "gridR" and RHIPE -- which is the best package for
| my needs? Which one requires the least technical knowledge? I am not from an
| IT/computer science background, so I am not familiar with Java etc.
We wrote a survey paper on the 'state of the art in parallel computing with
R' (see http://www.jstatsoft.org/v31/i01). We found Rmpi and snow to be
dominant in most use cases -- and I still find their setup easier than
Hadoop's, but others may differ.
You can quickly turn these laptops into a cluster simply by dropping
Ubuntu or Debian onto them, but it helps if you know some Unix/Linux tricks
and know, e.g., how to propagate ssh keys.
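For the ssh-key step, the usual recipe looks something like the following sketch. The hostnames "worker1" etc. are placeholders; this assumes standard OpenSSH tools on a Debian/Ubuntu install.

```shell
# Create a passphrase-less key pair once on the head node
# (only if ~/.ssh/id_rsa does not already exist).
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""

# Copy the public key to each worker laptop so the head node
# can log in (and start R sessions) without a password prompt.
for host in worker1 worker2 worker3; do
    ssh-copy-id "$host"
done
```

After this, `ssh worker1` should log in without prompting, which is what Rmpi and snow rely on to launch their worker processes.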
Distributed computing with minimal IT knowledge is unfortunately a bit of a
contradiction in terms. Your easiest bet may be to donate the laptops and buy
a cheap four- or six-core box, and rely on the multicore package---or the
parallel package in R 2.14.0, due out in two weeks.
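On such a single multi-core box, the whole setup reduces to one function call. A sketch, using the `parallel` package (into which multicore's `mclapply` was folded for R 2.14.0); the toy squaring function stands in for a real analysis step:

```r
## Fork-based parallelism on one multi-core machine.
## On R < 2.14.0 use library(multicore) instead; on Windows,
## forking is unavailable, so set mc.cores = 1 there.
library(parallel)

res <- mclapply(1:8, function(x) x^2, mc.cores = 2)
print(unlist(res))
```

No hostnames, ssh keys, or cluster setup are needed, which is why this route demands far less system administration than a multi-machine grid.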
"Outside of a dog, a book is a man's best friend. Inside of a dog, it is too
dark to read." -- Groucho Marx