[R-sig-hpc] distributed R on EC2, designing the software stack
edd at debian.org
Wed Apr 29 21:39:25 CEST 2009
On 29 April 2009 at 12:06, Stephen J. Barr wrote:
| 1) R 2.9.0 + OpenMPI + RMpi + Snowfall/sfCluster
| - will Amazon's network work with OpenMPI. Perhaps it would be
| better to use PVM or something that is more tolerant to non-optimal
If you can use standard snow rather than snowfall/sfCluster, then (I believe) you
are done. As per some emails on the Open MPI list from last summer or fall,
you can get Debian / Ubuntu instances where all of this is just an 'apt-get install'
or two away, given the set of packages I maintain for Debian. Plus you get
slurm to control it.
| 2) R 2.9.0 + "socket based communication" + Snowfall/sfCluster
| - is this scalable
Likewise, snow and sockets work as is on Debian / Ubuntu.
| 3) R 2.9.0 + twisted + NetWorkSpaces
| - not sure of Amazon's network supports broadcast mode, which is
| required by twisted
Should also work out of the box via the r-cran-nws and python-nwsserver
packages I maintain.
| 4) Biocep-R
| - this looks like it has the functionality to do what I want, but a
| lot of other stuff as well.
Yep, but I haven't had a chance to look more closely.
| 5) RHIPE
| - Hadoop is well supported by EC2. Perhaps this is the way to go.
| Seems like a very new package :)
Yes, and there is more Hadoop stuff cooking on R-Forge.
| What are people's thoughts on what would be a good software stack with
| the constraint that it should be simple and run on EC2?
I use the computers hanging around the house. If you have a desktop and a
laptop, you are ready to go. Or, if you have enough RAM, you can try virtual
machines as well. Last time I tried (for my HPC tutorials), the networking
wasn't fully 'see-through' yet, though I hear that VirtualBox has improved there.
Let us know what you come up with.
Three out of two people have difficulties with fractions.