[R-sig-hpc] distributed R on EC2, designing the software stack

Dirk Eddelbuettel edd at debian.org
Wed Apr 29 21:39:25 CEST 2009


On 29 April 2009 at 12:06, Stephen J. Barr wrote:
| 1) R 2.9.0 + OpenMPI + RMpi + Snowfall/sfCluster
|    - will Amazon's network work with OpenMPI. Perhaps it would be
| better to use PVM or something that is more tolerant to non-optimal
| network

If you can use standard snow rather than snowfall/sfCluster, then (I believe) you
are done.  As per some emails on the Open MPI list from last fall or summer,
you get Debian / Ubuntu instances where all this is just an 'apt-get install'
or two away given the set of packages I maintain for Debian.  Plus you get
slurm to control it.

| 2)  R 2.9.0 + "socket based communication" + Snowfall/sfCluster
|   - is this scalable

Likewise, snow and sockets work as is on Debian / Ubuntu.
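
For illustration, a minimal sketch of a socket-based snow cluster.  It assumes
the snow package is installed and starts two workers on the local machine; the
worker count and the toy squaring task are placeholders, not anything from the
original question.  On EC2 the cluster spec would presumably be a character
vector of instance hostnames reachable via ssh instead.

```r
# Assumes the 'snow' package is installed (r-cran-snow on Debian / Ubuntu).
library(snow)

# Two socket workers on localhost; replace with a character vector of
# hostnames to span several machines (assumption: passwordless ssh works).
cl <- makeCluster(2, type = "SOCK")

# Hypothetical toy task: square each input on the workers.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Always shut the workers down when done.
stopCluster(cl)

print(unlist(squares))
```

The same parLapply() call should work unchanged with an MPI cluster, since
only the makeCluster() type differs.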

| 3)  R 2.9.0 + twisted + NetWorkSpaces
|    - not sure if Amazon's network supports broadcast mode, which is
| required by twisted

Should also work out of the box via the r-cran-nws and python-nwsserver
packages I maintain.

| 4) Biocep-R
|    - this looks like it has the functionality to do what I want, but a
| lot of other stuff as well.

Yep, but I haven't had a chance to look more closely.

| 5) RHIPE
|    - Hadoop is well supported by EC2. Perhaps this is the way to go.
| Seems like a very new package :)

Yes, and there is more Hadoop stuff cooking on R-Forge.

| What are people's thoughts on what would be a good software stack with
| the constraint that it should be simple and run on EC2?

I use the computers hanging around the house.  If you have a desktop and a
laptop, you are ready to go.  Or if you have enough RAM, you can try virtual
approaches as well.  Last time I tried (for my HPC tutorials) the networking
was not fully 'see-through' yet, though I hear that VirtualBox has improved there.

Let us know what you come up with.

Dirk

-- 
Three out of two people have difficulties with fractions.


