[R-sig-hpc] deathstar operational

Whit Armstrong armstrong.whit at gmail.com
Sat Oct 15 21:06:34 CEST 2011


Deathstar is a collection of init scripts and a few simple zmq binaries
that allow for distributed computing on EC2 directly from your own
workstation.

You can check out your own copy here: https://github.com/armstrtw/deathstar.

I've rolled it up into a public AMI since I haven't had the time to
put out a deb or rpm yet. The AMI is a clone of the just-released Ubuntu
11.10 with a smattering of R packages installed as well as libzmq and
the startup scripts from deathstar.

Anyone with a valid EC2 account should be able to run this example
(just substitute in your own key name in the startCluster command)
assuming you are already set up to use ec2 commands from a shell.

This example uses only 2 of the 8-core boxes Amazon provides (c1.xlarge).

You'll need the most recent versions of rzmq and AWS.tools installed
on your local machine to run this demo.  You can obtain the packages
from github; I'm not sure the CRAN versions are up to date.

https://github.com/armstrtw/AWS.tools
https://github.com/armstrtw/rzmq

These packages are very new, so please email me if you encounter any
bugs (or post an issue on the respective github site).

I think these packages significantly lower the bar for anyone who
wants to use R and AWS for distributed computing.  Another key
advantage is that you can fire up any AMI you want, so please feel free
to clone my AMI and install your own custom packages (that's what I
do for my own sims).

Feedback welcome.

This is the sim I ran (again shamelessly stealing JD Long's estimatePi example).

whit at spartan:~$ cat zmq.aws.lapply.test.r
library(AWS.tools)
library(rzmq)


estimatePi <- function(seed) {
    set.seed(seed)
    numDraws <- 1e6

    r <- .5
    x <- runif(numDraws, min=-r, max=r)
    y <- runif(numDraws, min=-r, max=r)
    inCircle <- ifelse( (x^2 + y^2)^.5 < r , 1, 0)

    sum(inCircle) / length(inCircle) * 4
}

cl <- startCluster(ami="ami-9d5f93f4",key="maher-ave",instance.count=2,instance.type="c1.xlarge")
print("starting sim.")
run.time <- system.time(ans <-
zmq.cluster.lapply(cluster=cl$instances[,"dnsName"],as.list(1:1e3),estimatePi))
print("sim completed.")
res <- terminateCluster(cl)


print(mean(unlist(ans)))
print(run.time)
print(attr(ans,"execution.report"))

pi.est <- mean(unlist(ans))
print("result:")
print(pi.est)
whit at spartan:~$
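
Before starting any instances, it can be worth checking the worker
function locally. The sketch below is not part of the original script:
it runs the same estimator through base R's lapply on a handful of
seeds, with no cluster involved. The numDraws argument and the names
local.ans and local.est are assumptions introduced here so the check
runs quickly.

```r
# Same estimator as in the script above, with numDraws exposed as an
# argument (an assumption, so the local check finishes quickly).
estimatePi <- function(seed, numDraws = 1e6) {
    set.seed(seed)
    r <- .5
    x <- runif(numDraws, min = -r, max = r)
    y <- runif(numDraws, min = -r, max = r)
    # TRUE where the point falls inside the circle of radius r
    inCircle <- (x^2 + y^2)^.5 < r
    # share of points inside the circle, times 4, approximates pi
    mean(inCircle) * 4
}

# Plain local lapply over 20 seeds: same call shape as the cluster
# version, minus the cluster argument.
local.ans <- lapply(as.list(1:20), estimatePi, numDraws = 1e5)
local.est <- mean(unlist(local.ans))
print(local.est)
```

If this gives something close to 3.14, the function itself is sound and
any later failure is more likely to be in the cluster setup.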


Here is the output from my run on EC2:
> source("zmq.aws.lapply.test.r")
Loading required package: XML
....
[1] "starting sim."
checking tcp://ec2-184-72-90-249.compute-1.amazonaws.com:6001 status: ready
checking tcp://ec2-174-129-168-81.compute-1.amazonaws.com:6001 status: ready
[1] "sim completed."
[1] 3.141587
   user  system elapsed
  0.104   0.032 122.631
                     [,1]
ip-10-36-251-13:668    63
ip-10-36-251-13:671    63
ip-10-36-251-13:675    63
ip-10-36-251-13:679    62
ip-10-36-251-13:683    63
ip-10-36-251-13:687    62
ip-10-36-251-13:691    62
ip-10-36-251-13:694    62
ip-10-84-179-195:705   63
ip-10-84-179-195:709   63
ip-10-84-179-195:713   62
ip-10-84-179-195:717   62
ip-10-84-179-195:721   62
ip-10-84-179-195:726   63
ip-10-84-179-195:729   62
ip-10-84-179-195:732   63
[1] "result:"
[1] 3.141587
> res
     [,1]       [,2]         [,3]      [,4]
[1,] "INSTANCE" "i-40d4d020" "running" "shutting-down"
[2,] "INSTANCE" "i-42d4d022" "running" "shutting-down"
>
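
The execution report above can be cross-checked with a little
arithmetic. The sketch below copies the per-row job counts from the
report and confirms that they add up to the 1e3 seeds submitted, spread
over 16 rows (8 per c1.xlarge box). Reading each row as one worker
process (host:pid) is an interpretation of the labels, not something
the output states. The variable names are made up for this check.

```r
# Jobs per row, copied from the execution report above.
node1 <- c(63, 63, 63, 62, 63, 62, 62, 62)   # ip-10-36-251-13
node2 <- c(63, 63, 62, 62, 62, 63, 62, 63)   # ip-10-84-179-195
jobs <- c(node1, node2)

total.jobs <- sum(jobs)      # should match length(1:1e3)
n.workers  <- length(jobs)   # rows in the report
elapsed    <- 122.631        # elapsed seconds from system.time above

print(total.jobs)
print(n.workers)
print(range(jobs))           # spread of jobs per row
print(total.jobs / elapsed)  # jobs completed per second, wall clock
```

The counts differ by at most one job per row, so the work was spread
almost evenly across both boxes.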

-Whit


