[R-sig-hpc] deathstar operational
Whit Armstrong
armstrong.whit at gmail.com
Sat Oct 15 21:06:34 CEST 2011
Deathstar is a collection of init scripts and a few simple zmq binaries
that allow for distributed computing using EC2 directly from your own
workstation.
You can check out your own copy here: https://github.com/armstrtw/deathstar.
I've rolled it up into a public ami since I haven't had the time to
put out a deb or rpm yet. The ami is a clone of the just-released Ubuntu
11.10 with a smattering of R packages installed, as well as libzmq and
the startup scripts from deathstar.
Anyone with a valid EC2 account should be able to run this example
(just substitute in your own key name in the startCluster command)
assuming you are already set up to use ec2 commands from a shell.
This example uses only 2 of the 8-core boxes Amazon provides (c1.xlarge).
You'll need the most recent versions of rzmq and AWS.tools installed
on your local machine to run this demo. You can obtain the packages
from github. I'm not sure the cran versions are up to date.
https://github.com/armstrtw/AWS.tools
https://github.com/armstrtw/rzmq
These packages are very new, so please email me if you encounter any
bugs (or post an issue on the respective github site).
I think these packages significantly lower the bar for anyone who
wants to use R and AWS for distributed computing. Another key
advantage is that you can fire up any AMI you want, so please feel free
to clone my ami, and install your own custom packages (that's what I
do for my own sims).
Feedback welcome.
This is the sim I ran (again shamelessly stealing JD Long's estimatePi example).
whit@spartan:~$ cat zmq.aws.lapply.test.r
library(AWS.tools)
library(rzmq)
estimatePi <- function(seed) {
    set.seed(seed)
    numDraws <- 1e6
    r <- .5
    x <- runif(numDraws, min=-r, max=r)
    y <- runif(numDraws, min=-r, max=r)
    inCircle <- ifelse( (x^2 + y^2)^.5 < r , 1, 0)
    sum(inCircle) / length(inCircle) * 4
}
cl <- startCluster(ami="ami-9d5f93f4",key="maher-ave",instance.count=2,instance.type="c1.xlarge")
print("starting sim.")
run.time <- system.time(ans <-
    zmq.cluster.lapply(cluster=cl$instances[,"dnsName"],as.list(1:1e3),estimatePi))
print("sim completed.")
res <- terminateCluster(cl)
print(mean(unlist(ans)))
print(run.time)
print(attr(ans,"execution.report"))
pi.est <- mean(unlist(ans))
print("result:")
print(pi.est)
whit@spartan:~$
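Before paying for instances, it can be worth running a few seeds through
plain lapply on the local machine to confirm the function behaves. The
following is only a sketch and is not part of the script above: the seeds
1:4 and the names local.ans and local.est are arbitrary choices made for
illustration. The points are drawn uniformly on a square of side 2r, so the
fraction that lands inside the circle of radius r approaches pi/4, which is
why the function multiplies by 4.

```r
# Same estimator as in the script above, repeated so this runs standalone.
estimatePi <- function(seed) {
    set.seed(seed)
    numDraws <- 1e6
    r <- .5
    x <- runif(numDraws, min=-r, max=r)
    y <- runif(numDraws, min=-r, max=r)
    inCircle <- ifelse((x^2 + y^2)^.5 < r, 1, 0)
    sum(inCircle) / length(inCircle) * 4
}

# Local stand-in for the cluster call: a few seeds, no EC2 involved.
# (1:4 is an arbitrary choice for this sketch.)
local.ans <- lapply(as.list(1:4), estimatePi)
local.est <- mean(unlist(local.ans))
print(local.est)

# The same seed should always give the same estimate.
print(identical(estimatePi(1), estimatePi(1)))
```

If the local estimate is close to pi and the seeds are reproducible, the
same function can be handed to zmq.cluster.lapply unchanged.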
and output from my run:
> source("zmq.aws.lapply.test.r")
Loading required package: XML
....
[1] "starting sim."
checking tcp://ec2-184-72-90-249.compute-1.amazonaws.com:6001 status: ready
checking tcp://ec2-174-129-168-81.compute-1.amazonaws.com:6001 status: ready
[1] "sim completed."
[1] 3.141587
   user  system elapsed
  0.104   0.032 122.631
                     [,1]
ip-10-36-251-13:668    63
ip-10-36-251-13:671    63
ip-10-36-251-13:675    63
ip-10-36-251-13:679    62
ip-10-36-251-13:683    63
ip-10-36-251-13:687    62
ip-10-36-251-13:691    62
ip-10-36-251-13:694    62
ip-10-84-179-195:705   63
ip-10-84-179-195:709   63
ip-10-84-179-195:713   62
ip-10-84-179-195:717   62
ip-10-84-179-195:721   62
ip-10-84-179-195:726   63
ip-10-84-179-195:729   62
ip-10-84-179-195:732   63
[1] "result:"
[1] 3.141587
> res
     [,1]       [,2]         [,3]      [,4]
[1,] "INSTANCE" "i-40d4d020" "running" "shutting-down"
[2,] "INSTANCE" "i-42d4d022" "running" "shutting-down"
>
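The execution report above has 16 rows, 8 per instance, which lines up
with the 8 cores on each c1.xlarge. A quick way to confirm that every one
of the 1e3 seeds was handled, and that the load was spread evenly, is to
total the counts. The sketch below retypes the numbers from the report by
hand rather than reading attr(ans,"execution.report") directly, so the
variable names (host, tasks, per.host) are made up for illustration.

```r
# Task counts copied from the execution report printed above,
# in the same order (8 workers per instance).
host <- rep(c("ip-10-36-251-13", "ip-10-84-179-195"), each=8)
tasks <- c(63, 63, 63, 62, 63, 62, 62, 62,
           63, 63, 62, 62, 62, 63, 62, 63)

# Tasks handled by each instance.
per.host <- tapply(tasks, host, sum)
print(per.host)

# Total tasks handled; should match the 1e3 seeds submitted.
print(sum(tasks))

# Rough throughput in tasks per second, using the elapsed time above.
print(sum(tasks) / 122.631)
```

Each instance comes out at 500 tasks and the total is 1000, so nothing
was dropped and the two boxes shared the work equally.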
-Whit