[R-sig-hpc] resolved Rmpi with simulation study

Mon Nov 9 18:31:39 CET 2009

Hodgess, Erin wrote:
> Hi again.
> 
> I calculated the closest larger number of sims based on the mpi.comm.size and did a loop based on that.

A different strategy is to aim first for an lapply implementation of 
your overall program (for single threaded and to get the syntax correct) 
and then simply change to mpi.parLapply. Use the job.num argument to 
determine how to divide the list of tasks among nodes (if communication 
costs are important then leave as-is, which divides the tasks as evenly 
as possible amongst all nodes; the other extreme is 
job.num=length(<whatever lapply is iterating through>), which will 
allocate one task at a time, maximizing communication but providing 
fine-grained 'load balancing' appropriate if each simulation has a very 
unpredictable time for execution).

If you think carefully about your lapply, and arrange so that the 
function takes in all required arguments, then there is no need for 
broadcasting objects to slaves or worrying about minimizing 
communication costs, mpi.parLapply does all the work

For instance,

   myMatrix <- matrix(rnorm(1000000), 100) # 100 x 1000
   res <- mpi.parSapply(seq_len(nrow(myMatrix)), function(i, m) {
       ## do the real work here, we'll do something inappropriate
       mean(m[i,])
   }, myMatrix)
   all.equal(res, rowMeans(myMatrix))

forwards myMatrix efficiently.

Martin

> 
> I got about 4 times speedup with 26 nodes, 2 times speedup with 7 nodes.
> 
> The communication must be the slowdown factor here.
> 
> But for what it's worth, here are the functions, as well:
> 
> [hodgess at grid bin]$ cat linsim.R
> chow1 <- function() {
> 	x <- bigx[,fold]
> 	y <- bigy[,fold]
> 	nm <- length(x)
> 	n <- nm/m
> 	xq <- yq <- matrix(0,nrow=n,ncol=1)
> 	c1 <- (1/m)*cdgen(n,m)
> 	xq <- c1 %*% x
> 	yq <- c1 %*% y
> 	vv <- c1 %*% t(c1)
> 	vvi <- solve(vv)
> 	xvx <- t(xq) %*% vvi %*% xq
> 	xvxi <- solve(xvx)
> 	beta <- xvxi %*% t(xq) %*% vvi %*% yq
> 	res1 <- yq - xq %*% beta
> 	cv1 <- t(c1) %*% vvi
> 	newy <- x %*% beta + cv1%*%res1
> 	w <- sum((x-newy)^2)/360
> 	return(w)
> }
> 
> 
> 
> 
> cdgen <- function(n,m) {
> 	x <- rep(1,m)
> 	nm <- n*m
> 	cd <- matrix(0,nrow=n,ncol=nm)
> 	xg <- seq(from=1,by=m,length=n)
> 	xh <- seq(from=m,by=m,length=n)
> 	for(i in 1:n) {
> 		cd[i,xg[i]:xh[i]] <- x
> 	}
> 	return(cd)
> }
> 
> 
> linsim <- function(n1=10,phi1=0.5) {
> library("Rmpi")
> 
> # Notice we just say "give us all the slaves you've got."
> mpi.spawn.Rslaves()
> 
> if (mpi.comm.size() < 2) {
>     print("More slave processes are required.")
>     mpi.quit()
>     }
> 
> .Last <- function(){
>     if (is.loaded("mpi_initialize")){
>         if (mpi.comm.size(1) > 0){
>             print("Please use mpi.close.Rslaves() to close slaves.")
>             mpi.close.Rslaves()
>         }
>         print("Please use mpi.quit() to quit R")
>         .Call("mpi_finalize")
>     }
> 
> }
> 
> 
> 	w <- NULL
> 	m <- 3
> #Fix up closest multiple from mpi.comm.size
> 	n2 <- ceiling(n1/(mpi.comm.size()))
> 	n1 <- n3 <- n2*(mpi.comm.size())
> 	xx <- min(mpi.comm.size(),n1)
> 	cat("xx",xx,n1,n2,"\n")
> 
> 	iz <- 0
> #run loop based on mpi.comm.size	
> 	while(n1 >=1) {
> 	bigx <- matrix(arima.sim(list(order=c(1,0,0),ar=phi1),360*xx),
> 		nrow=360,ncol=xx)
> 	bigy <- matrix(arima.sim(list(order=c(1,0,0),ar=phi1),360*xx),
> 		nrow=360,ncol=xx)
> 	mpi.bcast.Robj2slave(phi1)
> 	mpi.bcast.Robj2slave(m)
> 	mpi.bcast.Robj2slave(chow1)
> 	mpi.bcast.Robj2slave(cdgen)
> 	mpi.bcast.Robj2slave(bigx)
> 	mpi.bcast.Robj2slave(bigy)
> 	mpi.bcast.cmd(fold <- mpi.comm.rank())
> 	mpi.bcast.cmd(chow1)
> 	
> 	newone <- mpi.remote.exec(chow1())
> #	newone <- chow1(x=x1,y=y1,m=3)
> 	w <- c(w,unlist(newone))
> 	n1 <- n1 - xx
> 	xx <- min(n1,mpi.comm.size())
> 	}
> 	zzz <- list(total=n3,mean=mean(w,na.rm=TRUE),sd=sd(w,na.rm=TRUE)/sqrt(360))
> 	return(zzz)
> }
> 
> 
> Thanks,
> Erin
> 
> 
> Erin M. Hodgess, PhD
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: hodgesse at uhd.edu
> 
> 
> 
> -----Original Message-----
> From: Brian G. Peterson [mailto:brian at braverock.com]
> Sent: Mon 11/9/2009 3:56 AM
> To: Hodgess, Erin
> Cc: r-sig-hpc at r-project.org
> Subject: Re: [R-sig-hpc] resolved Rmpi with simulation study
>  
> Hodgess, Erin wrote:
>> I came up with a solution...not a great one, but it runs.
> 
> So, for closure, and the archives, what was the solution you came up with and 
> your diagnosis of the problem?
> 
> I have typically used 'rsprng' or 'random' to get random numbers that are 
> different on different nodes in a parallel operation.
> 
> Regards,
> 
>     - Brian
> 

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793