[R-sig-hpc] Parallel loop

Simon Urbanek simon.urbanek at r-project.org
Wed Mar 7 03:27:48 CET 2012


Scott,

On Mar 6, 2012, at 4:19 PM, Scott Raynaud wrote:

> I'm trying to convert a loop in a simulation program to a parallel
> process using multicore.  It looks like this:
> 
> simus=1000 
> set.seed(12345)
> for(iter in 1:simus){
>  
> #bunch of code setting up random draws from various 
> #distributions for X matrix 
>  
> #Computaion of Y
>  
> (fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))
>  
> estbeta<-fixef(fitmodel)
>  sdebeta<-sqrt(diag(vcov(fitmodel)))
>   for(l in 1:betasize)
>   {  
>    cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
>     if(beta[l]*cibeta>0)              powaprox[[l]]<-powaprox[[l]]+1
>       sdepower[l,iter]<-as.numeric(sdebeta[l])
>   }  
> ##------------------------------------------------------------------------##
>                           } ##  iteration end here
>  
>  
> The variable fitmodel is defined elsewhere and consists of
> fixed and random parts.  Betasize is the length of the vector of
> fixed effects.  Looks like near the bottom there's a 
> counter that increments if the slope estimate doesn't trap 0.  
> The next line records the standard deviation.  It's these two items
> that I need to recover.
>  
> So I'm thinking that I'm going to recode this way to run on
> a machine that has 4-Xeon CPUs:
>  
> library(multicore)
> options(cores=4)
> simus=1000
> sim.base.fun<-function(iter){  #replaces for(iter in 1:simus){
> 
> #bunch of code setting up random draws from various 
> #distributions for X matrix 
>  
> #Computaion of Y
>  
> (fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))
>  
> estbeta<-fixef(fitmodel)
>  sdebeta<-sqrt(diag(vcov(fitmodel)))
>   for(l in 1:betasize)
>   {  
>    cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
>     if(beta[l]*cibeta>0)     
>     return(powaprox[[1]]<-powaprox[[1]]+1 #replaces powaprox[[l]]<-powaprox[[l]]+1
>     retunr(sedpower[1,iter]<-as.numeric(sdebeta[1]) #replaces sdepower[l,iter]<-as.numeric(sdebeta[l])
>      }
>                                       }#end of sim.base.fun
> sim.fun<_lapply(iter in 1:simus),sim.base.fun)
>  

No, the above won't work (all typos aside) since you are still using assignments. Inherently the above is not parallelizable as-is because of
powaprox[[l]]<-powaprox[[l]]+1
but that is trivially removed. Also the inner for loop is entirely superfluous.

Since your code was incomplete this is just a suggestion as we can't test anything (I'm not even checking if what you're doing makes any sense), but it should give you an idea of what to do:

sim.base <- function(iter) { 
  ## your unstated code goes here ...
  fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2)
  estbeta <- fixef(fitmodel)
  sdebeta <- sqrt(diag(vcov(fitmodel)))
  list( powf = estbeta-sgnbeta*z1score*sdebeta*beta > 0,
        sdepower = sdebeta)
  )
}

res <- lapply(seq.int(simus), sim.base)
powf <- sapply(res, function(x) x$powf)
sdepower <- sapply(res, function(x) x$sdepower)
powapprox <- apply(powf, 1, sum)


To parallelize, replace lapply above with mclapply. 



>  Do I need a collect statement? 

No, if you use mclapply it's all done for you.


> How do I kill the worker processes? 

No, they go away once they're done computing.


> One thing I'm not accounting for is the random
> number generating process.  I've seen a couple of ways of
> doing this, one using mclapply and another using rlecuyer,
> but I'm not sure how to make them work.   Suggestions?
> Is there a way to replicate the same random numbers used 
> in the conventional loop in the parallel loop?

There is but you may need read up on that - you have to use a generator that produces a streaming sequence such that you can skip forward (also see recent discussion here and R-devel). For the above you may get away without it by simply setting different seeds in each iteration.


>  My machine has 4-Intel® Xeon® Processor X5650 CPUs.
> When I type:
>  
> library(multicore)
> multicore:::detectCores()
>  
> it returns 4.  However, the Intel cutsheet says each of these proceesors
> has 6 cores.  How can I acces them? 

You can set any number of cores you want using mc.cores=... argument - the detected cores are just a default if you don't specify anything.

What OS/distro is this? It is very unusual to see the number of cores reported wrongly by the system ...

Cheers,
Simon



> I know there's a point of diminishing
> returns but I can't figure out what it is unless I test it.
> 
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> 
> 



More information about the R-sig-hpc mailing list