[R-sig-hpc] Parallel loop

Scott Raynaud scott.raynaud at yahoo.com
Thu Mar 15 17:43:28 CET 2012


So what I meant by the latest version was that I think that I want to do would be 
easier in package parallel, but I'm a noob at this so I don't know for sure.  Anyway,
I need 2.14.1 to get to that package.

My IS people insist that the latest version of R avaialble via apt-get is 
2.13.1.  Anything later they claim will have to be compiled.  True?

----- Original Message -----
From: Simon Urbanek <simon.urbanek at r-project.org>
To: Scott Raynaud <scott.raynaud at yahoo.com>
Cc: "r-sig-hpc at r-project.org" <r-sig-hpc at r-project.org>
Sent: Thursday, March 8, 2012 8:30 PM
Subject: Re: [R-sig-hpc] Parallel loop

Scott,

On Mar 8, 2012, at 11:37 AM, Scott Raynaud wrote:

> Thanks for the feedback.  Part of my problem is that I need the most recent copy of R. 

I don't quite understand how this is related to your question ... The version of R plays no role here ...


> My IS team told me they can only get an earlier copy using apt-get, but I think there must be a way so I've tasked them with figuring it out.  
> 
> My OS is Kubuntu. I just thought it odd that it could only detect the  number of CPUs rather than the number of cores.  I'm completely new to parallel processing but it seems that something is not right in the core detection.
> 

The core detection in multicore on Linux simply looks at /proc/cpuinfo so if it's not right, then your OS is reporting something odd. Note that the detection is just a fall-back if you don't specify anything, so it's really up to you how many parallel processes you want to use.


> One quetion I still have regrds child processes. I understand they finish on their own, but what if I need to kill those processing because of an obvious problem.  How can I do that?
> 

If you interrupt the master process it automatically kills all child processes and cleans up - at least for all high-level functions like mclapply. If you use low-level functions, you can always use kill(children()); collect() 

Cheers,
Simon



> ----- Original Message -----
> From: Simon Urbanek <simon.urbanek at r-project.org>
> To: Scott Raynaud <scott.raynaud at yahoo.com>
> Cc: "r-sig-hpc at r-project.org" <r-sig-hpc at r-project.org>
> Sent: Tuesday, March 6, 2012 8:27 PM
> Subject: Re: [R-sig-hpc] Parallel loop
> 
> Scott,
> 
> On Mar 6, 2012, at 4:19 PM, Scott Raynaud wrote:
> 
>> I'm trying to convert a loop in a simulation program to a parallel
>> process using multicore.  It looks like this:
>> 
>> simus=1000 
>> set.seed(12345)
>> for(iter in 1:simus){
>>  
>> #bunch of code setting up random draws from various 
>> #distributions for X matrix 
>>  
>> #Computaion of Y
>>  
>> (fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))
>>  
>> estbeta<-fixef(fitmodel)
>>  sdebeta<-sqrt(diag(vcov(fitmodel)))
>>  for(l in 1:betasize)
>>  {  
>>    cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
>>    if(beta[l]*cibeta>0)              powaprox[[l]]<-powaprox[[l]]+1
>>      sdepower[l,iter]<-as.numeric(sdebeta[l])
>>  }  
>> ##------------------------------------------------------------------------##
>>                          } ##  iteration end here
>>  
>>  
>> The variable fitmodel is defined elsewhere and consists of
>> fixed and random parts.  Betasize is the length of the vector of
>> fixed effects.  Looks like near the bottom there's a 
>> counter that increments if the slope estimate doesn't trap 0.  
>> The next line records the standard deviation.  It's these two items
>> that I need to recover.
>>  
>> So I'm thinking that I'm going to recode this way to run on
>> a machine that has 4-Xeon CPUs:
>>  
>> library(multicore)
>> options(cores=4)
>> simus=1000
>> sim.base.fun<-function(iter){  #replaces for(iter in 1:simus){
>> 
>> #bunch of code setting up random draws from various 
>> #distributions for X matrix 
>>  
>> #Computaion of Y
>>  
>> (fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2))
>>  
>> estbeta<-fixef(fitmodel)
>>  sdebeta<-sqrt(diag(vcov(fitmodel)))
>>  for(l in 1:betasize)
>>  {  
>>    cibeta<-estbeta[l]-sgnbeta[l]*z1score*sdebeta[l]
>>    if(beta[l]*cibeta>0)    
>>    return(powaprox[[1]]<-powaprox[[1]]+1 #replaces powaprox[[l]]<-powaprox[[l]]+1
>>    retunr(sedpower[1,iter]<-as.numeric(sdebeta[1]) #replaces sdepower[l,iter]<-as.numeric(sdebeta[l])
>>      }
>>                                      }#end of sim.base.fun
>> sim.fun<_lapply(iter in 1:simus),sim.base.fun)
>>  
> 
> No, the above won't work (all typos aside) since you are still using assignments. Inherently the above is not parallelizable as-is because of
> powaprox[[l]]<-powaprox[[l]]+1
> but that is trivially removed. Also the inner for loop is entirely superfluous.
> 
> Since your code was incomplete this is just a suggestion as we can't test anything (I'm not even checking if what you're doing makes any sense), but it should give you an idea of what to do:
> 
> sim.base <- function(iter) { 
>  ## your unstated code goes here ...
>  fitmodel <- lmer(modelformula,data,family=binomial(link=logit),nAGQ=2)
>  estbeta <- fixef(fitmodel)
>  sdebeta <- sqrt(diag(vcov(fitmodel)))
>  list( powf = estbeta-sgnbeta*z1score*sdebeta*beta > 0,
>        sdepower = sdebeta)
>  )
> }
> 
> res <- lapply(seq.int(simus), sim.base)
> powf <- sapply(res, function(x) x$powf)
> sdepower <- sapply(res, function(x) x$sdepower)
> powapprox <- apply(powf, 1, sum)
> 
> 
> To parallelize, replace lapply above with mclapply. 
> 
> 
> 
>>  Do I need a collect statement? 
> 
> No, if you use mclapply it's all done for you.
> 
> 
>> How do I kill the worker processes? 
> 
> No, they go away once they're done computing.
> 
> 
>> One thing I'm not accounting for is the random
>> number generating process.  I've seen a couple of ways of
>> doing this, one using mclapply and another using rlecuyer,
>> but I'm not sure how to make them work.  Suggestions?
>> Is there a way to replicate the same random numbers used 
>> in the conventional loop in the parallel loop?
> 
> There is but you may need read up on that - you have to use a generator that produces a streaming sequence such that you can skip forward (also see recent discussion here and R-devel). For the above you may get away without it by simply setting different seeds in each iteration.
> 
> 
>>  My machine has 4-Intel® Xeon® Processor X5650 CPUs.
>> When I type:
>>  
>> library(multicore)
>> multicore:::detectCores()
>>  
>> it returns 4.  However, the Intel cutsheet says each of these proceesors
>> has 6 cores.  How can I acces them? 
> 
> You can set any number of cores you want using mc.cores=... argument - the detected cores are just a default if you don't specify anything.
> 
> What OS/distro is this? It is very unusual to see the number of cores reported wrongly by the system ...
> 
> Cheers,
> Simon
> 
> 
> 
>> I know there's a point of diminishing
>> returns but I can't figure out what it is unless I test it.
>> 
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>> 
>> 
> 
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> 
>



More information about the R-sig-hpc mailing list