[R-sig-hpc] Parallel Linear Model

Simon Urbanek simon.urbanek at r-project.org
Wed Aug 22 18:37:13 CEST 2012


On Aug 22, 2012, at 12:04 PM, "Patrik Waldmann" <patrik.waldmann at boku.ac.at> wrote:

> That seems to be a good idea (for 8 cores):
>  
> y<-rnorm(1000)
> x<-matrix(rnorm(1000*10000),ncol=10000)
> dimx<-dim(x)
>  
> library(doParallel)
> library(foreach)
> cl <- makeCluster(8, methods=FALSE)
> registerDoParallel(cl)
> print(system.time(
> pval <- foreach (i =1:dimx[2], .combine=c) %dopar% {
> mod <- lm(y ~ x[,i])
> summary(mod)$coefficients[2,4]
> }
> ))
> 
>    user  system elapsed 
>   12.28    2.75  231.93
>  
> stopCluster(cl)
>  
> library(parallel)
> cl <- makeCluster(8, methods=FALSE)
> print(system.time(
> pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~ x[,i]))$coefficients[2,4]))
> ))
>  
>    user  system elapsed 
>   21.80    1.33   25.78
>  
> stopCluster(cl)
>  

You don't need makeCluster (mclapply forks the current R process rather than using a cluster), but you may want to raise the number of cores (the default is only 2), so:

> options(mc.cores=8)
> system.time(
+ pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~ x[,i]))$coefficients[2,4]))
+ )
   user  system elapsed 
 27.853   1.090   4.254 
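
A minimal, self-contained sketch of the same pattern, for checking that the forked run gives the same p-values as a sequential one. The problem size, the helper name slope_pval, and the choice of 2 cores are assumptions made here so the example runs quickly; they are not from the timings above.

```r
library(parallel)

set.seed(1)
n_obs  <- 200   # assumed sizes, far smaller than in the thread
n_pred <- 50
y <- rnorm(n_obs)
x <- matrix(rnorm(n_obs * n_pred), ncol = n_pred)

# Hypothetical helper: p-value of the slope from regressing y on column i
slope_pval <- function(i) summary(lm(y ~ x[, i]))$coefficients[2, 4]

# mclapply forks the running R process, so no cluster object is created.
# Forking is not available on Windows, where mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

pval_par <- unlist(mclapply(seq_len(n_pred), slope_pval, mc.cores = n_cores))
pval_seq <- vapply(seq_len(n_pred), slope_pval, numeric(1))

# The parallel and sequential results should agree
stopifnot(isTRUE(all.equal(pval_par, pval_seq)))
```

Passing mc.cores directly to mclapply has the same effect as setting options(mc.cores=...) beforehand, but keeps the setting local to the call.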

Cheers,
Simon


>  
> Patrik
>  
> 
> >>> Simon Urbanek <simon.urbanek at r-project.org> 22/08/2012 17:20 >>>
> 
> On Aug 22, 2012, at 10:47 AM, Patrik Waldmann <patrik.waldmann at boku.ac.at> wrote:
> 
> > I did not manage to implement this example in foreach, could anyone point me to a similar example?
> > 
> 
> I wouldn't even bother with foreach for something this simple - you can write it trivially as
> 
> library(parallel)
> 
> pval <- unlist(mclapply(1:n, function(i) summary(lm(y ~ x[,i]))$coefficients[2,4]))
> 
> Cheers,
> Simon
> 
> 
> > Patrik
> > 
> >>>> Jay Emerson <jayemerson at gmail.com> 22/08/2012 14:05 >>>
> > 
> > Patrik,
> > 
> > Your question (at least from your example) is really about general parallel computing. Nothing you want to do with the linear model in your short example requires any special type of parallelism. I recommend the package 'foreach' with one of its parallel backends, or else the package 'parallel' that comes with the newer versions of R. You could also have a look at Dirk's HPC page:
> > 
> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
> > 
> > Jay
> > 
> > -- 
> > John W. Emerson (Jay)
> > Associate Professor of Statistics, Adjunct, and Acting Director of Graduate Studies
> > Department of Statistics
> > Yale University
> > http://www.stat.yale.edu/~jay
> > 
> > 
> > 
> 


