[R-sig-hpc] Parallel Linear Model

Hao Yu hyu at stats.uwo.ca
Wed Aug 22 18:38:36 CEST 2012


Here is my test with 8 cores.

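# Simulated data: response y and 10,000 candidate predictors (columns of x)
y <- rnorm(1000)
x <- matrix(rnorm(1000*10000), ncol=10000)
dimx <- dim(x)

library(parallel)
cl <- makeCluster(8, methods=FALSE)
# Note: mclapply() does not use 'cl'; it forks its own workers, and their number
# is set by mc.cores (default getOption("mc.cores", 2L) on Unix-alikes; only 1 is
# allowed on Windows). This is consistent with the timing below matching apply().
print(system.time(
  pval <- unlist(mclapply(1:dimx[2], function(i)
    summary(lm(y ~ x[,i]))$coefficients[2,4]))
))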
   user  system elapsed
  25.46    0.02   25.62
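Because mclapply() never touches the cluster object, the number of forked workers has to be requested through its `mc.cores` argument. A minimal sketch, assuming a Unix-alike (on Windows mclapply() accepts only `mc.cores = 1`, so the sketch falls back to serial there); the seed and the 200-column size are illustrative choices, not from the thread:

```r
library(parallel)

set.seed(1)
y <- rnorm(1000)
x <- matrix(rnorm(1000 * 200), ncol = 200)  # 200 columns (illustrative) instead of 10,000

# mclapply() forks its own workers; a cluster from makeCluster() plays no role here.
# mc.cores > 1 is not supported on Windows, so fall back to serial there.
ncores <- if (.Platform$OS.type == "windows") 1L else 8L

pval <- unlist(mclapply(seq_len(ncol(x)), function(i)
  summary(lm(y ~ x[, i]))$coefficients[2, 4],
  mc.cores = ncores))
```

With the default `mc.preschedule = TRUE`, the column indices are split up front into `mc.cores` chunks, one per forked child, so there is no per-column dispatch cost.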

# Here is apply() (not parallel)
system.time(pval <- apply(x, 2, function(xi) summary(lm(y ~ xi))$coefficients[2,4]))
   user  system elapsed
  24.54    0.00   24.65

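# parApply() does use the cluster: y is copied to every worker, and each
# worker then receives its own block of columns of x
clusterExport(cl, "y")
system.time(pval <- parApply(cl, x, 2, function(xi) summary(lm(y ~ xi))$coefficients[2,4]))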
   user  system elapsed
   0.72    0.47    6.73
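Since parApply() ships blocks of columns to the workers while `y` arrives separately through clusterExport(), it is worth checking that the parallel p-values match the serial apply() ones. A minimal sketch on a smaller, seeded problem; the 2 workers, the 200 columns and the helper name `pfun` are illustrative assumptions:

```r
library(parallel)

set.seed(1)
y <- rnorm(1000)
x <- matrix(rnorm(1000 * 200), ncol = 200)  # illustrative size

# Hypothetical helper: p-value of the slope in the simple regression y ~ xi
pfun <- function(xi) summary(lm(y ~ xi))$coefficients[2, 4]

serial_p <- apply(x, 2, pfun)

cl <- makeCluster(2)
# Workers start with an empty workspace, so y must be copied to them;
# parApply() then sends each worker its own block of columns of x.
clusterExport(cl, "y", envir = environment())
parallel_p <- parApply(cl, x, 2, pfun)
stopCluster(cl)

all.equal(serial_p, parallel_p)
```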

stopCluster(cl)
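Patrik's foreach run quoted below took 231.93 s elapsed for the same job. One plausible cause, offered as an assumption rather than a diagnosis, is per-task overhead: each of the 10,000 loop iterations is a separate, very small task. A minimal sketch of the usual remedy is to iterate over blocks of columns so that each worker receives one block. It assumes the foreach and doParallel packages are installed; the 2 workers and 200 columns are illustrative:

```r
library(parallel)
library(foreach)
library(doParallel)

set.seed(1)
y <- rnorm(1000)
x <- matrix(rnorm(1000 * 200), ncol = 200)  # illustrative size

cl <- makeCluster(2)
registerDoParallel(cl)

# One block of column indices per worker, turned into the matching sub-matrices
blocks <- lapply(splitIndices(ncol(x), length(cl)),
                 function(idx) x[, idx, drop = FALSE])

# Each iteration handles a whole block; foreach exports y automatically
# because the loop body refers to it. Results are combined in order.
pval <- foreach(xb = blocks, .combine = c) %dopar% {
  apply(xb, 2, function(xi) summary(lm(y ~ xi))$coefficients[2, 4])
}

stopCluster(cl)
```

Splitting into `length(cl)` blocks mirrors what parApply() does by default, which may be part of why the parApply() run above was fast.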

Hao


Patrik Waldmann wrote:
> That seems to be a good idea (for 8 cores):
>
> y<-rnorm(1000)
> x<-matrix(rnorm(1000*10000),ncol=10000)
> dimx<-dim(x)
>
> library(doParallel)
> library(foreach)
> cl <- makeCluster(8, methods=FALSE)
> registerDoParallel(cl)
> print(system.time(
> pval <- foreach (i =1:dimx[2], .combine=c) %dopar% {
> mod <- lm(y ~ x[,i])
> summary(mod)$coefficients[2,4]
> }
> ))
>
>    user  system elapsed
>   12.28    2.75  231.93
>
> stopCluster(cl)
>
> library(parallel)
> cl <- makeCluster(8, methods=FALSE)
> print(system.time(
> pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
> x[,i]))$coefficients[2,4]))
> ))
>
>    user  system elapsed
>   21.80    1.33   25.78
>
> stopCluster(cl)
>
>
> Patrik
>
>
>>>> Simon Urbanek <simon.urbanek at r-project.org> 22/08/2012 17:20 >>>
>
> On Aug 22, 2012, at 10:47 AM, Patrik Waldmann <patrik.waldmann at boku.ac.at>
> wrote:
>
>> I did not manage to implement this example in foreach; could anyone
>> point me to a similar example?
>>
>
> I wouldn't even bother with foreach for something this simple; you can write
> it trivially as
>
> library(parallel)
>
> n <- ncol(x)  # number of predictor columns
> pval <- unlist(mclapply(1:n, function(i) summary(lm(y ~
> x[,i]))$coefficients[2,4]))
>
> Cheers,
> Simon
>
>
>> Patrik
>>
>>>>> Jay Emerson <jayemerson at gmail.com> 22/08/2012 14:05 >>>
>>
>> Patrik,
>>
>> Your question (at least from your example) is really about general
>> parallel computing. Nothing you want to do with your linear model in
>> your short example requires any special type of parallelism. I
>> recommend package 'foreach' with the parallel backends, or else the
>> package 'parallel' that comes with the newer versions of R. You could
>> also have a look at Dirk's HPC page:
>>
>> http://cran.r-project.org/web/views/HighPerformanceComputing.html
>>
>> Jay
>>
>> --
>> John W. Emerson (Jay)
>> Associate Professor of Statistics, Adjunct, and Acting Director of
>> Graduate Studies
>> Department of Statistics
>> Yale University
>> http://www.stat.yale.edu/~jay
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>>
>


-- 
Department of Statistics & Actuarial Sciences
Office Phone#:(519)-661-3622
Fax Phone#:(519)-661-3813
The University of Western Ontario
London, Ontario N6A 5B7
http://www.stats.uwo.ca/yu


