[R-sig-hpc] Parallel Linear Model

Simon Urbanek simon.urbanek at r-project.org
Wed Aug 22 18:43:56 CEST 2012


On Aug 22, 2012, at 12:38 PM, "Hao Yu" <hyu at stats.uwo.ca> wrote:

> Here is my test with 8 cores.
> 
> y<-rnorm(1000)
> x<-matrix(rnorm(1000*10000),ncol=10000)
> dimx<-dim(x)
> 
> library(parallel)
> cl <- makeCluster(8, methods=FALSE)
> print(system.time(
> pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
> x[,i]))$coefficients[2,4]))
> ))
>   user  system elapsed
>  25.46    0.02   25.62
> 

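... just to clarify, are you on Windows? You can't use multicore (i.e. mclapply) on Windows because the OS does not support forking, so mclapply runs serially there, which would explain why its timing matches plain apply ... there you have to use snow (the cluster functions such as parLapply/parApply). Note also that mclapply never uses the cluster created by makeCluster.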
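
For illustration, here is a minimal sketch of the snow-style route that should run on Windows as well as on Unix-alikes. It is not the code from the thread: the data are shrunk so it finishes quickly, the cluster uses 2 workers instead of 8, and slope_pval is a hypothetical helper name introduced only for this example. It passes y to the workers through the extra arguments of parApply instead of clusterExport.

```r
library(parallel)

# Small synthetic data so the sketch runs quickly
# (the thread used a 1000 x 10000 matrix).
set.seed(1)
y <- rnorm(100)
x <- matrix(rnorm(100 * 50), ncol = 50)

# Hypothetical helper: p-value of the slope when regressing y on one column.
slope_pval <- function(xi, y) summary(lm(y ~ xi))$coefficients[2, 4]

# Socket (snow-style) cluster; unlike mclapply it does not rely on fork.
cl <- makeCluster(2)
pval_par <- parApply(cl, x, 2, slope_pval, y = y)
stopCluster(cl)

# Serial reference computed with plain apply for comparison.
pval_seq <- apply(x, 2, slope_pval, y = y)
```

The parallel and serial results can be compared with all.equal(pval_par, pval_seq). With data this small the cluster start-up cost will probably dominate, so any speed-up should only be expected at sizes closer to those in the thread.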

Cheers,
S



> #here is apply (not parallel)
> system.time(pval<-apply(x,2, function(x)summary(lm(y~x))$coeff[2,4]))
>   user  system elapsed
>  24.54    0.00   24.65
> 
> clusterExport(cl,"y")
> system.time(pval<-parApply(cl, x,2, function(x)summary(lm(y~x))$coeff[2,4]))
>   user  system elapsed
>   0.72    0.47    6.73
> 
> stopCluster(cl)
> 
> Hao
> 
> 
> Patrik Waldmann wrote:
>> That seems to be a good idea (for 8 cores):
>> 
>> y<-rnorm(1000)
>> x<-matrix(rnorm(1000*10000),ncol=10000)
>> dimx<-dim(x)
>> 
>> library(doParallel)
>> library(foreach)
>> cl <- makeCluster(8, methods=FALSE)
>> registerDoParallel(cl)
>> print(system.time(
>> pval <- foreach (i =1:dimx[2], .combine=c) %dopar% {
>> mod <- lm(y ~ x[,i])
>> summary(mod)$coefficients[2,4]
>> }
>> ))
>> 
>>   user  system elapsed
>>  12.28    2.75  231.93
>> 
>> stopCluster(cl)
>> 
>> library(parallel)
>> cl <- makeCluster(8, methods=FALSE)
>> print(system.time(
>> pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
>> x[,i]))$coefficients[2,4]))
>> ))
>> 
>>   user  system elapsed
>>  21.80    1.33   25.78
>> 
>> stopCluster(cl)
>> 
>> 
>> Patrik
>> 
>> 
>>>>> Simon Urbanek <simon.urbanek at r-project.org> 22/08/2012 17:20 >>>
>> 
>> On Aug 22, 2012, at 10:47 AM, Patrik Waldmann <patrik.waldmann at boku.ac.at>
>> wrote:
>> 
>>> I did not manage to implement this example in foreach, could anyone
>>> point me to a similar example?
>>> 
>> 
>> I wouldn't even bother with foreach for something as simple - you can write
>> it trivially as
>> 
>> library(parallel)
>> 
>> pval <- unlist(mclapply(1:n, function(i) summary(lm(y ~
>> x[,i]))$coefficients[2,4]))
>> 
>> Cheers,
>> Simon
>> 
>> 
>>> Patrik
>>> 
>>>>>> Jay Emerson <jayemerson at gmail.com> 22/08/2012 14:05 >>>
>>> 
>>> Patrik,
>>> 
>>> Your question (at least from your example) is really about general
>>> parallel computing. Nothing you want to do with your linear model from
>>> your short example requires some special type of parallelism. I
>>> recommend package 'foreach' with the parallel backends, or else the
>>> package 'parallel' that comes with the newer versions of R. You could
>>> also have a look at Dirk's HPC page:
>>> 
>>> http://cran.r-project.org/web/views/HighPerformanceComputing.html
>>> 
>>> Jay
>>> 
>>> --
>>> John W. Emerson (Jay)
>>> Associate Professor of Statistics, Adjunct, and Acting Director of
>>> Graduate Studies
>>> Department of Statistics
>>> Yale University
>>> http://www.stat.yale.edu/~jay
>>> 
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> R-sig-hpc at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>> 
>>> 
>> 
> 
> 
> -- 
> Department of Statistics & Actuarial Sciences
> Office Phone#:(519)-661-3622
> Fax Phone#:(519)-661-3813
> The University of Western Ontario
> London, Ontario N6A 5B7
> http://www.stats.uwo.ca/yu
> 
> 
