[R-sig-hpc] Parallel Linear Model
Hao Yu
hyu at stats.uwo.ca
Wed Aug 22 18:38:36 CEST 2012
Here is my test with 8 cores.
y<-rnorm(1000)
x<-matrix(rnorm(1000*10000),ncol=10000)
dimx<-dim(x)
library(parallel)
cl <- makeCluster(8, methods=FALSE)
print(system.time(
pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
x[,i]))$coefficients[2,4]))
))
user system elapsed
25.46 0.02 25.62
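A note on the timing above: mclapply() does not use the cluster object cl. It forks the current R process and takes its worker count from the mc.cores argument (default getOption("mc.cores", 2L)); on Windows, where forking is unavailable, it can only run serially. Either would be consistent with an elapsed time close to plain apply(). Below is a minimal sketch with the core count passed explicitly, assuming a Unix-alike system. The smaller dimensions and the name n_cores are illustrative and not part of the test above.

```r
library(parallel)

set.seed(1)
y <- rnorm(100)
x <- matrix(rnorm(100 * 50), ncol = 50)   # smaller than the test above (assumption, for speed)

# mclapply() forks; fall back to one core on Windows, where forking is unavailable
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# one simple regression per column, keeping the p-value of the slope
pval <- unlist(mclapply(seq_len(ncol(x)), function(i) {
  summary(lm(y ~ x[, i]))$coefficients[2, 4]
}, mc.cores = n_cores))
```

On a machine with 8 cores, mc.cores = 8 would be the setting to compare against the parApply run.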
# here is apply (not parallel)
system.time(pval<-apply(x,2, function(x)summary(lm(y~x))$coeff[2,4]))
user system elapsed
24.54 0.00 24.65
clusterExport(cl,"y")
system.time(pval<-parApply(cl, x,2, function(x)summary(lm(y~x))$coeff[2,4]))
user system elapsed
0.72 0.47 6.73
stopCluster(cl)
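As far as I understand, the parApply call above is fast because each worker receives one block of columns and loops over it locally, so there is one task per worker rather than one per column. The same idea can be written out by hand with splitIndices() and parLapply(). This is only a sketch under that assumption, with smaller dimensions and 2 workers chosen for illustration; blocks and chunks are made-up names.

```r
library(parallel)

set.seed(1)
y <- rnorm(100)
x <- matrix(rnorm(100 * 50), ncol = 50)   # smaller than the test above (assumption, for speed)

cl <- makeCluster(2)

# split the column indices into one contiguous block per worker
blocks <- splitIndices(ncol(x), length(cl))
chunks <- lapply(blocks, function(j) x[, j, drop = FALSE])

# each worker gets one sub-matrix; y is passed along as an extra argument
pval <- unlist(parLapply(cl, chunks, function(xs, y) {
  apply(xs, 2, function(col) summary(lm(y ~ col))$coefficients[2, 4])
}, y = y))

stopCluster(cl)
```

Passing y as an argument here takes the place of clusterExport(cl, "y").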
Hao
Patrik Waldmann wrote:
> That seems to be a good idea (for 8 cores):
>
> y<-rnorm(1000)
> x<-matrix(rnorm(1000*10000),ncol=10000)
> dimx<-dim(x)
>
> library(doParallel)
> library(foreach)
> cl <- makeCluster(8, methods=FALSE)
> registerDoParallel(cl)
> print(system.time(
> pval <- foreach (i =1:dimx[2], .combine=c) %dopar% {
> mod <- lm(y ~ x[,i])
> summary(mod)$coefficients[2,4]
> }
> ))
>
> user system elapsed
> 12.28 2.75 231.93
>
> stopCluster(cl)
>
> library(parallel)
> cl <- makeCluster(8, methods=FALSE)
> print(system.time(
> pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
> x[,i]))$coefficients[2,4]))
> ))
>
> user system elapsed
> 21.80 1.33 25.78
>
> stopCluster(cl)
>
>
> Patrik
>
>
>>>> Simon Urbanek <simon.urbanek at r-project.org> 22/08/2012 17:20 >>>
>
> On Aug 22, 2012, at 10:47 AM, Patrik Waldmann <patrik.waldmann at boku.ac.at>
> wrote:
>
>> I did not manage to implement this example in foreach, could anyone
>> point me to a similar example?
>>
>
> I wouldn't even bother with foreach for something this simple - you can write
> it trivially as
>
> library(parallel)
>
> pval <- unlist(mclapply(1:n, function(i) summary(lm(y ~
> x[,i]))$coefficients[2,4]))
>
> Cheers,
> Simon
>
>
>> Patrik
>>
>>>>> Jay Emerson <jayemerson at gmail.com> 22/08/2012 14:05 >>>
>>
>> Patrik,
>>
>> Your question (at least from your example) is really about general
>> parallel computing. Nothing you want to do with your linear model from
>> your short example requires some special type of parallelism. I
>> recommend package 'foreach' with the parallel backends, or else the
>> package 'parallel' that comes with the newer versions of R. You could
>> also have a look at Dirk's HPC page:
>>
>> http://cran.r-project.org/web/views/HighPerformanceComputing.html
>>
>> Jay
>>
>> --
>> John W. Emerson (Jay)
>> Associate Professor of Statistics, Adjunct, and Acting Director of
>> Graduate Studies
>> Department of Statistics
>> Yale University
>> http://www.stat.yale.edu/~jay
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>>
>
--
Department of Statistics & Actuarial Sciences
Office Phone#:(519)-661-3622
Fax Phone#:(519)-661-3813
The University of Western Ontario
London, Ontario N6A 5B7
http://www.stats.uwo.ca/yu