[R-sig-hpc] Parallel Linear Model
Simon Urbanek
simon.urbanek at r-project.org
Wed Aug 22 18:43:56 CEST 2012
On Aug 22, 2012, at 12:38 PM, "Hao Yu" <hyu at stats.uwo.ca> wrote:
> Here is my test with 8 cores.
>
> y<-rnorm(1000)
> x<-matrix(rnorm(1000*10000),ncol=10000)
> dimx<-dim(x)
>
> library(parallel)
> cl <- makeCluster(8, methods=FALSE)
> print(system.time(
> pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
> x[,i]))$coefficients[2,4]))
> ))
>    user  system elapsed
>   25.46    0.02   25.62
>
Just to clarify: are you on Windows? You can't use multicore (i.e. mclapply) on Windows because the OS does not support forking; there you have to use a snow-style cluster instead.
Cheers,
S
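
A minimal sketch of the point above, assuming only the base 'parallel' package. mclapply() relies on forking and does not use a cluster object created with makeCluster(); its worker count comes from mc.cores. On Windows, a snow-style cluster with parLapply() is the alternative. The helper slope_pval, the worker count n_workers, and the reduced problem size are illustrative assumptions, not part of the original test.

```r
library(parallel)

set.seed(1)
n_obs  <- 200   # assumption: much smaller than the thread's 1000 x 10000
n_pred <- 50
y <- rnorm(n_obs)
x <- matrix(rnorm(n_obs * n_pred), ncol = n_pred)

# p-value of the slope from a simple regression of y on column i of x
slope_pval <- function(i, x, y) {
  xi <- x[, i]
  summary(lm(y ~ xi))$coefficients[2, 4]
}

n_workers <- 2  # assumption; the thread used 8

if (.Platform$OS.type == "windows") {
  # No fork() on Windows: start worker processes and pass the data explicitly
  cl <- makeCluster(n_workers)
  pval <- unlist(parLapply(cl, seq_len(n_pred), slope_pval, x = x, y = y))
  stopCluster(cl)
} else {
  # Forked workers inherit the parent's objects; mc.cores sets the worker count
  pval <- unlist(mclapply(seq_len(n_pred), slope_pval, x = x, y = y,
                          mc.cores = n_workers))
}

# Serial reference, to check that both routes give the same p-values
pval_serial <- vapply(seq_len(n_pred), slope_pval, numeric(1), x = x, y = y)
print(all.equal(pval, pval_serial))
```

With a cluster, the data have to reach the workers, either as function arguments (as here) or via clusterExport(); forked workers see the parent's objects directly.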
> # here is apply (not parallel)
> system.time(pval<-apply(x,2, function(x)summary(lm(y~x))$coeff[2,4]))
>    user  system elapsed
>   24.54    0.00   24.65
>
> clusterExport(cl,"y")
> system.time(pval<-parApply(cl, x,2, function(x)summary(lm(y~x))$coeff[2,4]))
>    user  system elapsed
>    0.72    0.47    6.73
>
> stopCluster(cl)
>
> Hao
>
>
> Patrik Waldmann wrote:
>> That seems to be a good idea (for 8 cores):
>>
>> y<-rnorm(1000)
>> x<-matrix(rnorm(1000*10000),ncol=10000)
>> dimx<-dim(x)
>>
>> library(doParallel)
>> library(foreach)
>> cl <- makeCluster(8, methods=FALSE)
>> registerDoParallel(cl)
>> print(system.time(
>> pval <- foreach (i =1:dimx[2], .combine=c) %dopar% {
>> mod <- lm(y ~ x[,i])
>> summary(mod)$coefficients[2,4]
>> }
>> ))
>>
>>    user  system elapsed
>>   12.28    2.75  231.93
>>
>> stopCluster(cl)
>>
>> library(parallel)
>> cl <- makeCluster(8, methods=FALSE)
>> print(system.time(
>> pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
>> x[,i]))$coefficients[2,4]))
>> ))
>>
>>    user  system elapsed
>>   21.80    1.33   25.78
>>
>> stopCluster(cl)
>>
>>
>> Patrik
>>
>>
>>>>> Simon Urbanek <simon.urbanek at r-project.org> 22/08/2012 17:20 >>>
>>
>> On Aug 22, 2012, at 10:47 AM, Patrik Waldmann <patrik.waldmann at boku.ac.at>
>> wrote:
>>
>>> I did not manage to implement this example in foreach, could anyone
>>> point me to a similar example?
>>>
>>
>> I wouldn't even bother with foreach for something this simple - you can
>> write it trivially as
>>
>> library(parallel)
>>
>> pval <- unlist(mclapply(1:n, function(i) summary(lm(y ~
>> x[,i]))$coefficients[2,4]))
>>
>> Cheers,
>> Simon
>>
>>
>>> Patrik
>>>
>>>>>> Jay Emerson <jayemerson at gmail.com> 22/08/2012 14:05 >>>
>>>
>>> Patrik,
>>>
>>> Your question (at least from your example) is really about general
>>> parallel computing. Nothing you want to do with your linear model from
>>> your short example requires some special type of parallelism. I
>>> recommend package 'foreach' with the parallel backends, or else the
>>> package 'parallel' that comes with the newer versions of R. You could
>>> also have a look at Dirk's HPC page:
>>>
>>> http://cran.r-project.org/web/views/HighPerformanceComputing.html
>>>
>>> Jay
>>>
>>> --
>>> John W. Emerson (Jay)
>>> Associate Professor of Statistics, Adjunct, and Acting Director of
>>> Graduate Studies
>>> Department of Statistics
>>> Yale University
>>> http://www.stat.yale.edu/~jay
>>>
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> R-sig-hpc at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>>
>>>
>>
>>
>
>
> --
> Department of Statistics & Actuarial Sciences
> Office Phone#:(519)-661-3622
> Fax Phone#:(519)-661-3813
> The University of Western Ontario
> London, Ontario N6A 5B7
> http://www.stats.uwo.ca/yu
>
>