[R-sig-hpc] Parallel linear model

Hao Yu hyu at stats.uwo.ca
Fri Aug 24 18:32:56 CEST 2012


Try
clusterExport(cl,c("y","fix"))
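
For context, a minimal sketch of why this helps. The small data sizes and the two-worker cluster below are illustrative assumptions, not taken from the original post. clusterExport() expects the variable names as a single character vector in its second argument (varlist); its third argument is envir, so in clusterExport(cl,"y","fix") the string "fix" is passed as the environment instead of being treated as a second name to export.

```r
library(parallel)

set.seed(1)
# Small illustrative data (sizes are assumptions, far smaller than in the post)
y   <- rnorm(100)
x   <- matrix(rnorm(100 * 20), ncol = 20)
fix <- matrix(rnorm(100 * 3), ncol = 3)

# Serial reference: p-value of each column of x, adjusting for the fixed matrix
pval_serial <- apply(x, 2, function(xj)
  summary(lm(y ~ xj + fix))$coefficients[2, 4])

cl <- makeCluster(2)                # two workers, chosen only for the sketch
clusterExport(cl, c("y", "fix"))    # both names in ONE character vector
pval_par <- parApply(cl, x, 2, function(xj)
  summary(lm(y ~ xj + fix))$coefficients[2, 4])
stopCluster(cl)                     # release the worker connections

# Compare the serial and parallel results
all.equal(unname(pval_serial), unname(pval_par))
```

The matrix x itself does not need to be exported, since parApply() ships the columns to the workers; only the objects the function looks up by name (y and fix) have to exist on each worker.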

Hao

Patrik Waldmann wrote:
> I have some additional questions: I want to add an extra matrix that I
> keep fixed, but I cannot get the parallel version working, any
> suggestions?
>
> y<-rnorm(1000)
> x<-matrix(rnorm(1000*10000),ncol=10000)
> fix<-matrix(rnorm(1000*10),ncol=10)
> #Non-parallel version works fine
> pval<-apply(x,2, function(x)summary(lm(y~x+fix))$coeff[2,4])
>
> library(parallel)
> cores<-detectCores()
> cl <- makeCluster(cores, methods=FALSE)
> clusterExport(cl,"y","fix")
> pvalstruct<-parApply(cl, x,2, function(x)summary(lm(y~x+fix))$coeff[2,4])
>
>
> Patrik
>
>
>>>> "Patrik Waldmann" <patrik.waldmann at boku.ac.at> 23/08/2012 12:12 >>>
> Here's a comparison on Windows based on 8 cores (excluding foreach):
>> y<-rnorm(1000)
>> x<-matrix(rnorm(1000*10000),ncol=10000)
>> dimx<-dim(x)
>> library(rbenchmark)
>> benchmark(pval<-apply(x,2, function(x)summary(lm(y~x))$coeff[2,4]),
>> replications=1)
>
>                                                              test replications elapsed relative user.self sys.self user.child sys.child
> 1 pval <- apply(x, 2, function(x) summary(lm(y ~ x))$coeff[2, 4])            1   25.16        1     20.46     2.17         NA        NA
>
>> library(parallel)
>> cores<-detectCores()
>> cl <- makeCluster(cores, methods=FALSE)
>> clusterExport(cl,"y")
>> benchmark(pval<-parApply(cl, x,2,
>> function(x)summary(lm(y~x))$coeff[2,4]), replications=1)
>                                                                     test replications elapsed relative user.self sys.self user.child sys.child
> 1 pval <- parApply(cl, x, 2, function(x) summary(lm(y ~ x))$coeff[2, 4])            1    5.52        1      0.74     0.28         NA        NA
>
>> stopCluster(cl)
>
> # More fair
>
>> benchmark({cores<-detectCores()
> + cl <- makeCluster(cores, methods=FALSE)
> + clusterExport(cl,"y")
> + pval<-parApply(cl, x,2, function(x)summary(lm(y~x))$coeff[2,4])},
> replications=1)
>   test replications elapsed relative user.self sys.self user.child sys.child
> 1    {            1    7.11        1      0.65     0.37         NA        NA
> Warning messages:
> 1: closing unused connection 10 (<-patwa-PC:10187)
> 2: closing unused connection 9 (<-patwa-PC:10187)
> 3: closing unused connection 8 (<-patwa-PC:10187)
> 4: closing unused connection 7 (<-patwa-PC:10187)
> 5: closing unused connection 6 (<-patwa-PC:10187)
> 6: closing unused connection 5 (<-patwa-PC:10187)
> 7: closing unused connection 4 (<-patwa-PC:10187)
> 8: closing unused connection 3 (<-patwa-PC:10187)
>> stopCluster(cl)
>
> What do the warnings refer to?
>
> Patrik
>
>>>> Dirk Eddelbuettel <edd at debian.org> 23/08/2012 02:53 >>>
>
> The difference between user and elapsed time is old hat. Here is a great
> example (and IIRC first shown here by Simon) with no compute time:
>
>    R> system.time(mclapply(1:8, function(x) Sys.sleep(1)))   ## 2 cores by default
>       user  system elapsed
>      0.000   0.012   4.014
>    R> system.time(mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8))
>       user  system elapsed
>      0.012   0.020   1.039
>    R>
>
> so elapsed time is effectively the one second a Sys.sleep(1) takes, plus
> overhead, if we allow for all eight (hyperthreaded) cores here.  By Brian
> Ripley's choice a default of two is baked-in, so clueless users only get a
> small gain.  "user time" is roughly the actual system load _summed over
> all processes / threads_.
>
> With that, could I ask any of the participants in the thread to re-try
> with a proper benchmarking package such as rbenchmark or microbenchmark?
> Either one beats the socks off system.time:
>
>    R> library(rbenchmark)
>    R> benchmark( mclapply(1:8, function(x) Sys.sleep(1)),
>                  mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8),
>                  replications=1)
>                                                       test replications elapsed relative user.self sys.self user.child sys.child
>    1               mclapply(1:8, function(x) Sys.sleep(1))            1   4.013  3.89612     0.000    0.008      0.000     0.004
>    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8)            1   1.030  1.00000     0.004    0.008      0.004     0.000
>    R>
>
> and
>
>    R> library(microbenchmark)
>    R> microbenchmark( mclapply(1:8, function(x) Sys.sleep(1)),
>                       mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8),
>                       times=1)
>    Unit: seconds
>                                                       expr     min      lq  median      uq     max
>    1               mclapply(1:8, function(x) Sys.sleep(1)) 4.01377 4.01377 4.01377 4.01377 4.01377
>    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8) 1.03457 1.03457 1.03457 1.03457 1.03457
>    R>
>
> (and you normally want to run either with 10 or 100 or ... replications /
> times).
>
> Dirk
>
> --
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


-- 
Department of Statistics & Actuarial Sciences
Office Phone#:(519)-661-3622
Fax Phone#:(519)-661-3813
The University of Western Ontario
London, Ontario N6A 5B7
http://www.stats.uwo.ca/yu


