[R-sig-hpc] Parallel linear model

Dirk Eddelbuettel edd at debian.org
Thu Aug 23 15:16:33 CEST 2012


On 23 August 2012 at 12:12, Patrik Waldmann wrote:
| Here's a comparison on Windows based on 8 cores (excluding foreach):
| > y<-rnorm(1000)
| > x<-matrix(rnorm(1000*10000),ncol=10000)
| > dimx<-dim(x)
| > library(rbenchmark)
| > benchmark(pval<-apply(x,2, function(x)summary(lm(y~x))$coeff[2,4]),
| +           replications=1)
|                                                                test
| 1 pval <- apply(x, 2, function(x) summary(lm(y ~ x))$coeff[2, 4])
|   replications elapsed relative user.self sys.self user.child sys.child
| 1            1   25.16        1     20.46     2.17         NA        NA
|  
| > library(parallel)
| > cores<-detectCores()
| > cl <- makeCluster(cores, methods=FALSE)
| > clusterExport(cl,"y")
| > benchmark(pval<-parApply(cl, x,2, function(x)summary(lm(y~x))$coeff[2,4]),
| +           replications=1)
|                                                                       test
| 1 pval <- parApply(cl, x, 2, function(x) summary(lm(y ~ x))$coeff[2, 4])
|   replications elapsed relative user.self sys.self user.child sys.child
| 1            1    5.52        1      0.74     0.28         NA        NA
|  
| > stopCluster(cl)
|  
| # Fairer: include the cluster start-up in the timing
|  
| > benchmark({cores<-detectCores()
| + cl <- makeCluster(cores, methods=FALSE)
| + clusterExport(cl,"y")
| + pval<-parApply(cl, x,2, function(x)summary(lm(y~x))$coeff[2,4])},
| + replications=1)
|   test replications elapsed relative user.self sys.self user.child sys.child
| 1    {            1    7.11        1      0.65     0.37         NA        NA
| Warning messages:
| 1: closing unused connection 10 (<-patwa-PC:10187)
| 2: closing unused connection 9 (<-patwa-PC:10187)
| 3: closing unused connection 8 (<-patwa-PC:10187)
| 4: closing unused connection 7 (<-patwa-PC:10187)
| 5: closing unused connection 6 (<-patwa-PC:10187)
| 6: closing unused connection 5 (<-patwa-PC:10187)
| 7: closing unused connection 4 (<-patwa-PC:10187)
| 8: closing unused connection 3 (<-patwa-PC:10187)
| > stopCluster(cl)
|  
| What do the warnings refer to?

I prefer doing HPC on Linux, but on Windows, as I recall, this stems from
the socket connections the cluster needs to talk to its workers.
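
As background: R prints "closing unused connection" when the garbage
collector finds an open connection that no R object refers to any more. The
following is a minimal sketch of one way to avoid leaving worker sockets
behind. It assumes (my reading, not confirmed in this thread) that a cluster
went out of reach before stopCluster() was called. The helper names
slope_pvalue and pvals_parallel are made up for illustration.

```r
library(parallel)

# Per-column worker, defined at top level so that only the function itself
# (and not a large enclosing environment) is shipped to the workers.
slope_pvalue <- function(col, y) summary(lm(y ~ col))$coefficients[2, 4]

# Hypothetical wrapper: start a cluster, use it, and always shut it down.
pvals_parallel <- function(x, y, cores = 2L) {
  cl <- makeCluster(cores, methods = FALSE)
  on.exit(stopCluster(cl), add = TRUE)     # closes the sockets even on error
  parApply(cl, x, 2, slope_pvalue, y = y)  # y travels via '...'
}

set.seed(1)
y <- rnorm(100)
x <- matrix(rnorm(100 * 20), ncol = 20)
pval <- pvals_parallel(x, y)
```

Passing y through the '...' of parApply() also removes the need for a
separate clusterExport() call.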

Dirk
  
| Patrik
| 
| >>> Dirk Eddelbuettel <edd at debian.org> 23/08/2012 02:53 >>>
| 
| The difference between user and elapsed time is old hat. Here is a great
| example (and IIRC first shown here by Simon) with no compute time:
| 
|    R> system.time(mclapply(1:8, function(x) Sys.sleep(1)))   ## 2 cores by default
|       user  system elapsed
|      0.000   0.012   4.014
|    R> system.time(mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8))
|       user  system elapsed
|      0.012   0.020   1.039
|    R>
| 
| so elapsed time is effectively the one second that Sys.sleep(1) takes, plus
| overhead, once we allow for all eight (hyperthreaded) cores here.  mc.cores
| defaults to two (Brian Ripley's choice), so users who leave it unset get
| only a small gain.  "user time" is roughly the actual system load _summed
| over all processes / threads_.
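
To make the user/elapsed distinction concrete without any parallel code, here
is a small base-R sketch. It relies only on the fact that system.time()
returns a vector with the fields user.self, sys.self, elapsed, user.child and
sys.child. The variable names t_sleep and t_busy are arbitrary.

```r
# Sleeping costs wall-clock time but (almost) no CPU time.
t_sleep <- system.time(Sys.sleep(1))

# A busy loop costs CPU time as well as wall-clock time.
t_busy <- system.time(for (i in 1:2e6) sqrt(i))

# Compare the CPU time charged to this R process with the elapsed time.
print(t_sleep[c("user.self", "sys.self", "elapsed")])
print(t_busy[c("user.self", "sys.self", "elapsed")])
```

For the sleep, elapsed is about one second while user.self stays near zero;
for the loop, the two move together.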
| 
| With that, could I ask any of the participants in the thread to re-try with a
| proper benchmarking package such as rbenchmark or microbenchmark?  Either one
| beats the socks off system.time:
| 
|    R> library(rbenchmark)
|    R> benchmark( mclapply(1:8, function(x) Sys.sleep(1)),
|    +             mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8),
|    +             replications=1)
|                                                       test replications elapsed relative user.self sys.self user.child sys.child
|    1               mclapply(1:8, function(x) Sys.sleep(1))            1   4.013  3.89612     0.000    0.008      0.000     0.004
|    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8)            1   1.030  1.00000     0.004    0.008      0.004     0.000
|    R>
| 
| and
| 
|    R> library(microbenchmark)
|    R> microbenchmark( mclapply(1:8, function(x) Sys.sleep(1)),
|    +                  mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8),
|    +                  times=1)
|    Unit: seconds
|                                                       expr     min      lq  median      uq     max
|    1               mclapply(1:8, function(x) Sys.sleep(1)) 4.01377 4.01377 4.01377 4.01377 4.01377
|    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8) 1.03457 1.03457 1.03457 1.03457 1.03457
|    R>
| 
| (and you normally want to run either with 10 or 100 or ... replications /
| times).
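
What the extra replications buy is a spread of timings rather than a single
number. The sketch below shows the idea in base R only, so that it runs
without either package. time_reps is a hypothetical helper, not part of
rbenchmark or microbenchmark, both of which do this bookkeeping for you.

```r
# Time a zero-argument function several times and keep the elapsed seconds.
# `time_reps` is a made-up helper for illustration only.
time_reps <- function(f, times = 10L) {
  vapply(seq_len(times),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

timings <- time_reps(function() Sys.sleep(0.05), times = 10L)
summary(timings)   # min / median / max show the run-to-run variation
```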
| 
| Dirk
| 
| --
| Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com