[R-sig-hpc] Parallel linear model

Thu Aug 23 08:22:21 CEST 2012

In rereading your posting now, Dirk, I suddenly realized that there is
one aspect of this that I'd forgotten about:  An ordinary call to
system.time() does not display all the information returned by that
function!

That odd statement is of course due to the fact that the print
method for objects of class proc_time displays only 3 of the 5 numbers,
with the child times folded into the user and system figures.  If you
actually look at the 5 numbers individually, you can separate the time
of the parent process from the sum of the child times.  That separation
is apparently what rbenchmark gives you, right?
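
A minimal sketch of that separation, assuming a Unix-alike system (child
times may be NA elsewhere, so the core count is guarded below) and an
arbitrary toy workload chosen only for illustration:

```r
library(parallel)

# Assumption: forking is only available on Unix-alikes; fall back to 1 core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Toy workload (hypothetical), just to burn a little CPU in the workers.
pt <- system.time(
  mclapply(1:4, function(i) sum(sqrt(seq_len(1e6))), mc.cores = cores)
)

print(pt)           # the usual three-number display
print(unclass(pt))  # all five: user.self, sys.self, elapsed, user.child, sys.child

# Parent CPU time versus the sum of the child CPU times.
parent_cpu <- pt[["user.self"]] + pt[["sys.self"]]
child_cpu  <- pt[["user.child"]] + pt[["sys.child"]]
```

Here unclass() simply drops the proc_time class so that the default
print method shows every component.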

As I said earlier, the quick-and-dirty way to handle this is to use the
elapsed time, which is typically good enough (say on a dedicated
machine).  After all, if we are trying to develop a fast parallel
algorithm, what the potential users of the algorithm care about is
essentially the elapsed time.

But at the other extreme, a very fine timing goal might be to try to
compute what is called the makespan, which in this case would be the
maximum of all the child times, rather than the sum of the child times.
I say "try," because I don't see any system-level way to accomplish
this, short of inserting calls to something like clock_gettime() inside
each thread.
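
A rough sketch of that idea at the R level, under several assumptions:
it uses wall-clock time from Sys.time() rather than clock_gettime(), it
treats each forked mclapply worker process as a "child," and
with_timing() is a hypothetical helper written for this illustration,
not part of any package:

```r
library(parallel)

# Hypothetical helper: wrap f so each call reports which worker ran it and when.
with_timing <- function(f) {
  function(x) {
    start <- as.numeric(Sys.time())
    value <- f(x)
    list(value = value, pid = Sys.getpid(),
         start = start, end = as.numeric(Sys.time()))
  }
}

# Assumption: forking is only available on Unix-alikes; fall back to 1 core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(1:4, with_timing(function(x) Sys.sleep(0.2)), mc.cores = cores)

pid   <- vapply(res, function(r) r$pid,   integer(1))
start <- vapply(res, function(r) r$start, numeric(1))
end   <- vapply(res, function(r) r$end,   numeric(1))

# Busy span of each worker: from its first task's start to its last task's end.
per_worker <- tapply(end, pid, max) - tapply(start, pid, min)

makespan <- max(per_worker)  # maximum of the child times
total    <- sum(per_worker)  # sum of the child times
```

This measures only the time spent inside the wrapped function, so fork
and result-collection overhead are not included.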

Norm

On Wed, Aug 22, 2012 at 07:53:02PM -0500, Dirk Eddelbuettel wrote:
> 
> The difference between user and elapsed is old hat. Here is a great
> example (and IIRC first shown here by Simon) with no compute time:
> 
>    R> system.time(mclapply(1:8, function(x) Sys.sleep(1)))   ## 2 cores by default
>       user  system elapsed 
>      0.000   0.012   4.014 
>    R> system.time(mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8))
>       user  system elapsed 
>      0.012   0.020   1.039 
>    R> 
> 
> so elapsed time is effectively the one second a Sys.sleep(1) takes, plus
> overhead, if we allow for all eight (hyperthreaded) cores here.  By Brian
> Ripley's choice a default of two is baked-in, so clueless users only get a
> small gain.  "user time" is roughly the actual system load _summed over all
> processes / threads_.
> 
> With that, could I ask any of the participants in the thread to re-try with a
> proper benchmarking package such as rbenchmark or microbenchmark?  Either one
> beats the socks off system.time:
> 
>    R> library(rbenchmark)
>    R> benchmark( mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), replications=1)
>                                                       test replications elapsed relative user.self sys.self user.child sys.child
>    1               mclapply(1:8, function(x) Sys.sleep(1))            1   4.013  3.89612     0.000    0.008      0.000     0.004
>    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8)            1   1.030  1.00000     0.004    0.008      0.004     0.000
>    R> 
> 
> and
> 
>    R> library(microbenchmark)
>    R> microbenchmark( mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), times=1)
>    Unit: seconds
>                                                       expr     min      lq  median      uq     max
>    1               mclapply(1:8, function(x) Sys.sleep(1)) 4.01377 4.01377 4.01377 4.01377 4.01377
>    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8) 1.03457 1.03457 1.03457 1.03457 1.03457
>    R> 
> 
> (and you normally want to run either with 10 or 100 or ... replications /
> times).
> 
> Dirk
> 
> -- 
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com