[R-sig-hpc] Parallel linear model

Norm Matloff matloff at cs.ucdavis.edu
Fri Aug 24 08:31:43 CEST 2012


Thanks for pointing me to microbenchmark, Dirk.  I had not been aware of
it, and it definitely looks useful.  Among other things, it uses the
clock_gettime() function I mentioned yesterday, so it would now give
me a convenient wrapper for that function when I'm working at the R level.

However, I should elaborate on something I said yesterday.  Recall that,
on the one hand, I felt that the Elapsed time from system.time() is
usually good enough, but on the other I said that if one needs really
fine timing, one needs to see the times of the individual threads.  I
mentioned the makespan, but I should also have mentioned the issue of
load balancing.

Even though the end user arguably cares mainly about Elapsed time,
during development of a parallel algorithm one needs to identify sources
of slowdown.  That of course leads to investigating load imbalance
(among other things).  

So in finely-detailed timing experiments, one needs to know the
individual thread times.  Unfortunately, microbenchmark doesn't seem to
provide this.

Of course, the user could on his own insert code to call, say,
proc.time() twice within each thread, and return the difference to the
parent, which could then report the times separately.
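A minimal sketch of that manual approach, using parLapply() from the
parallel package (the cluster size and the Sys.sleep() workload here are
just placeholders for real work):

```r
library(parallel)

cl <- makeCluster(4)

# Each task times its own work with proc.time() and returns the
# difference along with its result, so the parent can inspect the
# per-task times individually rather than only their sum.
res <- parLapply(cl, 1:8, function(i) {
  t0 <- proc.time()
  Sys.sleep(runif(1))                    # stand-in for real work
  elapsed <- (proc.time() - t0)["elapsed"]
  list(value = i, time = elapsed)
})

times <- sapply(res, `[[`, "time")
max(times)               # approximates the makespan (slowest task)
max(times) - min(times)  # a crude indicator of load imbalance

stopCluster(cl)
```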

But it would be nice to automate this.  It would seem to be fairly easy
to incorporate such timing (optional to the caller) within the various
snow functions, and probably in something like mclapply() as well.
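One possible shape for such an option, sketched here as a hypothetical
wrapper around clusterApply() (timedClusterApply is not an existing
snow/parallel function, just an illustration of the idea):

```r
library(parallel)

# Hypothetical wrapper: runs clusterApply() as usual, but also returns
# each task's own elapsed time, measured inside the worker.
timedClusterApply <- function(cl, x, fun, ...) {
  wrapped <- function(xi, ...) {
    t0 <- proc.time()
    val <- fun(xi, ...)
    list(value   = val,
         elapsed = (proc.time() - t0)["elapsed"])
  }
  out <- clusterApply(cl, x, wrapped, ...)
  list(values = lapply(out, `[[`, "value"),
       times  = sapply(out, `[[`, "elapsed"))
}

cl <- makeCluster(2)
r  <- timedClusterApply(cl, 1:4, function(i) { Sys.sleep(0.1 * i); i^2 })
r$times   # per-task elapsed times; max(r$times) approximates the makespan
stopCluster(cl)
```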

Norm

On Thu, Aug 23, 2012 at 08:14:50AM -0500, Dirk Eddelbuettel wrote:
> 
> On 22 August 2012 at 23:22, Norm Matloff wrote:
> | 
> | In rereading your posting now, Dirk, I suddenly realized that there is
> | one aspect of this that I'd forgotten about:  An ordinary call to
> | system.time() does not display all the information returned by that
> | function!
> | 
> | That odd statement is of course due to the fact that the print
> | method for objects of class proc_time displays only 3 of the 5 numbers.
> | If one actually looks at the 5 numbers individually, one can separate
> | the time of the parent process from the sum of the child times.  That
> | separation is apparently what rbenchmark gives you, right?
> | 
> | As I said earlier, the quick-and-dirty way to handle this is to use the
> | Elapsed time, typically good enough (say on a dedicated machine).  After
> | all, if we are trying to develop a fast parallel algorithm, what the
> | potential users of the algorithm care about is essentially the Elapsed
> | time.
> 
> That seems fair in most cases.
> 
> | But at the other extreme, a very fine timing goal might be to try to
> | compute what is called the makespan, which in this case would be the
> | maximum of all the child times, rather than the sum of the child times.
> | I say "try," because I don't see any system-level way to accomplish this,
> | short of inserting calls to something like clock_gettime() inside each
> | thread.
> 
> Maybe you could look at what microbenchmark does [ as it covers all the
> OS-level dirty work ] and see if it generalizes to multiple machines?
> 
> Dirk
>  
> | Norm
> | 
> | On Wed, Aug 22, 2012 at 07:53:02PM -0500, Dirk Eddelbuettel wrote:
> | > 
> | > The difference between user and elapsed time is old hat. Here is a great
> | > example (and IIRC first shown here by Simon) with no compute time:
> | > 
> | >    R> system.time(mclapply(1:8, function(x) Sys.sleep(1)))   ## 2 cores by default
> | >       user  system elapsed 
> | >      0.000   0.012   4.014 
> | >    R> system.time(mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8))
> | >       user  system elapsed 
> | >      0.012   0.020   1.039 
> | >    R> 
> | > 
> | > so elapsed time is effectively the one second a Sys.sleep(1) takes, plus
> | > overhead, if we allow for all eight (hyperthreaded) cores here.  By Brian
> | > Ripley's choice a default of two is baked-in, so clueless users only get a
> | > small gain.  "user time" is roughly the actual system load _summed over all
> | > processes / threads_.
> | > 
> | > With that, could I ask any of the participants in the thread to re-try with a
> | > proper benchmarking package such as rbenchmark or microbenchmark?  Either one
> | > beats the socks off system.time:
> | > 
> | >    R> library(rbenchmark)
> | >    R> benchmark( mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), replications=1)
> | >                                                       test replications elapsed relative user.self sys.self user.child sys.child
> | >    1               mclapply(1:8, function(x) Sys.sleep(1))            1   4.013  3.89612     0.000    0.008      0.000     0.004
> | >    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8)            1   1.030  1.00000     0.004    0.008      0.004     0.000
> | >    R> 
> | > 
> | > and
> | > 
> | >    R> library(microbenchmark)
> | >    R> microbenchmark( mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), times=1)
> | >    Unit: seconds
> | >                                                       expr     min      lq  median      uq     max
> | >    1               mclapply(1:8, function(x) Sys.sleep(1)) 4.01377 4.01377 4.01377 4.01377 4.01377
> | >    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8) 1.03457 1.03457 1.03457 1.03457 1.03457
> | >    R> 
> | > 
> | > (and you normally want to run either with 10 or 100 or ... replications /
> | > times).
> | > 
> | > Dirk
> | > 
> | > -- 
> | > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
> | > 
> | > _______________________________________________
> | > R-sig-hpc mailing list
> | > R-sig-hpc at r-project.org
> | > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> | 
> 
> -- 
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com


