[R-sig-hpc] Parallel linear model

M. Edward (Ed) Borasky znmeb at znmeb.net
Fri Aug 24 08:56:13 CEST 2012


If you're on Linux on bare metal, you can do all sorts of nifty
micro-benchmarking via Sysstat, OProfile and blktrace. It's been a few
years since I did any of this, but I probably still have some
hacked-up scripts on GitHub for OProfile (CPU bottlenecks), blktrace
(disk bottlenecks) and Sysstat (telling you which kind of bottleneck
you have).

On Thu, Aug 23, 2012 at 11:31 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote:
> Thanks for pointing me to microbenchmark, Dirk.  I had not been aware of
> it, and it definitely looks useful.  Among other things, it uses the
> clock_gettime() function I mentioned yesterday, so this now would give
> me a convenient wrapper for it when I'm working at the R level.
>
> However, I should elaborate on something I said yesterday.  Recall that
> on the one hand I felt that Elapsed time from system.time() is usually
> good enough, but on the other hand I said that if one needs really fine
> timing, one needs to see the times of the individual threads.  I
> mentioned makespan but really should have also mentioned the issue of
> load balancing.
>
> Even though the end user arguably cares mainly about Elapsed time,
> during development of a parallel algorithm one needs to identify sources
> of slowdown.  That of course leads to investigating load imbalance
> (among other things).
>
> So in finely-detailed timing experiments, one needs to know the
> individual thread times.  Unfortunately, microbenchmark doesn't seem to
> provide this.
>
> Of course, the user on his own could insert code to call, say
> proc.time() twice within each thread, and return the difference to the
> parent, which could then separately report the times.
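>
> For instance, a minimal sketch of that manual approach with snow
> (assuming a cluster object cl; the names are purely illustrative):
>
>    library(parallel)
>    cl <- makeCluster(2)
>    timed <- clusterApply(cl, 1:2, function(i) {
>      t0 <- proc.time()
>      Sys.sleep(1)                          # stand-in for the real work
>      list(result = i,
>           elapsed = (proc.time() - t0)[["elapsed"]])
>    })
>    stopCluster(cl)
>    sapply(timed, `[[`, "elapsed")          # per-worker times in the parent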
>
> But it would be nice to automate this.  It would seem to be fairly easy
> to incorporate such timing (optional to the caller) within the various
> snow functions, and probably so for something like mclapply() as well.
>
> Norm
>
> On Thu, Aug 23, 2012 at 08:14:50AM -0500, Dirk Eddelbuettel wrote:
>>
>> On 22 August 2012 at 23:22, Norm Matloff wrote:
>> |
>> | In rereading your posting now, Dirk, I suddenly realized that there is
>> | one aspect of this that I'd forgotten about:  An ordinary call to
>> | system.time() does not display all the information returned by that
>> | function!
>> |
>> | That odd statement is of course due to the fact that the print
>> | method for objects of class proc_time displays only 3 of the 5 numbers.
>> | If one actually looks at the 5 numbers individually, one can separate
>> | the time of the parent process from the sum of the child times.  That
>> | separation is apparently what rbenchmark gives you, right?
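>> |
>> | For example (a sketch; the child components are nonzero only when
>> | child processes actually ran):
>> |
>> |    tm <- system.time(mclapply(1:2, function(x) Sys.sleep(1)))
>> |    print(tm)              # default print: only 3 of the 5 numbers
>> |    tm[["user.child"]]     # 4th component: user time of children
>> |    tm[["sys.child"]]      # 5th component: system time of children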
>> |
>> | As I said earlier, the quick-and-dirty way to handle this is to use the
>> | Elapsed time, typically good enough (say on a dedicated machine).  After
>> | all, if we are trying to develop a fast parallel algorithm, what the
>> | potential users of the algorithm care about is essentially the Elapsed
>> | time.
>>
>> That seems fair in most cases.
>>
>> | But at the other extreme, a very fine timing goal might be to try to
>> | compute what is called the makespan, which in this case would be the
>> | maximum of all the child times, rather than the sum of the child times.
>> | I say "try," because I don't see any systems-level way to accomplish
>> | this, short of inserting calls to something like clock_gettime()
>> | inside each thread.
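>> |
>> | Given per-thread times, though, the rest is easy (a sketch with
>> | made-up numbers):
>> |
>> |    childTimes <- c(1.02, 0.97, 1.31, 1.05)  # hypothetical thread times
>> |    max(childTimes)   # makespan: 1.31, when the last thread finished
>> |    sum(childTimes)   # what the summed child times would report: 4.35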
>>
>> Maybe you could look at what microbenchmark does [ as it covers all the
>> OS-level dirty work ] and see if it generalizes to multiple machines?
>>
>> Dirk
>>
>> | Norm
>> |
>> | On Wed, Aug 22, 2012 at 07:53:02PM -0500, Dirk Eddelbuettel wrote:
>> | >
>> | > The difference between user and elapsed is an old hat. Here is a great
>> | > example (and IIRC first shown here by Simon) with no compute time:
>> | >
>> | >    R> system.time(mclapply(1:8, function(x) Sys.sleep(1)))   ## 2 cores by default
>> | >       user  system elapsed
>> | >      0.000   0.012   4.014
>> | >    R> system.time(mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8))
>> | >       user  system elapsed
>> | >      0.012   0.020   1.039
>> | >    R>
>> | >
>> | > so elapsed time is effectively the one second a Sys.sleep(1) takes, plus
>> | > overhead, if we allow for all eight (hyperthreaded) cores here.  By Brian
>> | > Ripley's choice a default of two is baked-in, so clueless users only get a
>> | > small gain.  "user time" is roughly the actual system load _summed over all
>> | > processes / threads_.
>> | >
>> | > With that, could I ask any of the participants in the thread to re-try with a
>> | > proper benchmarking package such as rbenchmark or microbenchmark?  Either one
>> | > beats the socks off system.time():
>> | >
>> | >    R> library(rbenchmark)
>> | >    R> benchmark( mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), replications=1)
>> | >                                                       test replications elapsed relative user.self sys.self user.child sys.child
>> | >    1               mclapply(1:8, function(x) Sys.sleep(1))            1   4.013  3.89612     0.000    0.008      0.000     0.004
>> | >    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8)            1   1.030  1.00000     0.004    0.008      0.004     0.000
>> | >    R>
>> | >
>> | > and
>> | >
>> | >    R> library(microbenchmark)
>> | >    R> microbenchmark( mclapply(1:8, function(x) Sys.sleep(1)), mclapply(1:8, function(x) Sys.sleep(1), mc.cores=8), times=1)
>> | >    Unit: seconds
>> | >                                                       expr     min      lq  median      uq     max
>> | >    1               mclapply(1:8, function(x) Sys.sleep(1)) 4.01377 4.01377 4.01377 4.01377 4.01377
>> | >    2 mclapply(1:8, function(x) Sys.sleep(1), mc.cores = 8) 1.03457 1.03457 1.03457 1.03457 1.03457
>> | >    R>
>> | >
>> | > (and you normally want to run either with 10 or 100 or ... replications /
>> | > times).
>> | >
>> | > Dirk
>> | >
>> | > --
>> | > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>> | >
>> | > _______________________________________________
>> | > R-sig-hpc mailing list
>> | > R-sig-hpc at r-project.org
>> | > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>> |
>>
>



-- 
Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
Workbench: http://j.mp/QCsXOr

How the Hell can the lion sleep with all those people singing "A weem
oh way!" at the top of their lungs?


