[R-sig-hpc] Parallel linear model
Simon Urbanek
simon.urbanek at r-project.org
Thu Aug 23 03:06:45 CEST 2012
On Aug 22, 2012, at 7:18 PM, Norm Matloff wrote:
> On Wed, Aug 22, 2012 at 06:03:36PM -0500, Paul Johnson wrote:
>
>> This is a great example and I would like to use it in class. But I
>> think I don't understand the implications of the system.time output
>> you get. I have a question about this below. Would you share your
>> thoughts?...
>
> Paul is bringing up a very important point here.
>
> There are various OS dependencies that can really change things. A
> notable example is that if one calls something like mclapply(), the time
> actually spent by the child R processes probably will NOT be counted in
> the User time.
That is actually wrong. It is true for snow, where the processes are separate, but most systems do account for child user time in mclapply:
# Linux
> system.time(mclapply(1:32, function(x) for(i in 1:1e6) x+x, mc.cores=32))
   user  system elapsed
 27.330   1.468   0.944
> system.time((function(x) for(i in 1:1e6) x+x)(1))
   user  system elapsed
  0.736   0.000   0.734
# OS X
> system.time(mclapply(1:16, function(x) for(i in 1:1e6) x+x, mc.cores=16))
   user  system elapsed
  9.386   0.357   0.876
> system.time((function(x) for(i in 1:1e6) x+x)(1))
   user  system elapsed
  0.425   0.004   0.428
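To see where the large "user" figure comes from, it may help to look at the raw components that system.time() returns. The following is only a minimal sketch, assuming a Unix-alike system (mclapply() cannot fork on Windows) and using a hypothetical helper burn() and a worker count n_workers that are not part of the timings above. The printed summary adds the child entries to the parent's own entries, so unclassing the result shows them separately:

```r
library(parallel)

# Hypothetical CPU-bound helper, mirroring the loop used in the timings above.
burn <- function(x) { for (i in 1:1e6) x + x; x }

# Assumption: a Unix-alike OS where mclapply() forks its workers.
n_workers <- 2L
st <- system.time(res <- mclapply(seq_len(n_workers), burn, mc.cores = n_workers))

# A proc_time object holds five values: user.self, sys.self, elapsed,
# user.child and sys.child. Its print method folds the child entries
# into "user" and "system"; unclass() exposes all five.
parts <- unclass(st)
print(parts)

# The "user" figure as print(st) reports it: parent plus children.
user_total <- sum(parts[c("user.self", "user.child")], na.rm = TRUE)
print(user_total)
```

On a system that accounts for forked children, most of the user time should show up under user.child rather than user.self. With a snow-style socket cluster one would expect the child entries to stay near zero, in line with the remark above about separate processes.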
Cheers,
Simon
> The latter will likely just measure how much time the
> parent process spends in parceling out the work to the children, and in
> collecting together the results.
>
> You have the same problem on a cluster, where the worker processes set
> up by clusterApply() or whatever aren't counted.
>
> You could on the other hand have the opposite problem in some OSes,
> where one gets the SUM of the times of the children.
>
> Using Elapsed time might be a little crude, but generally good enough.
>
> Norm
>