[R-sig-hpc] Parallel computing with snow

luke at stat.uiowa.edu
Sun Jan 4 21:28:13 CET 2009


parApply should be dividing the data into roughly equal chunks.  If you
get unequal usage, that most likely reflects different performance of
your 'func1' and 'func2' on these chunks, assuming there is enough
data for 4 chunks. It is impossible to know more specifically "what gives"
without knowing more about 'myData', 'func1', and 'func2'.
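
To illustrate how equal chunks can still give unequal CPU time, here is a
minimal sketch in base R. It does not use snow at all; it only mimics an
even split of 100 cells into 4 contiguous chunks. The 'cost' vector and its
values are invented for illustration and are not measurements of 'func1'
or 'func2'.

```r
# Hypothetical per-cell cost in seconds: the first half of the cells is
# expensive and the second half is cheap. These numbers are invented.
cost <- c(rep(1, 50), rep(0.01, 50))
n_workers <- 4

# Contiguous, equal-sized chunks, one per worker. This only mimics the idea
# of an even split; it is not snow's own splitting code.
chunk_of_cell <- rep(seq_len(n_workers), each = length(cost) / n_workers)
chunks <- split(seq_along(cost), chunk_of_cell)

chunk_size <- unname(sapply(chunks, length))
chunk_time <- unname(sapply(chunks, function(ix) sum(cost[ix])))

print(chunk_size)  # every chunk holds the same number of cells
print(chunk_time)  # but the total work per chunk differs widely
```

If something like this is going on, clusterApplyLB in snow, which hands out
tasks one at a time as workers become free, may even out the load when there
are many more tasks than workers.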

luke

On Fri, 2 Jan 2009, Gang Chen wrote:

> I've been using parApply() in snow package for parallel computing with
> the following lines in R 2.8.1:
>
>   library(snow)
>   nNodes <- 4
>   cl <- makeCluster(nNodes, type = "SOCK")
>   fm <- parApply(cl, myData, c(1,2), func1, ...)
>
> Since I have a Mac OS X (version 10.4.11) with two dual-core
> processors, I thought that I could run 4 simultaneous clusters.
> However with the 1st job it seems only two clusters (362 and 364
> below) were running with roughly the same CPU time (4th column) while
> the other two clusters were pretty much idling (I assume the 1st row
> with PID 357 was the main process with which I started R):
>
>   PID  COMMAND   %CPU      TIME  #TH  #PRTS  #MREGS  RPRVT  RSHRD  RSIZE  VSIZE
>   357  R         0.0%   0:15.81    1     20     171   128M  5.66M   137M   169M
>   362  R        99.8%  11:41.07    1     19     129  28.8M  5.66M  38.3M  64.7M
>   364  R       100.3%  12:26.43    1     19     129  28.5M  5.66M  38.0M  64.7M
>   366  R         0.0%   0:01.67    1     19     120  23.7M  4.88M  32.3M  61.2M
>   368  R         0.0%   0:01.68    1     19     120  23.7M  4.88M  32.3M  61.2M
>
> Why wasn't the CPU time split roughly equally across the 4 clusters,
> instead of two being barely used?
>
> I also tried a different job with fm <- parApply(cl, myData, c(1,2),
> func2, ...), and the result is slightly different with all 4 clusters
> more or less involved, although the load was still not distributed
> evenly either:
>
>   PID  COMMAND   %CPU      TIME  #TH  #PRTS  #MREGS  RPRVT  RSHRD  RSIZE  VSIZE
>   413  R         0.0%   0:18.46    1     20     119   221M  4.57M   231M   250M
>   419  R        93.3%   2:53.62    1     19      80  18.0M  4.57M  29.1M  51.2M
>   421  R        93.6%   6:07.85    1     19      79  15.9M  4.57M  26.9M  50.2M
>   423  R        92.8%   5:12.13    1     19      79  17.4M  4.57M  28.4M  50.2M
>   425  R        93.3%   1:39.73    1     19      82  20.0M  4.57M  32.9M  53.2M
>
> What gives? Why different usage of clusters between the two jobs?
>
> All help is highly appreciated,
> Gang
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


