[R-sig-hpc] Parallel computing with snow
luke at stat.uiowa.edu
Sun Jan 4 21:28:13 CET 2009
parApply should be dividing the data into roughly equal chunks. If you
get unequal usage, that most likely reflects different performance of
your 'func1' and 'func2' on these chunks, assuming there is enough
data for 4 chunks. It is impossible to know more specifically "what gives"
without knowing more about 'myData', 'func1', and 'func2'.
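
As a rough illustration of the point about chunks, here is a minimal
sketch. It assumes the work is cut into contiguous index blocks, one per
node, and it uses splitIndices() from the 'parallel' package that ships
with later versions of R (snow provides a function of the same name).
The task count and the 'cost' vector are made up for the example; they
are not taken from 'myData' or 'func1'.

```r
library(parallel)  # assumption: R >= 2.14; not the snow setup from the post

n_tasks <- 16L
n_nodes <- 4L

# Hypothetical per-task cost: the first half of the tasks is expensive,
# the second half is nearly free.
cost <- c(rep(10, 8), rep(0.01, 8))

# Roughly equal-sized, contiguous index chunks, one per node
chunks <- splitIndices(n_tasks, n_nodes)

# Same number of tasks per chunk, very different total work per chunk
chunk_sizes <- sapply(chunks, length)
chunk_cost  <- sapply(chunks, function(idx) sum(cost[idx]))

print(chunk_sizes)
print(chunk_cost)
```

Under these made-up costs every node receives the same number of tasks,
yet two nodes end up with almost all of the work. Equal chunk sizes
therefore do not by themselves imply equal CPU time per node.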
luke
On Fri, 2 Jan 2009, Gang Chen wrote:
> I've been using parApply() in snow package for parallel computing with
> the following lines in R 2.8.1:
>
> library(snow)
> nNodes <- 4
> cl <- makeCluster(nNodes, type = "SOCK")
> fm <- parApply(cl, myData, c(1,2), func1, ...)
>
> Since I have a Mac OS X (version 10.4.11) with two dual-core
> processors, I thought that I could run 4 simultaneous clusters.
> However with the 1st job it seems only two clusters (362 and 364
> below) were running with roughly the same CPU time (4th column) while
> the other two clusters were pretty much idling (I assume the 1st row
> with PID 357 was the main process with which I started R):
>
> PID  COMMAND   %CPU      TIME  #TH  #PRTS  #MREGS  RPRVT  RSHRD  RSIZE  VSIZE
> 357  R         0.0%   0:15.81    1     20     171   128M  5.66M   137M   169M
> 362  R        99.8%  11:41.07    1     19     129  28.8M  5.66M  38.3M  64.7M
> 364  R       100.3%  12:26.43    1     19     129  28.5M  5.66M  38.0M  64.7M
> 366  R         0.0%   0:01.67    1     19     120  23.7M  4.88M  32.3M  61.2M
> 368  R         0.0%   0:01.68    1     19     120  23.7M  4.88M  32.3M  61.2M
>
> Why wasn't the CPU time split roughly equally across the 4 clusters,
> instead of two of them being barely used?
>
> I also tried a different job with
>
> fm <- parApply(cl, myData, c(1,2), func2, ...)
>
> and the result was slightly different, with all 4 clusters more or less
> involved, although the load was still not distributed evenly either:
>
> PID  COMMAND   %CPU      TIME  #TH  #PRTS  #MREGS  RPRVT  RSHRD  RSIZE  VSIZE
> 413  R         0.0%   0:18.46    1     20     119   221M  4.57M   231M   250M
> 419  R        93.3%   2:53.62    1     19      80  18.0M  4.57M  29.1M  51.2M
> 421  R        93.6%   6:07.85    1     19      79  15.9M  4.57M  26.9M  50.2M
> 423  R        92.8%   5:12.13    1     19      79  17.4M  4.57M  28.4M  50.2M
> 425  R        93.3%   1:39.73    1     19      82  20.0M  4.57M  32.9M  53.2M
>
> What gives? Why different usage of clusters between the two jobs?
>
> All help is highly appreciated,
> Gang
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu