[R] parallel computation with plyr 1.2.1
Dylan Beaudette
debeaudette at ucdavis.edu
Thu Sep 16 19:11:55 CEST 2010
Hi,
I have been trying to use the new .parallel argument with the most recent
version of plyr [1] to speed up some tasks. I can run the example in the NEWS
file [1], and it seems to be working correctly. However, R will only use a
single core when I try to apply this same approach with ddply().
1. http://cran.r-project.org/web/packages/plyr/NEWS
Watching CPU usage, I see that only a single core is used whether .parallel
is TRUE or FALSE, and both runs take about the same amount of time. Is there
a limitation in how ddply() dispatches parallel jobs, or is this task not
suited to parallel computing?
Cheers,
Dylan
Here is an example:
library(plyr)
library(doMC)
registerDoMC(cores=2)
# example data
d <- data.frame(y=rnorm(2000), id=rep(letters[1:4], each=500))
# function that wastes some time
f <- function(x) {
  m <- vector(length=10000)
  for(i in 1:10000) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}
system.time(ddply(d, .(id), .fun=f, .parallel=FALSE))
# user system elapsed
# 2.740 0.016 2.766
system.time(ddply(d, .(id), .fun=f, .parallel=TRUE))
# user system elapsed
# 2.720 0.000 2.726
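A possible cross-check, sketched here on the assumption that the foreach package (which doMC builds on) is installed: ask foreach which backend it has registered, then run the same function over the groups with %dopar% directly, bypassing ddply(). The names pieces and res are only illustrative, and timings will differ by machine.

```r
library(foreach)
library(doMC)
registerDoMC(cores=2)

# example data: four groups of 500 rows
d <- data.frame(y=rnorm(2000), id=rep(letters[1:4], each=500))

# function that wastes some time (same as in the example above)
f <- function(x) {
  m <- vector(length=10000)
  for(i in 1:10000) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}

# what foreach believes is registered
getDoParName()     # name of the registered backend
getDoParWorkers()  # number of workers it will use

# split by id and apply f to each piece through foreach directly
# (pieces and res are illustrative names)
pieces <- split(d, d$id)
system.time(res <- foreach(p=pieces) %dopar% f(p))
```

If this run keeps both cores busy while the ddply() call does not, the difference would seem to lie in how ddply() hands work to foreach rather than in the task itself.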
--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341