[R-sig-hpc] The foreach, iterators and doMC packages
Steve Weston
steve at revolution-computing.com
Thu Jul 16 23:58:44 CEST 2009
On Thu, Jul 16, 2009 at 1:34 PM, Mark Kimpel <mwkimpel at gmail.com> wrote:
> Steve,
>
> I used a non-trivial example and got a nice speed boost, so thanks for
> that advice.
That's good to hear.
> Now I need advice on a real-world application. I am working with a
> genomic data-set that involves a lot of calculations on a data-frame
> as well as parallel subsetting of an annotation data-frame. The
> data.frames initially have about 30k rows and up to 100 columns. If,
> for example, I have 5 cores to work with, it seems to me that the most
> efficient way, rather than making repeated calls to the cores, would
> be to parcel things out by the number of rows divided by the number of
> cores. That would mean sending data.frames to the functions that
> do.par calls rather than vectors. Below is a self-contained
> example. It doesn't work because i doesn't get incremented after the
> %dopar% operator.
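For data.frames, the "rows divided by cores" plan can also be expressed without any index arithmetic: label each row with a contiguous block number, split() the data.frame into a list of smaller data.frames, and let foreach iterate over that list. The sketch below assumes a hypothetical worker, summarize.chunk(), and toy columns id, a and b; none of these come from the thread. It uses the sequential backend so it runs anywhere; registerDoMC() from doMC would be used for real parallel runs.

```r
library(foreach)
registerDoSEQ()  # sequential backend for testing; doMC::registerDoMC(cores = 5) runs in parallel

# Hypothetical per-chunk worker (not from the thread): takes a data.frame
# and returns a data.frame with one row per input row.
summarize.chunk <- function(df) {
  data.frame(id = df$id, score = rowMeans(df[, c("a", "b")]))
}

Ncore <- 5
dat <- data.frame(id = 1:23, a = rnorm(23), b = rnorm(23))

# Label each row with one of Ncore contiguous blocks, then split the
# data.frame into a list of smaller data.frames, one per block.
block.id <- cut(seq_len(nrow(dat)), breaks = Ncore, labels = FALSE)
chunks <- split(dat, block.id)

# foreach iterates over the list elements directly, so there is no
# index arithmetic; rbind reassembles the per-chunk results in row order.
res <- foreach(chunk = chunks, .combine = rbind) %dopar% summarize.chunk(chunk)
```

Because split() only keeps blocks that actually contain rows, this also behaves sensibly when there are fewer rows than cores.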
Your basic strategy seems reasonable, but I think there's a problem in
the way that you're indexing into the matrices x.mat and y.mat.
I would do it this way:
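library(foreach)  # plus a registered backend, e.g. doMC::registerDoMC()
do.par.test.func <- function(Nrow, Ncol, Ncore){
  x.mat <- matrix(rnorm(Ncol * Nrow), Nrow, Ncol)
  y.mat <- matrix(rnorm(Ncol * Nrow), Nrow, Ncol)
  N <- ceiling(Nrow/Ncore)  # rows per chunk
  # One task per chunk. The last chunk is clamped to Nrow because Nrow need
  # not be a multiple of Ncore; drop=FALSE keeps 1-row chunks as matrices.
  foreach(i = 0:(ceiling(Nrow/N) - 1), .combine='rbind') %dopar% {
    rows <- (i * N + 1):min((i + 1) * N, Nrow)
    do.par.test.called.func(x.mat[rows, , drop=FALSE], y.mat[rows, , drop=FALSE])
  }
}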
I also threw in the use of the .combine option to rbind the submatrices into
the full matrix. That seems preferable to returning a list of submatrices.
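A small sketch of the difference .combine makes, using a hypothetical make.block() helper that is not from the thread: by default foreach returns a list with one element per task, while .combine='rbind' stacks the per-task matrices into a single matrix.

```r
library(foreach)
registerDoSEQ()  # sequential backend; any registered backend gives the same result

# Hypothetical helper: each task returns a 2 x 3 matrix filled with its task number.
make.block <- function(i) matrix(i, nrow = 2, ncol = 3)

# Default: a list with one 2 x 3 matrix per task.
res.list <- foreach(i = 1:3) %dopar% make.block(i)

# With .combine = 'rbind': one 6 x 3 matrix, the blocks stacked in task order.
res.mat <- foreach(i = 1:3, .combine = 'rbind') %dopar% make.block(i)
```

The stacked result is the same as calling do.call(rbind, res.list) after the fact, so .combine mainly saves a step and avoids holding the intermediate list.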
I'm not sure what you mean by i not being incremented. Let me know
if this change to the indexing doesn't fix the code.
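One way to check the row-chunking is to compare the chunked result against the same computation done on the whole matrices. The sketch below assumes a hypothetical stand-in for do.par.test.called.func (the original is not shown in this thread) and a hypothetical wrapper, chunked.apply(), that takes the matrices as arguments. It clamps the last chunk to Nrow and uses ceiling(Nrow/N) chunks so that no chunk is empty, and it tries a row count (23) that does not divide evenly by 5 cores.

```r
library(foreach)
registerDoSEQ()  # or doMC::registerDoMC(cores = 5) for parallel execution

# Hypothetical stand-in for do.par.test.called.func: any row-wise
# function of two conformable matrices works here.
do.par.test.called.func <- function(x, y) x * y

# Row-chunked computation that takes the matrices as arguments so its
# result can be compared with the unchunked computation.
chunked.apply <- function(x.mat, y.mat, Ncore) {
  Nrow <- nrow(x.mat)
  N <- ceiling(Nrow / Ncore)    # rows per chunk
  nchunks <- ceiling(Nrow / N)  # at most Ncore, and never an empty chunk
  foreach(i = 0:(nchunks - 1), .combine = 'rbind') %dopar% {
    rows <- (i * N + 1):min((i + 1) * N, Nrow)  # clamp the last chunk
    do.par.test.called.func(x.mat[rows, , drop = FALSE], y.mat[rows, , drop = FALSE])
  }
}

set.seed(1)
x <- matrix(rnorm(23 * 4), 23, 4)  # 23 rows do not split evenly over 5 cores
y <- matrix(rnorm(23 * 4), 23, 4)
res <- chunked.apply(x, y, Ncore = 5)
```

If res matches x * y, the chunks cover every row exactly once and in order.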
- Steve