[R-sig-hpc] The foreach, iterators and doMC packages

Steve Weston steve at revolution-computing.com
Thu Jul 16 23:58:44 CEST 2009


On Thu, Jul 16, 2009 at 1:34 PM, Mark Kimpel<mwkimpel at gmail.com> wrote:
> Steve,
>
> I used a non-trivial example and got a nice speed boost, so thanks for
> that advice.

That's good to hear.

> Now I need advice on a real-world application. I am working with a
> genomic data-set that involves a lot of calculations on a data-frame
> as well as parallel subsetting of an annotation data-frame. The
> data.frames initially have about 30k rows and up to 100 columns. If,
> for example, I have 5 cores to work with, it seems to me that the most
> efficient way, rather than making repeated calls to the cores, would
> be to parcel things out by the number of rows divided by the number of
> cores. That would mean sending data.frames to the functions that
> do.par calls rather than vectors. Below is a self-contained
> example. It doesn't work because i doesn't get incremented after the
> %dopar% operator.

Your basic strategy seems reasonable, but I think there's a problem in
the way that you're indexing into the matrices x.mat and y.mat.
I would do it this way:

do.par.test.func <- function(Nrow, Ncol, Ncore){
 x.mat <- matrix(rnorm(Ncol * Nrow), Nrow, Ncol)
 y.mat <- matrix(rnorm(Ncol * Nrow), Nrow, Ncol)
 # N rows per chunk; this indexing assumes Ncore divides Nrow evenly,
 # otherwise the last chunk would run past row Nrow
 N <- ceiling(Nrow/Ncore)
 foreach(i = 0:(Ncore-1), .combine='rbind') %dopar%
   do.par.test.called.func(x.mat[(i * N + 1):((i+1) * N),],
                           y.mat[(i * N + 1):((i+1) * N),])
}
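
The indexing above works cleanly when Ncore divides Nrow. If it does not,
the last chunk's upper bound, Ncore * N, is larger than Nrow and the
subscript goes out of bounds. Here is a minimal sketch of one way to build
the row ranges so the last chunk is clamped to Nrow. The helper name
chunk.rows is hypothetical (it is not part of foreach or of the original
example), and it uses only base R:

```r
# Hypothetical helper, not from the original post: returns a list of
# row-index vectors, one per non-empty chunk.
chunk.rows <- function(Nrow, Ncore) {
  N <- ceiling(Nrow / Ncore)        # rows per chunk, as in the code above
  starts <- seq(1, Nrow, by = N)    # first row of each chunk
  lapply(starts, function(s) s:min(s + N - 1, Nrow))
}

x.mat <- matrix(rnorm(10 * 3), 10, 3)
chunks <- chunk.rows(nrow(x.mat), 4)   # rows 1:3, 4:6, 7:9 and 10

# drop = FALSE keeps a one-row chunk as a matrix rather than a vector
sub.mats <- lapply(chunks, function(rows) x.mat[rows, , drop = FALSE])
```

The same list could be passed to foreach as the iteration variable, so
each task receives a vector of row indices instead of computing its own
bounds from i. Note that this can produce fewer chunks than Ncore when
Nrow is small.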

I also threw in the use of the .combine option to rbind the submatrices into
the full matrix.  That seems preferable to returning a list of submatrices.
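
To illustrate the difference, here is a small sketch that runs the same
foreach call with and without .combine. The toy matrix and the variable
names are made up for illustration, and registerDoSEQ() registers the
sequential backend that ships with foreach so the sketch runs without
doMC; with doMC you would call registerDoMC() instead.

```r
library(foreach)
registerDoSEQ()  # sequential backend, so %dopar% runs without doMC

x.mat <- matrix(1:12, nrow = 6, ncol = 2)

# Default: foreach returns a list with one element per task
chunks.list <- foreach(i = 0:1) %dopar%
  x.mat[(i * 3 + 1):((i + 1) * 3), ]

# With .combine='rbind' the per-task submatrices are stacked in task order
chunks.mat <- foreach(i = 0:1, .combine = 'rbind') %dopar%
  x.mat[(i * 3 + 1):((i + 1) * 3), ]
```

Here chunks.list is a list of two 3 x 2 matrices, while chunks.mat is a
single 6 x 2 matrix with the rows in the same order as x.mat.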

I'm not sure what you mean by i not being incremented.  Let me know
if this change to the indexing doesn't fix the code.

- Steve


