[R] for loop Vs apply function Vs foreach (REvolution enhancement)

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Feb 16 17:24:46 CET 2010


Hi,

> 2. foreach (REvolution enhancement)
>
>seems the rationale of this function is to facilitate the use of multithreading to enhance the for loop speed. Given a moderate time sensitivity (process must run fast but a gain of 10-20% speed seen as probably not justifying the additional learning + dependence from yet another package), is it really worth going down that route?
>
> Has anyone extensive experience with this matter (using foreach to boost for loop running time)? any feedback welcome.

I'm not sure what you mean by "moderate time sensitivity", but
you should definitely use foreach if the block of code you are
iterating over (i) takes a moderately long time to execute and
(ii) is independent of the code that runs before/after it in the loop.
It also helps, though this is tangential, if (iii) you are running a
Linux/OS X machine, so you can use the multicore package.

There isn't much learning involved, since parallelizing over the CPUs
of a single machine is pretty much painless as long as you satisfy
(iii) above. That condition exists only because, last I heard, the
"multicore" package (which doMC, the foreach backend, depends on)
doesn't really work on Windows. For instance, instead of something like:

results <- lapply(1:100, function(x) doSomethingWith(x))

or:

results <- vector("list", 100)
for (x in 1:100) {
  results[[x]] <- doSomethingWith(x)
}

You do:

results <- foreach(x=1:100) %dopar% {
  doSomethingWith(x)
}
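
To make that runnable end to end, here is a minimal sketch. The body of
doSomethingWith() is a made-up stand-in (the real function is whatever
you are iterating over), and cores = 2 is an arbitrary choice. It assumes
the foreach package is installed, and registers doMC only if it is
available:

```r
library(foreach)

# Hypothetical stand-in for the doSomethingWith() placeholder used above.
doSomethingWith <- function(x) {
  sqrt(x)
}

# Register the doMC backend if it is installed (Linux/OS X);
# cores = 2 is an arbitrary value chosen for illustration.
if (requireNamespace("doMC", quietly = TRUE)) {
  doMC::registerDoMC(cores = 2)
}

# Each iteration is independent of the others, so they can run in parallel.
results <- foreach(x = 1:100) %dopar% {
  doSomethingWith(x)
}

# By default foreach collects the results in a list, like lapply() does.
length(results)
```

If doMC is not installed, the guard simply skips the registration and
the loop still runs.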

That having been said, I wouldn't use foreach all the time as a
"default" replacement for the normal/sequential "for" loop, because
there is some overhead involved in setting it up and running it, and
it might not be worth it if the code you are iterating over isn't heavy.
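
One rough way to see that overhead for yourself is to time a trivial
loop body both ways. This is only a sketch: n = 2000 and the "x + 1"
body are arbitrary, the numbers will differ from machine to machine, and
it uses the sequential %do% operator, so it only measures foreach's own
bookkeeping (a parallel backend adds its own startup and communication
cost on top of that):

```r
library(foreach)

n <- 2000  # arbitrary number of iterations, chosen for illustration

# Plain for loop with a preallocated result list.
t_for <- system.time({
  res_for <- vector("list", n)
  for (x in seq_len(n)) {
    res_for[[x]] <- x + 1
  }
})

# Same trivial body through foreach, run sequentially with %do%.
t_foreach <- system.time({
  res_foreach <- foreach(x = seq_len(n)) %do% {
    x + 1
  }
})

# Elapsed wall-clock seconds for each variant.
print(c(for_loop = t_for[["elapsed"]], foreach = t_foreach[["elapsed"]]))
```

If the per-iteration work is this light, the plain loop (or lapply) is
the simpler choice; foreach starts to pay off when each iteration does
enough work to dwarf the bookkeeping.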

Another nice thing is that foreach "degrades" gracefully. The backend
package determines the "parallelization strategy": e.g. "doMC" is a
foreach backend that parallelizes over the CPUs/cores of one machine,
while others parallelize over different machines in a cluster. If you
are running on a machine that doesn't have any foreach backend
installed/enabled, foreach will just run the code in the %dopar% block
sequentially.
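
If you want to check what foreach will actually do on a given machine,
something like the following should work. It is a small sketch that
assumes a fresh R session with only the foreach package loaded; the
x^2 body is just a placeholder:

```r
library(foreach)

# Ask foreach whether a %dopar% backend is registered, and how many
# workers it would use.
getDoParRegistered()
getDoParWorkers()

# With no backend registered, %dopar% still runs the block, just
# sequentially, and warns about it the first time.
squares <- foreach(x = 1:5) %dopar% {
  x^2
}

# Registering the sequential backend explicitly makes the fallback
# deliberate and avoids that warning.
registerDoSEQ()
squares_seq <- foreach(x = 1:5) %dopar% {
  x^2
}
```

Either way the loop body itself does not change, so the same script can
be moved to a machine with a real backend without editing it.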

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


