[R-sig-hpc] The foreach, iterators and doMC packages

Brian G. Peterson brian at braverock.com
Wed Jul 15 19:18:34 CEST 2009


Mark,

Use a non-trivial example.  A single sqrt() call costs far less than the 
overhead of distributing it.  "There is no such thing as a free lunch" is 
as true in parallel computing as anywhere else: the overhead of 
distributing the job will overwhelm a trivial example.  Sys.sleep(10) 
would be a more worthwhile trivial test case.
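
A minimal sketch of such a test follows.  It assumes foreach and doMC are 
installed and that you are on a Unix-alike (doMC relies on forking).  The 
core count, the task count, and the 1-second sleep (shortened from 10 so 
the comparison runs quickly) are arbitrary choices for illustration:

```r
library(foreach)
library(doMC)

# Assumption: 4 workers; adjust to the number of cores you actually have.
registerDoMC(cores = 4)

n_tasks    <- 4   # illustrative: one task per worker
sleep_secs <- 1   # illustrative: stands in for a non-trivial unit of work

# Sequential: each task runs after the previous one finishes.
seq_time <- system.time(
  foreach(i = 1:n_tasks) %do% Sys.sleep(sleep_secs)
)

# Parallel: tasks are handed to the registered doMC workers.
par_time <- system.time(
  foreach(i = 1:n_tasks) %dopar% Sys.sleep(sleep_secs)
)

# Compare wall-clock time only; that is what parallelism should reduce.
print(c(sequential = seq_time[["elapsed"]],
        parallel   = par_time[["elapsed"]]))
```

Because each task now takes far longer than the cost of dispatching it, 
the %dopar% run should finish in roughly the time of one task, while %do% 
should take about n_tasks times as long.  Exact figures will vary by 
machine.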

Regards,

  - Brian

Mark Kimpel wrote:
> I'm having trouble getting these packages to work as I believe they
> should. I have a new Debian Lenny box running on an Intel i7 with 12
> GB of memory and wrote a test script to see how much performance
> increase I could achieve with foreach. For this purpose, I used code
> extracted from the vignette. Below is the code, the system.time
> output, and sessionInfo(). I should add that I've run this multiple
> times and always achieved similar results. These last were achieved
> after doing init 1 to get me into a strict terminal mode and avoid the
> GUI.
>
> As you can see, the results seem to be the opposite of what one would
> expect. The fastest time, by an order of magnitude, is achieved by a
> simple for loop, and %do% slightly outperforms %dopar%.
>
> How can this be explained?
> Mark
> ####################################
> require(foreach)
> require(multicore)
> require(doMC)
> registerDoMC(cores=5)
> z <- 30000
>
>
> for.each.do.time <- system.time(
>             a <- foreach(i = 1:z, .combine = "c") %do% sqrt(i)
>             )
> #
> for.each.do.par.time <- system.time(
>             b <- foreach(i = 1:z, .combine = "c") %dopar% sqrt(i)
>             )
>
> #
> c <- rep(0,z)
> loop.time <- system.time(
>             for (i in 1:length(c))
>             c[i] <- sqrt(i)
>             )
> #
> out <- rbind(unclass(for.each.do.time), unclass(for.each.do.par.time),
> unclass(loop.time))
> out
> user.self  sys.self  elapsed  user.child  sys.child
>    25.713     0.000   25.712       0.000      0.000
>    25.918     0.016   26.015       0.192      0.192
>     0.208     0.000    0.206       0.000      0.000
> #
> "R.version.platform" "x86_64-unknown-linux-gnu"
> "R.version.arch" "x86_64"
> "R.version.os" "linux-gnu"
> "R.version.system" "x86_64, linux-gnu"
> "R.version.status" ""
> "R.version.major" "2"
> "R.version.minor" "9.1"
> "R.version.year" "2009"
> "R.version.month" "06"
> "R.version.day" "26"
> "R.version.svn rev" "48839"
> "R.version.language" "R"
> "R.version.version.string" "R version 2.9.1 (2009-06-26)"
> "locale" "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
> "basePkgs1" "stats"
> "basePkgs2" "graphics"
> "basePkgs3" "grDevices"
> "basePkgs4" "datasets"
> "basePkgs5" "utils"
> "basePkgs6" "methods"
> "basePkgs7" "base"
>
>
>
> ------------------------------------------------------------
> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN  46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
>
> "The real problem is not whether machines think but whether men do."
> -- B. F. Skinner
> ******************************************************************
>
>
>
> On Wed, Jul 1, 2009 at 10:56 PM, Steve Weston <steve at revolution-computing.com> wrote:
>   
>> There have been several announcements of three new packages that I've
>> recently uploaded to CRAN: foreach, iterators, and doMC.  You can read
>> one description on David Smith's blog, at:
>>
>>    http://blog.revolution-computing.com
>> or:
>>    http://bit.ly/tygLz
>>
>> You can also read the vignette that I wrote for foreach on a CRAN
>> website.
>>
>> I would like to mention that one of the goals of the foreach package is
>> to make it easy to write an R package that allows the end user to choose
>> what parallel computing engine to use.  That's useful because the user
>> may already have and use a parallel computing system, and not want to
>> install and maintain yet another one.  (The foreach and iterators packages
>> themselves are trivial to install, since they provide a framework for using
>> parallel computing systems such as multicore and nws.)
>>
>> Currently, only the doMC "parallel backend" for multicore is publicly
>> available on CRAN, but I'm hoping to get a chance to write and release
>> other backend packages, to support MPI and shared memory, for example.
>>
>> I'd love to hear from R package authors on how to improve foreach
>> so that it's a more attractive platform on which to develop parallel
>> applications, especially if you've had difficulty using parallel
>> computing systems in the past.
>>
>>     

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock


