[Bioc-devel] BiocParallel

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 15 00:50:36 CET 2012


On 11/14/2012 03:43 PM, Ryan C. Thompson wrote:
> Here are two alternative implementations of pvec. pvec2 is just a simple rewrite
> of pvec to use mclapply. pvec3 then extends pvec2 to accept a specified chunk
> size or a specified number of chunks. If the number of chunks exceeds the number
> of cores, then multiple chunks will get run sequentially on each core. pvec3
> also exposes the "mc.prescheule" argument of mclapply, since that is relevant
> when there are more chunks than cores. Lastly, I provide a "pvectorize" function
> which can be called on a regular vectorized function to make it into a pvec'd
> version of itself. Usage is like: sqrt.parallel <- pvectorize(sqrt);
> sqrt.parallel(1:1000).
>
> pvec2 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>                    mc.cores = getOption("mc.cores", 2L), mc.cleanup = TRUE)
> {
>    env <- parent.frame()
>    cores <- as.integer(mc.cores)
>    if(cores < 1L) stop("'mc.cores' must be >= 1")
>    if(cores == 1L) return(FUN(v, ...))
>
>    if(mc.set.seed) mc.reset.stream()
>
>    n <- length(v)
>    si <- splitIndices(n, cores)
>    res <- do.call(c,
>                   mclapply(si, function(i) FUN(v[i], ...),
>                            mc.set.seed=mc.set.seed,
>                            mc.silent=mc.silent,
>                            mc.cores=mc.cores,
>                            mc.cleanup=mc.cleanup))
>    if (length(res) != n)
>      warning("some results may be missing, folded or caused an error")
>    res
> }
> pvec3 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>                    mc.cores = getOption("mc.cores", 2L), mc.cleanup = TRUE,
>                    mc.preschedule=FALSE, num.chunks, chunk.size)
> {
>    env <- parent.frame()
>    cores <- as.integer(mc.cores)
>    if(cores < 1L) stop("'mc.cores' must be >= 1")
>    if(cores == 1L) return(FUN(v, ...))
>
>    if(mc.set.seed) mc.reset.stream()
>
>    n <- length(v)
>    if (missing(num.chunks)) {
>      if (missing(chunk.size)) {
>        num.chunks <- cores
>      } else {
>        num.chunks <- ceiling(n/chunk.size)
>      }
>    }
>    si <- splitIndices(n, num.chunks)
>    res <- do.call(c,
>                   mclapply(si, function(i) FUN(v[i], ...),
>                            mc.set.seed=mc.set.seed,
>                            mc.silent=mc.silent,
>                            mc.cores=mc.cores,
>                            mc.cleanup=mc.cleanup,
>                            mc.preschedule=mc.preschedule))
>    if (length(res) != n)
>      warning("some results may be missing, folded or caused an error")
>    res
> }
>
> pvectorize <- function(FUN) {
>    function(...) pvec3(FUN=FUN, ...)
> }

would be great to have these as 'pull' requests in github; pvec3 as a 
replacement for pvec, if it's implementing the same concept but better.

Unit tests would be good (yes being a little hypocritical). inst/unitTests, 
using RUnit, examples in

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/IRanges/inst/unitTests

with username / password readonly

Martin

> On Wed 14 Nov 2012 02:23:30 PM PST, Michael Lawrence wrote:
>>
>> On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>
>>>
>>> Interested developers -- I added the start of a BiocParallel package to
>>> the Bioconductor subversion repository and build system.
>>>
>>> The package is mirrored on github to allow for social coding; I encourage
>>> people to contribute via that mechanism.
>>>
>>> https://github.com/**Bioconductor/BiocParallel<https://github.com/Bioconductor/BiocParallel>
>>>
>>>
>>> The purpose is to help focus our efforts at developing appropriate
>>> parallel paradigms. Currently the package Imports: parallel and implements
>>> pvec and mclapply in a way that allows for operation on any vector or list
>>> supporting length(), [, and [[ (the latter for mclapply). pvec in
>>> particular seems to be appropriate for GRanges-like objects, where we don't
>>> necessarily want to extract many thousands of S4 instances of individual
>>> ranges with [[.
>>>
>>
>>
>> Makes sense. Besides, [[ does not even work on GRanges. One limitation of
>> pvec is that it does not support a chunk size; it just uses length(x) /
>> ncores. It would be nice to be able to restrict that, which would then
>> require multiple jobs per core. Unless I'm missing something.
>>
>>
>>>
>>>
>>> Hopefully the ideas in the package can be folded back in to parallel as
>>> they mature.
>>>
>>> Martin
>>> --
>>> Dr. Martin Morgan, PhD
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> ______________________________**_________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>
>>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-devel mailing list