[Bioc-devel] BiocParallel

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Thu Nov 15 15:21:28 CET 2012


I'll second Ryan's patch (at least in principle).  When I parallelize
across multiple cores, I have always found mc.preschedule to be an
important option to expose (that, and the number of cores, is all I
use routinely).

Kasper

On Wed, Nov 14, 2012 at 7:14 PM, Ryan C. Thompson <rct at thompsonclan.org> wrote:
> I just submitted a pull request. I'll add tests shortly if I can figure out
> how to write them.
>
>
> On Wed 14 Nov 2012 03:50:36 PM PST, Martin Morgan wrote:
>>
>> On 11/14/2012 03:43 PM, Ryan C. Thompson wrote:
>>>
>>> Here are two alternative implementations of pvec. pvec2 is just a
>>> simple rewrite
>>> of pvec to use mclapply. pvec3 then extends pvec2 to accept a
>>> specified chunk
>>> size or a specified number of chunks. If the number of chunks exceeds
>>> the number
>>> of cores, then multiple chunks will get run sequentially on each
>>> core. pvec3
>>> also exposes the "mc.prescheule" argument of mclapply, since that is
>>> relevant
>>> when there are more chunks than cores. Lastly, I provide a
>>> "pvectorize" function
>>> which can be called on a regular vectorized function to make it into
>>> a pvec'd
>>> version of itself. Usage is like: sqrt.parallel <- pvectorize(sqrt);
>>> sqrt.parallel(1:1000).
>>>
>>> pvec2 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>>>                    mc.cores = getOption("mc.cores", 2L), mc.cleanup =
>>> TRUE)
>>> {
>>>    env <- parent.frame()
>>>    cores <- as.integer(mc.cores)
>>>    if(cores < 1L) stop("'mc.cores' must be >= 1")
>>>    if(cores == 1L) return(FUN(v, ...))
>>>
>>>    if(mc.set.seed) mc.reset.stream()
>>>
>>>    n <- length(v)
>>>    si <- splitIndices(n, cores)
>>>    res <- do.call(c,
>>>                   mclapply(si, function(i) FUN(v[i], ...),
>>>                            mc.set.seed=mc.set.seed,
>>>                            mc.silent=mc.silent,
>>>                            mc.cores=mc.cores,
>>>                            mc.cleanup=mc.cleanup))
>>>    if (length(res) != n)
>>>      warning("some results may be missing, folded or caused an error")
>>>    res
>>> }
>>> pvec3 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>>>                    mc.cores = getOption("mc.cores", 2L), mc.cleanup =
>>> TRUE,
>>>                    mc.preschedule=FALSE, num.chunks, chunk.size)
>>> {
>>>    env <- parent.frame()
>>>    cores <- as.integer(mc.cores)
>>>    if(cores < 1L) stop("'mc.cores' must be >= 1")
>>>    if(cores == 1L) return(FUN(v, ...))
>>>
>>>    if(mc.set.seed) mc.reset.stream()
>>>
>>>    n <- length(v)
>>>    if (missing(num.chunks)) {
>>>      if (missing(chunk.size)) {
>>>        num.chunks <- cores
>>>      } else {
>>>        num.chunks <- ceiling(n/chunk.size)
>>>      }
>>>    }
>>>    si <- splitIndices(n, num.chunks)
>>>    res <- do.call(c,
>>>                   mclapply(si, function(i) FUN(v[i], ...),
>>>                            mc.set.seed=mc.set.seed,
>>>                            mc.silent=mc.silent,
>>>                            mc.cores=mc.cores,
>>>                            mc.cleanup=mc.cleanup,
>>>                            mc.preschedule=mc.preschedule))
>>>    if (length(res) != n)
>>>      warning("some results may be missing, folded or caused an error")
>>>    res
>>> }
>>>
>>> pvectorize <- function(FUN) {
>>>    function(...) pvec3(FUN=FUN, ...)
>>> }
>>
>>
>> would be great to have these as 'pull' requests in github; pvec3 as a
>> replacement for pvec, if it's implementing the same concept but better.
>>
>> Unit tests would be good (yes being a little hypocritical).
>> inst/unitTests, using RUnit, examples in
>>
>>
>> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/IRanges/inst/unitTests
>>
>>
>> with username / password readonly
>>
>> Martin
>>
>>> On Wed 14 Nov 2012 02:23:30 PM PST, Michael Lawrence wrote:
>>>>
>>>>
>>>> On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan <mtmorgan at fhcrc.org>
>>>> wrote:
>>>>
>>>>>
>>>>> Interested developers -- I added the start of a BiocParallel
>>>>> package to
>>>>> the Bioconductor subversion repository and build system.
>>>>>
>>>>> The package is mirrored on github to allow for social coding; I
>>>>> encourage
>>>>> people to contribute via that mechanism.
>>>>>
>>>>>
>>>>> https://github.com/**Bioconductor/BiocParallel<https://github.com/Bioconductor/BiocParallel>
>>>>>
>>>>>
>>>>>
>>>>> The purpose is to help focus our efforts at developing appropriate
>>>>> parallel paradigms. Currently the package Imports: parallel and
>>>>> implements
>>>>> pvec and mclapply in a way that allows for operation on any vector
>>>>> or list
>>>>> supporting length(), [, and [[ (the latter for mclapply). pvec in
>>>>> particular seems to be appropriate for GRanges-like objects, where
>>>>> we don't
>>>>> necessarily want to extract many thousands of S4 instances of
>>>>> individual
>>>>> ranges with [[.
>>>>>
>>>>
>>>>
>>>> Makes sense. Besides, [[ does not even work on GRanges. One
>>>> limitation of
>>>> pvec is that it does not support a chunk size; it just uses length(x) /
>>>> ncores. It would be nice to be able to restrict that, which would then
>>>> require multiple jobs per core. Unless I'm missing something.
>>>>
>>>>
>>>>>
>>>>>
>>>>> Hopefully the ideas in the package can be folded back in to
>>>>> parallel as
>>>>> they mature.
>>>>>
>>>>> Martin
>>>>> --
>>>>> Dr. Martin Morgan, PhD
>>>>> Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N.
>>>>> PO Box 19024 Seattle, WA 98109
>>>>>
>>>>> ______________________________**_________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>>
>>>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list