[Bioc-devel] BiocParallel

Ryan C. Thompson rct at thompsonclan.org
Thu Nov 15 01:14:47 CET 2012


I just submitted a pull request. I'll add tests shortly if I can figure 
out how to write them.

On Wed 14 Nov 2012 03:50:36 PM PST, Martin Morgan wrote:
> On 11/14/2012 03:43 PM, Ryan C. Thompson wrote:
>> Here are two alternative implementations of pvec. pvec2 is just a
>> simple rewrite
>> of pvec to use mclapply. pvec3 then extends pvec2 to accept a
>> specified chunk
>> size or a specified number of chunks. If the number of chunks exceeds
>> the number
>> of cores, then multiple chunks will get run sequentially on each
>> core. pvec3
>> also exposes the "mc.prescheule" argument of mclapply, since that is
>> relevant
>> when there are more chunks than cores. Lastly, I provide a
>> "pvectorize" function
>> which can be called on a regular vectorized function to make it into
>> a pvec'd
>> version of itself. Usage is like: sqrt.parallel <- pvectorize(sqrt);
>> sqrt.parallel(1:1000).
>>
>> pvec2 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>>                    mc.cores = getOption("mc.cores", 2L), mc.cleanup =
>> TRUE)
>> {
>>    env <- parent.frame()
>>    cores <- as.integer(mc.cores)
>>    if(cores < 1L) stop("'mc.cores' must be >= 1")
>>    if(cores == 1L) return(FUN(v, ...))
>>
>>    if(mc.set.seed) mc.reset.stream()
>>
>>    n <- length(v)
>>    si <- splitIndices(n, cores)
>>    res <- do.call(c,
>>                   mclapply(si, function(i) FUN(v[i], ...),
>>                            mc.set.seed=mc.set.seed,
>>                            mc.silent=mc.silent,
>>                            mc.cores=mc.cores,
>>                            mc.cleanup=mc.cleanup))
>>    if (length(res) != n)
>>      warning("some results may be missing, folded or caused an error")
>>    res
>> }
>> pvec3 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>>                    mc.cores = getOption("mc.cores", 2L), mc.cleanup =
>> TRUE,
>>                    mc.preschedule=FALSE, num.chunks, chunk.size)
>> {
>>    env <- parent.frame()
>>    cores <- as.integer(mc.cores)
>>    if(cores < 1L) stop("'mc.cores' must be >= 1")
>>    if(cores == 1L) return(FUN(v, ...))
>>
>>    if(mc.set.seed) mc.reset.stream()
>>
>>    n <- length(v)
>>    if (missing(num.chunks)) {
>>      if (missing(chunk.size)) {
>>        num.chunks <- cores
>>      } else {
>>        num.chunks <- ceiling(n/chunk.size)
>>      }
>>    }
>>    si <- splitIndices(n, num.chunks)
>>    res <- do.call(c,
>>                   mclapply(si, function(i) FUN(v[i], ...),
>>                            mc.set.seed=mc.set.seed,
>>                            mc.silent=mc.silent,
>>                            mc.cores=mc.cores,
>>                            mc.cleanup=mc.cleanup,
>>                            mc.preschedule=mc.preschedule))
>>    if (length(res) != n)
>>      warning("some results may be missing, folded or caused an error")
>>    res
>> }
>>
>> pvectorize <- function(FUN) {
>>    function(...) pvec3(FUN=FUN, ...)
>> }
>
> would be great to have these as 'pull' requests in github; pvec3 as a
> replacement for pvec, if it's implementing the same concept but better.
>
> Unit tests would be good (yes being a little hypocritical).
> inst/unitTests, using RUnit, examples in
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/IRanges/inst/unitTests
>
>
> with username / password readonly
>
> Martin
>
>> On Wed 14 Nov 2012 02:23:30 PM PST, Michael Lawrence wrote:
>>>
>>> On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan <mtmorgan at fhcrc.org>
>>> wrote:
>>>
>>>>
>>>> Interested developers -- I added the start of a BiocParallel
>>>> package to
>>>> the Bioconductor subversion repository and build system.
>>>>
>>>> The package is mirrored on github to allow for social coding; I
>>>> encourage
>>>> people to contribute via that mechanism.
>>>>
>>>> https://github.com/**Bioconductor/BiocParallel<https://github.com/Bioconductor/BiocParallel>
>>>>
>>>>
>>>>
>>>> The purpose is to help focus our efforts at developing appropriate
>>>> parallel paradigms. Currently the package Imports: parallel and
>>>> implements
>>>> pvec and mclapply in a way that allows for operation on any vector
>>>> or list
>>>> supporting length(), [, and [[ (the latter for mclapply). pvec in
>>>> particular seems to be appropriate for GRanges-like objects, where
>>>> we don't
>>>> necessarily want to extract many thousands of S4 instances of
>>>> individual
>>>> ranges with [[.
>>>>
>>>
>>>
>>> Makes sense. Besides, [[ does not even work on GRanges. One
>>> limitation of
>>> pvec is that it does not support a chunk size; it just uses length(x) /
>>> ncores. It would be nice to be able to restrict that, which would then
>>> require multiple jobs per core. Unless I'm missing something.
>>>
>>>
>>>>
>>>>
>>>> Hopefully the ideas in the package can be folded back in to
>>>> parallel as
>>>> they mature.
>>>>
>>>> Martin
>>>> --
>>>> Dr. Martin Morgan, PhD
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> ______________________________**_________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>>
>>>>
>>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>



More information about the Bioc-devel mailing list