[Bioc-devel] BiocParallel

Ryan C. Thompson rct at thompsonclan.org
Thu Nov 15 00:43:13 CET 2012


Here are two alternative implementations of pvec. pvec2 is just a simple 
rewrite of pvec to use mclapply. pvec3 then extends pvec2 to accept a 
specified chunk size or a specified number of chunks. If the number of 
chunks exceeds the number of cores, then multiple chunks will get run 
sequentially on each core. pvec3 also exposes the "mc.prescheule" 
argument of mclapply, since that is relevant when there are more chunks 
than cores. Lastly, I provide a "pvectorize" function which can be 
called on a regular vectorized function to make it into a pvec'd version 
of itself. Usage is like: sqrt.parallel <- pvectorize(sqrt); 
sqrt.parallel(1:1000).

pvec2 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
                   mc.cores = getOption("mc.cores", 2L), mc.cleanup = TRUE)
{
   env <- parent.frame()
   cores <- as.integer(mc.cores)
   if(cores < 1L) stop("'mc.cores' must be >= 1")
   if(cores == 1L) return(FUN(v, ...))

   if(mc.set.seed) mc.reset.stream()

   n <- length(v)
   si <- splitIndices(n, cores)
   res <- do.call(c,
                  mclapply(si, function(i) FUN(v[i], ...),
                           mc.set.seed=mc.set.seed,
                           mc.silent=mc.silent,
                           mc.cores=mc.cores,
                           mc.cleanup=mc.cleanup))
   if (length(res) != n)
     warning("some results may be missing, folded or caused an error")
   res
}

pvec3 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
                   mc.cores = getOption("mc.cores", 2L), mc.cleanup = TRUE,
                   mc.preschedule=FALSE, num.chunks, chunk.size)
{
   env <- parent.frame()
   cores <- as.integer(mc.cores)
   if(cores < 1L) stop("'mc.cores' must be >= 1")
   if(cores == 1L) return(FUN(v, ...))

   if(mc.set.seed) mc.reset.stream()

   n <- length(v)
   if (missing(num.chunks)) {
     if (missing(chunk.size)) {
       num.chunks <- cores
     } else {
       num.chunks <- ceiling(n/chunk.size)
     }
   }
   si <- splitIndices(n, num.chunks)
   res <- do.call(c,
                  mclapply(si, function(i) FUN(v[i], ...),
                           mc.set.seed=mc.set.seed,
                           mc.silent=mc.silent,
                           mc.cores=mc.cores,
                           mc.cleanup=mc.cleanup,
                           mc.preschedule=mc.preschedule))
   if (length(res) != n)
     warning("some results may be missing, folded or caused an error")
   res
}

pvectorize <- function(FUN) {
   function(...) pvec3(FUN=FUN, ...)
}

On Wed 14 Nov 2012 02:23:30 PM PST, Michael Lawrence wrote:
>
> On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan <mtmorgan at fhcrc.org> 
> wrote:
>
>>
>> Interested developers -- I added the start of a BiocParallel package to
>> the Bioconductor subversion repository and build system.
>>
>> The package is mirrored on github to allow for social coding; I encourage
>> people to contribute via that mechanism.
>>
>> https://github.com/**Bioconductor/BiocParallel<https://github.com/Bioconductor/BiocParallel>
>>
>> The purpose is to help focus our efforts at developing appropriate
>> parallel paradigms. Currently the package Imports: parallel and 
>> implements
>> pvec and mclapply in a way that allows for operation on any vector or 
>> list
>> supporting length(), [, and [[ (the latter for mclapply). pvec in
>> particular seems to be appropriate for GRanges-like objects, where we 
>> don't
>> necessarily want to extract many thousands of S4 instances of individual
>> ranges with [[.
>>
>
>
> Makes sense. Besides, [[ does not even work on GRanges. One limitation of
> pvec is that it does not support a chunk size; it just uses length(x) /
> ncores. It would be nice to be able to restrict that, which would then
> require multiple jobs per core. Unless I'm missing something.
>
>
>>
>>
>> Hopefully the ideas in the package can be folded back in to parallel as
>> they mature.
>>
>> Martin
>> --
>> Dr. Martin Morgan, PhD
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> ______________________________**_________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list