[Bioc-devel] BiocParallel -- update

Ryan C. Thompson rct at thompsonclan.org
Tue Dec 4 21:47:54 CET 2012


One issue that I see is that for some kinds of parallel backends, there 
may not be any way for "bpworkers" to return something meaningful. For 
example, a backend that submits jobs to a large cluster may not know 
exactly how many nodes are in the cluster, and in any case returning the 
total number of nodes may not be appropriate, since those nodes are 
shared with other cluster users. This is primarily important for the 
pvec function, which uses the result of bpworkers to decide how many 
chunks to split the input into.

I guess one solution is to make sure that for any backend that cannot 
natively determine a number of available workers, we require the number 
of workers as an argument when creating the param object for that 
backend. e.g.:

param <- IndeterminateSizedClusterParam(workers=50).
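
To make that concrete, here is a rough sketch of what such a param could look like. It uses plain S3 rather than whatever class system the package ends up with, the class name is just the placeholder from above, and the bpworkers generic here is a stand-in I define locally, not the package's actual one:

```r
# Hypothetical constructor: 'workers' has no default, so the caller must
# state how many workers to assume for this backend.
IndeterminateSizedClusterParam <- function(workers) {
    if (missing(workers))
        stop("'workers' must be supplied for this backend")
    workers <- as.integer(workers)
    if (length(workers) != 1L || is.na(workers) || workers < 1L)
        stop("'workers' must be a single positive integer")
    structure(list(workers = workers),
              class = "IndeterminateSizedClusterParam")
}

# Local stand-in for the bpworkers accessor (an assumption, S3 for brevity).
bpworkers <- function(param, ...) UseMethod("bpworkers")

# For this backend, bpworkers simply reports the user-supplied number.
bpworkers.IndeterminateSizedClusterParam <- function(param, ...) {
    param$workers
}

param <- IndeterminateSizedClusterParam(workers = 50)
bpworkers(param)
```

With this, pvec would always get a usable number from bpworkers, even though the backend itself never queries the cluster.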

Additionally, as discussed previously, it makes sense to be able to 
explicitly choose a chunk size or number of chunks for pvec, rather than 
splitting into exactly as many chunks as there are parallel workers. I 
implemented this in the non-generic multicore-only version of pvec, but 
I still need to port it to the generic version that works for any param. 
Do people think that the chunk options should be included in the 
MulticoreParam class, or specified when pvec is called?
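
For reference, the chunking logic I have in mind looks roughly like the sketch below. This is not the code from my multicore-only version; the names chunk_indices, pvec_sketch, num.chunks and chunk.size are made up for illustration, and lapply stands in for whatever parallel apply the backend provides:

```r
# Hypothetical helper: split 1..n into contiguous chunks, given either a
# number of chunks or a chunk size (exactly one of the two).
chunk_indices <- function(n, num.chunks = NULL, chunk.size = NULL) {
    if (is.null(num.chunks) == is.null(chunk.size))
        stop("specify exactly one of 'num.chunks' and 'chunk.size'")
    if (n == 0L)
        return(list())
    i <- seq_len(n)
    if (!is.null(chunk.size)) {
        # Fixed-size chunks; the last one may be shorter.
        group <- ceiling(i / chunk.size)
    } else {
        # Near-equal chunks; never more chunks than elements.
        k <- min(num.chunks, n)
        group <- ceiling(i * k / n)
    }
    unname(split(i, group))
}

# Sketch of pvec with chunk options given at call time. 'apply_fun' is a
# placeholder for the backend's parallel apply; lapply runs serially.
pvec_sketch <- function(v, FUN, ..., num.chunks = NULL, chunk.size = NULL,
                        apply_fun = lapply) {
    idx <- chunk_indices(length(v), num.chunks, chunk.size)
    results <- apply_fun(idx, function(i) FUN(v[i], ...))
    do.call(c, results)
}

pvec_sketch(1:10, sqrt, chunk.size = 4)
```

Here the options are passed when pvec is called, which is only one of the two placements I am asking about. Storing them in the param object would mean pvec_sketch reads them from the param instead.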

I have also written a non-generic multicore-only version of pvectorize 
that allows for multiple vectorized arguments instead of just one, and 
furthermore gives the parallelized function an identical signature to 
the original function. Again, this needs to be ported to the generic 
bpvectorize.
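
The signature-preserving part can be done by copying the formals of the original function onto the wrapper. A minimal sketch of the idea follows. It is not my actual implementation: pvectorize_sketch, vectorize.args and num.chunks are illustrative names, lapply again stands in for the parallel apply, and it assumes all vectorized arguments have the same length:

```r
# Hypothetical sketch: wrap FUN so that the arguments named in
# 'vectorize.args' are split into chunks, while the wrapper keeps the
# same formal arguments as FUN.
# Caveat: breaks if FUN itself has formals named FUN, vectorize.args,
# num.chunks or apply_fun.
pvectorize_sketch <- function(FUN, vectorize.args, num.chunks = 2L,
                              apply_fun = lapply) {
    force(FUN); force(vectorize.args); force(num.chunks); force(apply_fun)
    wrapper <- function() {
        caller <- parent.frame()
        # Evaluate the arguments exactly as the caller supplied them.
        args <- lapply(as.list(match.call())[-1L], eval, envir = caller)
        vec <- intersect(vectorize.args, names(args))
        if (length(vec) == 0L)
            return(do.call(FUN, args, quote = TRUE))
        n <- length(args[[vec[1L]]])
        if (!all(vapply(args[vec], length, integer(1)) == n))
            stop("vectorized arguments must have the same length")
        if (n == 0L)
            return(do.call(FUN, args, quote = TRUE))
        # Contiguous, near-equal chunks of the index range.
        k <- min(num.chunks, n)
        idx <- unname(split(seq_len(n), ceiling(seq_len(n) * k / n)))
        results <- apply_fun(idx, function(i) {
            chunk.args <- args
            # Subset only the vectorized arguments; pass the rest whole.
            chunk.args[vec] <- lapply(args[vec], function(a) a[i])
            do.call(FUN, chunk.args, quote = TRUE)
        })
        do.call(c, results)
    }
    # Give the wrapper the same signature as the original function.
    formals(wrapper) <- formals(FUN)
    wrapper
}

f <- function(x, y, scale = 1) (x + y) * scale
pf <- pvectorize_sketch(f, vectorize.args = c("x", "y"))
pf(1:6, 6:1, scale = 2)
```

Arguments that the caller leaves out are not passed on, so FUN falls back to its own defaults for them.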
