[Bioc-devel] BiocParallel -- update
Ryan C. Thompson
rct at thompsonclan.org
Tue Dec 4 21:47:54 CET 2012
One issue that I see is that for some kinds of parallel backends, there
may not be any way for "bpworkers" to return something meaningful. For
example, a backend that submits jobs to a large cluster may not know
exactly how many nodes are in the cluster, and in any case returning the
total number of nodes may not be appropriate, since those nodes are
shared with other cluster users. This is primarily important for the
pvec function, which uses the result of bpworkers to decide how many
chunks to split the input into.
I guess one solution is to require the number of workers as an argument
when creating the param object for any backend that cannot natively
determine how many workers are available, e.g.:
param <- IndeterminateSizedClusterParam(workers=50)
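To make the idea concrete, here is a minimal sketch in base R of a param constructor that refuses to be created without a worker count. Everything in it is an assumption for illustration: IndeterminateSizedClusterParam is only the placeholder name used above, and the S3 generic below is a stand-in for bpworkers, not the package's real definition.

```r
# Hypothetical constructor: illustrates "workers is mandatory" for a backend
# that has no way to discover the number of workers on its own.
IndeterminateSizedClusterParam <- function(workers) {
    if (missing(workers))
        stop("'workers' must be supplied: this backend cannot detect it")
    stopifnot(is.numeric(workers), length(workers) == 1L,
              !is.na(workers), workers >= 1)
    structure(list(workers = as.integer(workers)),
              class = "IndeterminateSizedClusterParam")
}

# Stand-in S3 generic (assumption, not the real bpworkers implementation).
bpworkers <- function(param, ...) UseMethod("bpworkers")

# The method simply reports the number the user declared.
bpworkers.IndeterminateSizedClusterParam <- function(param, ...) {
    param$workers
}

param <- IndeterminateSizedClusterParam(workers = 50)
bpworkers(param)
```

With this shape, anything that calls bpworkers to pick a chunk count always gets a usable number, and the user decides what share of a shared cluster to claim.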
Additionally, as discussed previously, it makes sense to be able to
explicitly choose a chunk size or number of chunks for pvec, rather than
splitting into exactly as many chunks as there are parallel workers. I
implemented this in the non-generic multicore-only version of pvec, but
I still need to port it to the generic version that works for any param.
Do people think that the chunk options should be included in the
MulticoreParam class, or specified when pvec is called?
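For reference, the call-time variant could look roughly like the sketch below. It is not the multicore-only implementation mentioned above; the names chunk_indices, pvec_sketch, num.chunks and chunk.size are hypothetical, and lapply stands in for whatever parallel backend the param would select. Here chunk.size is treated as an upper bound on the elements per chunk.

```r
# Hypothetical helper: split indices 1..n into contiguous chunks, given
# either the number of chunks or a maximum chunk size (exactly one of them).
chunk_indices <- function(n, num.chunks = NULL, chunk.size = NULL) {
    if (is.null(num.chunks) == is.null(chunk.size))
        stop("supply exactly one of 'num.chunks' or 'chunk.size'")
    if (n == 0L) return(list())
    if (is.null(num.chunks))
        num.chunks <- ceiling(n / chunk.size)
    num.chunks <- min(num.chunks, n)  # never create empty chunks
    # Map each index to a chunk number so that chunk sizes are balanced
    group <- ceiling(seq_len(n) * num.chunks / n)
    unname(split(seq_len(n), group))
}

# Hypothetical pvec-like function with the chunk options given at call time.
pvec_sketch <- function(v, FUN, ..., num.chunks = NULL, chunk.size = NULL) {
    idx <- chunk_indices(length(v), num.chunks, chunk.size)
    # lapply is a serial stand-in for the parallel backend
    pieces <- lapply(idx, function(i) FUN(v[i], ...))
    do.call(c, pieces)
}

x <- 1:10
pvec_sketch(x, sqrt, num.chunks = 3)
pvec_sketch(x, sqrt, chunk.size = 4)
```

Putting the options on the call keeps the param object purely about the backend; putting them in the param would instead let one setting apply to every pvec call made with it.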
I have also written a non-generic multicore-only version of pvectorize
that allows for multiple vectorized arguments instead of just one, and
furthermore gives the parallelized function an identical signature to
the original function. Again, this needs to be ported to the generic
bpvectorize.
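As an illustration of what "multiple vectorized arguments" and "identical signature" mean here, the following base R sketch wraps a function so that the named arguments are recycled to a common length, split into chunks, and the results concatenated. It is not the multicore-only code described above nor bpvectorize itself; pvectorize_sketch, vectorize.args and num.chunks are hypothetical names, and lapply again stands in for the parallel backend.

```r
# Hypothetical sketch of a pvectorize-like wrapper.
pvectorize_sketch <- function(FUN, vectorize.args, num.chunks = 2L) {
    FUN <- match.fun(FUN)
    stopifnot(all(vectorize.args %in% names(formals(FUN))))

    run_chunked <- function(args) {
        # Only arguments the caller actually supplied are chunked
        vec <- intersect(names(args), vectorize.args)
        n <- if (length(vec)) max(lengths(args[vec])) else 0L
        if (n == 0L) return(do.call(FUN, args))
        # Recycle vectorized arguments to a common length
        args[vec] <- lapply(args[vec], rep_len, length.out = n)
        k <- max(1L, min(num.chunks, n))
        idx <- unname(split(seq_len(n), ceiling(seq_len(n) * k / n)))
        # lapply is a serial stand-in for the parallel backend
        pieces <- lapply(idx, function(i) {
            chunk <- args
            chunk[vec] <- lapply(args[vec], `[`, i)
            do.call(FUN, chunk)
        })
        do.call(c, pieces)
    }

    wrapper <- function() {
        # Evaluate the supplied arguments in the caller's frame
        caller <- parent.frame()
        supplied <- as.list(match.call())[-1L]
        run_chunked(lapply(supplied, eval, envir = caller))
    }
    # Give the wrapper exactly the same formal arguments as FUN
    formals(wrapper) <- formals(FUN)
    wrapper
}

f <- function(x, y, scale = 1) (x + y) * scale
pf <- pvectorize_sketch(f, c("x", "y"), num.chunks = 3)
pf(1:6, 10, scale = 2)
```

Known limitations of this sketch: vectorized arguments left at their default values are not chunked, and a wrapped function whose own argument names collide with the wrapper's internal names (caller, supplied, run_chunked) may misbehave.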