[Bioc-devel] Re: [Rd] ShortRead::FastqStreamer and parallelization

Martin Morgan mtmorgan at fredhutch.org
Tue Nov 18 19:52:41 CET 2014


Re-directed from R-devel, where I guess it went by accident.


On 11/18/2014 09:00 AM, Cook, Malcolm wrote:
> Hi,
>
> I understand ShortRead::FastqStreamer will read chunks in parallel depending
> on the value of ShortRead:::.set_omp_threads
>
> I see this discussed here:
> https://stat.ethz.ch/pipermail/bioc-devel/2013-May/004355.html and nowhere
> else.
>
> It probably should be documented in ShortRead.

yes, it's now documented on the FastqStreamer / Sampler and trim* pages.

>
> Possibly this has already changed, as I am still using R 3.1.0.  I thought
> I'd check.
>
> Oh, and, in my hands/hardware, the benefit of FastqStreamer's use of
> srapply's parallelization is negligible, at least if the consumer of
> successive yields is in the main process.  I see that the new bpiterate
> appears to take advantage of yielding in forked processes, which sounds
> promising.  Is that the idea?

Yes, individual instances of FastqStreamer (and FastqSampler) don't benefit from
R-level parallel evaluation; both are 'readers' that iterate sequentially
through the entire file. If you were streaming or sampling from several files
(as when creating a qa report, where FastqSampler is used 'under the hood'),
srapply (or, nowadays, just BiocParallel::bplapply) would distribute the
streaming / sampling of each file to a separate process. This would be an
effective way of managing memory while performing parallel evaluation.
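A minimal sketch of the per-file pattern, assuming ShortRead and BiocParallel are installed: bplapply hands each file to a separate worker, and inside that worker FastqStreamer keeps at most n records of the file in memory at once. The helper names count_reads() and summarize_files() are hypothetical, and counting reads and bases is only a placeholder for real per-chunk work.

```r
library(ShortRead)     # FastqStreamer(), yield(), sread()
library(BiocParallel)  # bplapply(), bpparam()

## Hypothetical helper: count reads and bases in one FASTQ file while
## holding at most 'n' records in memory at a time.
count_reads <- function(fl, n = 1e6) {
    strm <- FastqStreamer(fl, n = n)
    on.exit(close(strm))
    nreads <- 0
    nbases <- 0
    repeat {
        fq <- yield(strm)          # next chunk; zero-length at end of file
        if (length(fq) == 0L)
            break
        nreads <- nreads + length(fq)
        nbases <- nbases + sum(as.numeric(width(sread(fq))))
    }
    c(reads = nreads, bases = nbases)
}

## Hypothetical helper: parallelize over files; each worker streams
## its own file sequentially.
summarize_files <- function(files, n = 1e6, BPPARAM = bpparam()) {
    bplapply(files, count_reads, n = n, BPPARAM = BPPARAM)
}
```

For example, summarize_files(files, BPPARAM = MulticoreParam(workers = 4)) would process up to four files at a time, one per forked process (MulticoreParam is not available on Windows). Memory use per worker is bounded by n rather than by file size.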

bpiterate could be used effectively with FastqStreamer if the operation
performed on each chunk of the file were expensive; when processing several
files, it is probably more scalable to parallelize over files, using
FastqStreamer to manage memory.
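A sketch of the bpiterate pattern for a single file, under the same package assumptions: ITER is a closure that yields the next chunk from a FastqStreamer and returns NULL once the file is exhausted (the end-of-iteration signal bpiterate expects), and FUN is applied to each chunk by a worker. gc_in_chunk() and gc_fraction() are hypothetical names, and counting G/C bases only stands in for the "somehow expensive" operation; it is in fact too cheap to repay the cost of shipping chunks to workers. The sketch assumes serial or forked (MulticoreParam) workers, which inherit the packages attached in the manager session.

```r
library(ShortRead)     # FastqStreamer(), yield(), sread()
library(BiocParallel)  # bpiterate(), bpparam()

## Hypothetical stand-in for an expensive per-chunk computation:
## count G/C bases and total bases in one chunk of reads.
gc_in_chunk <- function(fq) {
    c(gc = sum(letterFrequency(sread(fq), "GC")),
      bases = sum(as.numeric(width(sread(fq)))))
}

## Hypothetical helper: GC fraction of one file, with chunks processed
## by workers as the manager reads them.
gc_fraction <- function(fl, n = 1e6, BPPARAM = bpparam()) {
    strm <- FastqStreamer(fl, n = n)
    on.exit(close(strm))
    ## Called repeatedly by bpiterate() in the manager process;
    ## returning NULL ends the iteration.
    ITER <- function() {
        fq <- yield(strm)
        if (length(fq) == 0L) NULL else fq
    }
    per_chunk <- bpiterate(ITER, gc_in_chunk, BPPARAM = BPPARAM)
    totals <- Reduce(`+`, per_chunk)   # combine small per-chunk results
    unname(totals["gc"] / totals["bases"])
}
```

Only the small per-chunk summaries are returned to the manager, so the chunks themselves never accumulate in memory there.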

Martin

>
> Looking forward....
>
> Malcolm Cook
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
