[Bioc-devel] [BioC] readGappedAlignments and param argument
Martin Morgan
mtmorgan at fhcrc.org
Tue Sep 10 12:25:33 CEST 2013
On 09/10/2013 12:22 AM, Elena Grassi wrote:
> Ok, thanks - I misread the manual, I just thought that was a way to
> filter out reads falling over certain regions.
> After reading the alignments I use summarizeOverlaps to obtain counts
> with the desired options. I decided to use scanBamParam(which=..)
You can use summarizeOverlaps with a 'BamFileList' created by something like
myFiles = dir("/some/dir", pattern="bam$")
bfl = BamFileList(myFiles, yieldSize=1000000)
olaps = summarizeOverlaps(features, bfl)
see the example on the help page
method?"summarizeOverlaps,GRanges,BamFileList"
(there's helpful tab completion after 'method?"sum<tab>'). If you've also said
library(parallel)
options(mc.cores=detectCores()) ## how many cores to use?
then the bam files are processed in parallel (choose yieldSize so that the
different threads are not competing for memory).
> because loading bams and counting overlaps is using a lot of RAM for
> big bams and I found online various suggestions to use the
more generally a strategy is to use yieldSize and iterate through the file, like
bf = BamFile(fl, yieldsSize=1000000)
open(bf)
repeat {
curr = readBamGappedAlignments(bf) ## read up to yieldSize records
if (length(curr) == 0)
break ## done
## process chunk 'curr'
}
close(bf)
this is actually what summarizeOverlaps does when invoked on a BamFile (or
BamFileList().
> ScanBamParam to load one chr at a time...but as long as I would like
> to be general I decided to extract all the seqlevels in the GRanges
> object that I'm using and then I would like to iterate over those
> extracting the reads that fall on different seqlevels.
> I noticed that the counts where different with this approach with
> respect to doing all the work together. I guess that I have to find
> another way to split my alignments...but I'd really like to avoid
> something like:
> http://comments.gmane.org/gmane.comp.lang.r.sequencing/755
one small thing that came out of that thread was that
as(seqinfo(BamFile("/some/file")), "GRanges")
gives a GRanges of all the sequence lengths.
> Reducing my features would not work as long as I need to keep the
> original number of the reads...I guess I will have to "flatten" all
> the regions falling on a single chr of my Granges in a way that avoid
> overlapping reads, that should do the trick (I hope...)...maybe I
> could extract the smaller-bigger coords for a given chr and obtain
> chr-based Granges from there.
>
> Thank you very much for your help,
> E.
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioc-devel
mailing list