[BioC] summarizeOverlaps mode ignoring inter feature overlaps
Valerie Obenchain
vobencha at fhcrc.org
Tue May 14 23:21:44 CEST 2013
Hi Thomas,
Two new arguments have been added to summarizeOverlaps(), 'inter.feature'
and 'fragments'. Both are available in GenomicRanges 1.13.11 and
Rsamtools 1.13.13.
The ?summarizeOverlaps page in GenomicRanges now has all examples (vs
having half in GenomicRanges, half in Rsamtools).
'inter.feature':
When TRUE (default), counting works as it always has: reads that hit
multiple features are either resolved by the chosen mode or dropped.
When FALSE, each feature that a read hits gets a count. This essentially
boils down to countOverlaps() with type="any" (Union and
IntersectionNotEmpty) or type="within" (IntersectionStrict).
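A minimal sketch of the difference, using toy in-memory data. The
feature names, coordinates and reads are made up for illustration. It
assumes a recent Bioconductor release, where summarizeOverlaps() lives
in the GenomicAlignments package; with the versions mentioned above you
would load GenomicRanges instead.

```r
# Assumption: recent Bioconductor; in the 1.13 series, summarizeOverlaps()
# was provided by GenomicRanges rather than GenomicAlignments.
library(GenomicAlignments)

# Two hypothetical features that overlap each other between 50 and 100.
features <- GRanges("chr1",
                    IRanges(start = c(1, 50), end = c(100, 150)),
                    strand = "+")
names(features) <- c("A", "B")

# Two toy reads: the first falls in the region shared by A and B,
# the second falls in A only.
reads <- GAlignments(seqnames = Rle(c("chr1", "chr1")),
                     pos = c(60L, 5L),
                     cigar = c("20M", "20M"),
                     strand = strand(c("+", "+")))

# Default behaviour: a read hitting several features is resolved by the
# mode or dropped.
se_resolved <- summarizeOverlaps(features, reads, mode = "Union",
                                 inter.feature = TRUE)

# Every feature a read hits gets a count.
se_all <- summarizeOverlaps(features, reads, mode = "Union",
                            inter.feature = FALSE)

# The plain countOverlaps() call that inter.feature = FALSE is described
# as boiling down to for the Union mode.
co_any <- countOverlaps(features, reads, type = "any")

assay(se_resolved)
assay(se_all)
co_any
```

Comparing assay(se_resolved) with assay(se_all) shows which reads were
dropped because they hit more than one feature.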
'fragments':
This argument is relevant to counting paired-end Bam files. It was added
because of the flexibility the GAlignmentsList class offers. The
familiar GAlignmentPairs class holds reads that have been "properly
mated" with the algorithm in ?findMateAlignment. GAlignmentsList can
hold these "properly mated" reads as well as singletons, reads with
unmapped pairs, and any others in the Bam.
When TRUE (default), both "properly mated" reads and all others are
counted. You can of course still add your own filtering / QC with
param = ScanBamParam(). When FALSE, only reads that have been "properly
mated" will be counted.
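A rough sketch of how the two settings might be called on a paired-end
Bam file. The file name "sample.bam" and the toy feature are
placeholders, and the filter values are only an example of the kind of
QC you could pass through param. The 'singleEnd' argument is taken from
the current GenomicAlignments man page and is an assumption here; it is
not part of what is described above and may not exist in the 1.13
series.

```r
# Assumption: recent Bioconductor, with summarizeOverlaps() in
# GenomicAlignments and a 'singleEnd' argument for paired-end input.
library(GenomicAlignments)
library(Rsamtools)

# Hypothetical feature; replace with your own annotation.
features <- GRanges("chr1", IRanges(start = 1, end = 1000), strand = "*")
names(features) <- "geneA"

# Example QC filter: drop duplicates and low mapping-quality records.
param <- ScanBamParam(flag = scanBamFlag(isDuplicate = FALSE),
                      mapqFilter = 10)

bam_path <- "sample.bam"  # placeholder path, not a real file

if (file.exists(bam_path)) {
  bf <- BamFile(bam_path)

  # Count "properly mated" reads plus singletons and other records.
  se_frag <- summarizeOverlaps(features, bf, mode = "Union",
                               singleEnd = FALSE, fragments = TRUE,
                               param = param)

  # Count only reads that have been "properly mated".
  se_pairs <- summarizeOverlaps(features, bf, mode = "Union",
                                singleEnd = FALSE, fragments = FALSE,
                                param = param)
}
```

Check the argument defaults in ?summarizeOverlaps for your installed
version before relying on them.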
Let me know how it goes.
Valerie
On 04/08/13 17:52, Thomas Girke wrote:
> Dear Valerie,
>
> Is there currently any way to run summarizeOverlaps in a feature-overlap
> unaware mode, e.g. with an ignorefeatureOL=FALSE/TRUE setting? Currently,
> one can switch back to countOverlaps when feature overlap unawareness is
> the more appropriate counting mode for a biological question, but then
> double counting of reads mapping to multiple-range features is not
> accounted for. It would be really nice to have such a feature-overlap
> unaware option directly in summarizeOverlaps.
>
> Another question relates to the memory usage of summarizeOverlaps. Has
> this been optimized yet? On a typical bam file with ~50-100 million
> reads the memory usage of summarizeOverlaps is often around 10-20GB. To
> use the function on a desktop computer or in large-scale RNA-Seq
> projects on a commodity compute cluster, it would be desirable if every
> counting instance would consume not more than 5GB of RAM.
>
> Thanks in advance for your help and suggestions,
>
> Thomas