[BioC] summarizeOverlaps mode ignoring inter feature overlaps
Martin Morgan
mtmorgan at fhcrc.org
Sat Apr 13 01:40:15 CEST 2013
On 04/12/2013 03:57 PM, Wei Shi wrote:
>
> On Apr 13, 2013, at 12:08 AM, Martin Morgan wrote:
>
>> On 04/12/2013 06:53 AM, Michael Lawrence wrote:
>>> Hi Wei,
>>>
>>> summarizeSpliceOverlaps does not yet exist. We held off on that until we
>>> determined whether findSpliceOverlaps was useful. And yes,
>>> findSpliceOverlaps counts reads on a per-feature basis, so it will
>>> double count reads in that way. That's totally intentional (and I'm
>>> pretty sure what Thomas was wanting).
>>
>> also the object returned can be easily manipulated?
>>
>>> olaps = findSpliceOverlaps(galp, genes) olaps
>> Hits of length 2 queryLength: 1 subjectLength: 2 queryHits subjectHits
>> compatible unique coding strandSpecific <integer> <integer>
>> <logical> <logical> <logical> <logical> 1 1 1
>> FALSE FALSE NA TRUE 2 1 2 FALSE
>> FALSE NA TRUE
>>> olaps[mcols(olaps)$unique]
>> Hits of length 0 queryLength: 1 subjectLength: 2
>>
>> and canonically
>>
>> ulaps = olaps[mcols(olaps)$unique] tabulate(subjectHits(ulaps),
>> subjectLength(ulaps))
>>
>> see ?Hits
>>
>> Martin
>
> But I got no fragments assigned to transcripts. Is this what you meant to
> do?
>
>> ulaps = olaps[mcols(olaps)$unique] tabulate(subjectHits(ulaps),
>> subjectLength(ulaps))
> [1] 0 0
I meant only to illustrate that the object is easily manipulated to implement
different counting schemes. You said
> The findSpliceOverlaps function seems to count reads more than once as well.
> I ran the sample code in the help page for this function and found that the
> single fragment was assigned to both transcripts:
and indeed the function returns all 'hits' in the sense of non-zero overlap
between read and gene. You're interested in counting 'no more than once', and
the transcript does not align uniquely (or in a way that is compatible with each
gene model), so yes, it seems reasonable to count 0 hits for each feature.
On the other hand
galp <- GAlignmentPairs(
GAlignments("chr1", 5L, "6M", strand("+")),
GAlignments("chr1", 20L, "6M", strand("-")),
isProperPair=TRUE)
and
> (olaps <- findSpliceOverlaps(galp, genes))
Hits of length 2
queryLength: 1
subjectLength: 2
queryHits subjectHits compatible unique coding strandSpecific
<integer> <integer> <logical> <logical> <logical> <logical>
1 1 1 TRUE TRUE NA TRUE
2 1 2 FALSE FALSE NA TRUE
> (ulaps <- olaps[mcols(olaps)$unique])
Hits of length 1
queryLength: 1
subjectLength: 2
queryHits subjectHits compatible unique coding strandSpecific
<integer> <integer> <logical> <logical> <logical> <logical>
1 1 1 TRUE TRUE NA TRUE
> tabulate(subjectHits(ulaps), subjectLength(ulaps))
[1] 1 0
Martin
>
> Cheers Wei
>
>
> ______________________________________________________________________ The
> information in this email is confidential and intended solely for the
> addressee. You must not disclose, forward, print or use it without the
> permission of the sender.
> ______________________________________________________________________
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list