[BioC] summarizeOverlaps, inter.feature=FALSE and mode="IntersectionNotEmpty"
Valerie Obenchain
vobencha at fhcrc.org
Wed May 15 17:47:46 CEST 2013
I agree Mike. Following this same logic, row 7 for IntersectionNotEmpty
should be 0,0 instead of 1,1.
Thomas, Martin, would you agree?
Valerie
On 05/15/2013 04:47 AM, Michael Love wrote:
> hi,
>
> The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps look
> very useful.
>
> I'm wondering about the behavior for inter.feature=FALSE and
> mode="IntersectionNotEmpty".
>
> Earlier on the list, from Martin Morgan, there was the proposed behavior:
>
> You'd like to add 3 more modes, with inter_feature=FALSE. Reads are sometimes
> 'counted twice'
>
> |*----------------------+-----+-----------+-----------+---------------|*|*
> Mode | Row | Feature 1 | Feature 2 | Hits per read
> |*|*----------------------+-----+-----------+-----------+---------------|*|*
> Union | 5 | 1 | 0 | 1
> |*|* | 6 | 1 | 1 |
> 2 |*|* | 7 | 1 | 1 |
> 2 |*|* IntersectionStrict | 5 | 1 | 0 |
> 1 |*|* | 6 | 1 | 0 |
> 1 |*|* | 7 | 1 | 1 |
> 2 |*|* IntersectionNotEmpty | 5 | 1 | 0 |
> 1 |*|* | 6 | 1 | 1 |
> 2 |*|* | 7 | 1 | 1 |
> 2 |*|*----------------------+-----+-----------+-----------+---------------|*
>
> ( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html )
>
> For row 6 of this diagram (
> http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps-modes.pdf)
> and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit.
>
> The man page has:
>
> IntersectionNotEmpty : For this counting mode, the features are partitioned
> into unique disjoint regions. This is accomplished by disjoining the
> feature ranges then removing ranges shared by more than one feature. The
> result is a group of non-overlapping regions each of which belong to a
> single feature.
> Simple and gapped reads are counted if,
> 1. the read or exactly 1 of the read fragments overlaps a unique disjoint
> region
> 2. the read or >1 read fragments overlap >1 unique disjoint region from
> the same feature
>
> An example, where the read falls completely within Feature 1 but partially
> in Feature 2, the behavior is consistent with the table above:
>
>> reads <-
> GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+"))
>> features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L)))
>> assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty",
> inter.feature=FALSE))
> [,1]
> [1,] 1
> [2,] 1
>
>> disjoin(features)
> GRanges with 3 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] chr1 [1300, 1699] *
> [2] chr1 [1700, 2000] *
> [3] chr1 [2001, 2400] *
> ---
> seqlengths:
> chr1
> NA
>
>> findOverlaps(reads, disjoin(features))
> Hits of length 2
> queryLength: 1
> subjectLength: 3
> queryHits subjectHits
> <integer> <integer>
> 1 1 1
> 2 1 2
>
> So the example read overlaps the unique region of Feature 1 and the shared
> region, which supposedly has been removed. So I would expect it to only
> count to Feature 1.
>
>> sessionInfo()
> R Under development (unstable) (2013-05-14 r62742)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] GenomicRanges_1.13.11 XVector_0.0.0 IRanges_1.19.7
> [4] BiocGenerics_0.7.2 BiocInstaller_1.11.1 Defaults_1.1-1
>
> loaded via a namespace (and not attached):
> [1] stats4_3.1.0
>
>
> thanks,
>
> Mike
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list