[BioC] summarizeOverlaps, inter.feature=FALSE and mode="IntersectionNotEmpty"

Valerie Obenchain vobencha at fhcrc.org
Wed May 15 17:47:46 CEST 2013


I agree Mike. Following this same logic, row 7 for IntersectionNotEmpty 
should be 0,0 instead of 1,1.

Thomas, Martin, would you agree?

Valerie


On 05/15/2013 04:47 AM, Michael Love wrote:
> hi,
>
> The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps look
> very useful.
>
> I'm wondering about the behavior for inter.feature=FALSE and
> mode="IntersectionNotEmpty".
>
> Earlier on the list, from Martin Morgan, there was the proposed behavior:
>
> You'd like to add 3 more modes, with inter_feature=FALSE. Reads are sometimes
> 'counted twice'
>
> |*----------------------+-----+-----------+-----------+---------------|*|*
> Mode                 | Row | Feature 1 | Feature 2 | Hits per read
> |*|*----------------------+-----+-----------+-----------+---------------|*|*
> Union                |   5 |         1 |         0 |             1
> |*|*                      |   6 |         1 |         1 |
> 2 |*|*                      |   7 |         1 |         1 |
>   2 |*|* IntersectionStrict   |   5 |         1 |         0 |
>    1 |*|*                      |   6 |         1 |         0 |
>     1 |*|*                      |   7 |         1 |         1 |
>      2 |*|* IntersectionNotEmpty |   5 |         1 |         0 |
>       1 |*|*                      |   6 |         1 |         1 |
>        2 |*|*                      |   7 |         1 |         1 |
>         2 |*|*----------------------+-----+-----------+-----------+---------------|*
>
> ( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html )
>
> For row 6 of this diagram (
> http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps-modes.pdf)
> and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit.
>
> The man page has:
>
> IntersectionNotEmpty : For this counting mode, the features are partitioned
> into unique disjoint regions. This is accomplished by disjoining the
> feature ranges then removing ranges shared by more than one feature. The
> result is a group of non-overlapping regions each of which belong to a
> single feature.
> Simple and gapped reads are counted if,
> 1. the read or exactly 1 of the read fragments overlaps a unique disjoint
> region
> 2. the read or >1 read fragments overlap >1 unique disjoint region from
> the same feature
>
> An example, where the read falls completely within Feature 1 but partially
> in Feature 2, the behavior is consistent with the table above:
>
>> reads <-
> GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+"))
>> features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L)))
>> assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty",
> inter.feature=FALSE))
>       [,1]
> [1,]    1
> [2,]    1
>
>> disjoin(features)
> GRanges with 3 ranges and 0 metadata columns:
>        seqnames       ranges strand
>           <Rle>    <IRanges>  <Rle>
>    [1]     chr1 [1300, 1699]      *
>    [2]     chr1 [1700, 2000]      *
>    [3]     chr1 [2001, 2400]      *
>    ---
>    seqlengths:
>     chr1
>       NA
>
>> findOverlaps(reads, disjoin(features))
> Hits of length 2
> queryLength: 1
> subjectLength: 3
>    queryHits subjectHits
>     <integer>   <integer>
>   1         1           1
>   2         1           2
>
> So the example read overlaps the unique region of Feature 1 and the shared
> region, which supposedly has been removed. So I would expect it to only
> count to Feature 1.
>
>> sessionInfo()
> R Under development (unstable) (2013-05-14 r62742)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GenomicRanges_1.13.11 XVector_0.0.0         IRanges_1.19.7
> [4] BiocGenerics_0.7.2    BiocInstaller_1.11.1  Defaults_1.1-1
>
> loaded via a namespace (and not attached):
> [1] stats4_3.1.0
>
>
> thanks,
>
> Mike
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list