[BioC] Fwd: summarizeOverlaps behavior on RNA-Seq paired end stranded data

Valerie Obenchain vobencha at fhcrc.org
Thu Oct 3 17:27:23 CEST 2013


On 10/02/2013 04:31 PM, Abhishek Pratap wrote:
> Hi
>
>
> On Wed, Oct 2, 2013 at 10:28 AM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     Hi,
>
>
>     On 10/02/2013 12:35 AM, Abhishek Pratap wrote:
>
>         Hi All
>
>         Just wanted to check on the expected behavior of the
>         summarizeOverlaps
>         function in the GenomicRanges package for the following cases.
>
>
>         1. In the case of paired end data does it count each read pair
>         as 1 or 2
>         against a feature.
>
>
>     Pairs are counted as a single 'hit' reguardless if one or both mates
>     overlap the feature.
>
>
>
>         2. For stranded protocols of RNA-Seq does it take it account the
>         opposite
>         strand of the mate of a read when counting or does it take only
>         one read
>         matching the strand into consideration.
>
>
>     When counting, pairs are treated as though they are from the same
>     strand. In general, you can ignore the strand when counting by
>     setting the 'ignore.strand=TRUE'.
>
>
>
>         3. For second strand RNA-Seq protocol where the read-1 matches
>         to the
>         opposite strand of gene will summarizeOverlap work ?
>
>
>     I'm not sure what you mean. Please provide an example.
>
>
>     summarizeOverlaps() reads the data from a BAM into a GAlignmentPairs
>     or GAlignmentsList container for counting. You can investigate the
>     behavior using a small test case.
>
>     ## paired-end record
>     ga1 <- GAlignments("chr1", 1L, "10M", strand("+"))
>     ga2 <- GAlignments("chr1", 15L, "11M", strand("-"))
>     galp <- GAlignmentPairs(ga1, ga2, TRUE)
>
>     ## annotation
>     ann <- GRanges("chr1", IRanges(c(1, 5, 12, 20), c(25, 20, 14, 30)), "-")
>
>     ## all with mode="Union"
>     se1 <- summarizeOverlaps(ann[1], galp, ignore.strand=TRUE)
>      > assays(se1)$counts
>           reads
>     [1,]     1
>     se2 <- summarizeOverlaps(ann[1], galp, ignore.strand=FALSE)
>      > assays(se2)$counts
>           reads
>     [1,]     0
>     se3 <- summarizeOverlaps(ann[4], galp, ignore.strand=TRUE)
>      > assays(se3)$counts
>           reads
>     [1,]     1
>
>
>     Valerie
>
>
> I think thats a good idea to test it on a artificial set. Just a quick
> question. I cant seem to use the constructor GAlingments(). Is that
> something to do with the version ?
>
>
> library("GenomicRanges")
> library("Rsamtools")
> GAlignments("chr1",1L,"10M",strand("+"))
>  >Error: could not find function "GAlignments"

The class is called GappedAlignments in release and GAlignments in devel 
(name changed this cycle). For an example of constructing one, see the 
summarizeOverlaps man page.

?summarizeOverlaps


Valerie


>
>
> Partial output from sessionInfo()
> other attached packages:
> [1] Rsamtools_1.12.4     Biostrings_2.28.0    GenomicRanges_1.12.5
> IRanges_1.18.4       BiocGenerics_0.6.0
>
>
> Thanks!
> -Abhi
>
>
>
>         Thanks!
>         -Abhi
>
>                  [[alternative HTML version deleted]]
>
>         _________________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>         Search the archives:
>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>



More information about the Bioconductor mailing list