[BioC] countOverlaps() only count positive strand (from GenomicRanges example)
Martin Morgan
mtmorgan at fhcrc.org
Mon Dec 13 20:19:38 CET 2010
On 12/13/2010 10:09 AM, Song Li wrote:
> Hi Martin and Vincent,
>
> Thank you for your replies, here is my code:
>
>> strand(exonRanges)<-"*"
> Error in `strand<-`(`*tmp*`, value = "*") :
> replacement 'value' is not an AtomicList with the same elementLengths as 'x'
create a list of character vectors, each of the appropriate length
value <- lapply(elementLengths(exonRanges), rep, x="*")
convert this list to a class derived from 'AtomicList', and do the
replacement
strand(exonRanges) <- CharacterList(value)
Sometimes it's easier / faster to work with exons(txdb, column="tx_id")
or similar.
Martin
>
>> levels(strand(aligns))<-c('*','*','*')
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "strand<-", for
> signature "GappedAlignments"
>
> It does not seem to be that straightforward.
>
> I have been searching for ways to make modification to the
> GappedAlignments and GRangesList objects.
>
> Song
>
> On Mon, Dec 13, 2010 at 12:39 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 12/13/2010 09:30 AM, Vincent Carey wrote:
>>> On Mon, Dec 13, 2010 at 12:06 PM, Song Li <songli116 at gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> I was following the example in the "GenomicRanges Use Cases", section
>>>> 3, "Simple RNA-seq Analysis", by "copy-paste" command from the
>>>> document from page 7 to page 8:
>>>>
>>>> What I found out is that countOverlap() only count reads aligned to
>>>> positive strand. There are 28 reads map to region GRList[6645], 13 on
>>>> positive strand, 15 on negative strand. The "count[6645]" is 13.
>>>>
>>>> Is there a way to count both strand?
>>>
>>> If you want to disregard strand of exon range, set it to "*". This
>>> does not appear to be a one-liner.
>>
>> it's tricky to set strand() on a GappedAlignment (because the CIGAR has
>> to be adjusted appropriately) but the I believe that the converse
>>
>> strand(exonRanges) <- "*"
>>
>> accomplishes the same goal -- reads are counted without regard to strand
>> of alignment.
>>
>> Martin
>>
>>>
>>>>
>>>> Thanks,
>>>> Song Li
>>>>
>>>> Here are all the code:
>>>> #-----> Start with code from the package description<-------#
>>>>> library(Rsamtools)
>>>>> testFile <- system.file("bam", "isowt5_13e.bam", package = "leeBamViews")
>>>>> aligns <- readBamGappedAlignments(testFile)
>>>>> library(GenomicFeatures)
>>>>> txdb <- makeTranscriptDbFromUCSC(genome = "sacCer2", tablename = "sgdGene")
>>>>> exonRanges <- exonsBy(txdb, "tx")
>>>>> length(exonRanges)
>>>>> levels(rname(aligns)) <- c(paste("chr", as.roman(1:16),
>>>> + sep = ""), "chrM")
>>>>> counts <- countOverlaps(exonRanges, aligns)
>>>>
>>>> #-----> find alignments that map to GRList$6645<-------#
>>>>> aligns[705:732]
>>>> GappedAlignments of length 28
>>>> rname strand cigar qwidth start end width ngap
>>>> [1] chrXIII + 36M 36 804443 804478 36 0
>>>> [2] chrXIII + 36M 36 804466 804501 36 0
>>>> [3] chrXIII - 36M 36 804473 804508 36 0
>>>> [4] chrXIII + 36M 36 804476 804511 36 0
>>>> [5] chrXIII - 36M 36 804493 804528 36 0
>>>> [6] chrXIII + 36M 36 804516 804551 36 0
>>>> [7] chrXIII + 36M 36 804562 804597 36 0
>>>> [8] chrXIII + 36M 36 804562 804597 36 0
>>>> [9] chrXIII - 36M 36 804574 804609 36 0
>>>> ... ... ... ... ... ... ... ... ...
>>>> [20] chrXIII + 36M 36 804764 804799 36 0
>>>> [21] chrXIII - 36M 36 804798 804833 36 0
>>>> [22] chrXIII + 36M 36 804799 804834 36 0
>>>> [23] chrXIII + 36M 36 804799 804834 36 0
>>>> [24] chrXIII - 36M 36 804816 804851 36 0
>>>> [25] chrXIII - 36M 36 804947 804982 36 0
>>>> [26] chrXIII - 36M 36 804953 804988 36 0
>>>> [27] chrXIII - 36M 36 804955 804990 36 0
>>>> [28] chrXIII - 36M 36 804974 805009 36 0
>>>>> exonRanges[6646]
>>>> GRangesList of length 1
>>>> $6646
>>>> GRanges with 1 range and 3 elementMetadata values
>>>> seqnames ranges strand | exon_id exon_name exon_rank
>>>> <Rle> <IRanges> <Rle> | <integer> <character> <integer>
>>>> [1] chrXIII [804455, 805090] + | 7011 NA 1
>>>>
>>>>
>>>> seqlengths
>>>> chrIV chrXV chrVII chrXII chrXVI ... chrVI chrI chrM 2micron
>>>> 1531919 1091289 1090947 1078175 948062 ... 270148 230208 85779 6318
>>>>> counts[6646]
>>>> [1] 13
>>>>> table(strand(aligns[705:732]))
>>>>
>>>> + -
>>>> 13 15
>>>>
>>>>
>>>>> sessionInfo()
>>>> R version 2.12.0 (2010-10-15)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] Rsamtools_1.2.1 Biostrings_2.18.0 GenomicFeatures_1.2.3
>>>> [4] GenomicRanges_1.2.1 IRanges_1.8.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.10.0 biomaRt_2.6.0 BSgenome_1.18.1 DBI_0.2-5
>>>> [5] RCurl_1.4-3 RSQLite_0.9-3 rtracklayer_1.10.5 tools_2.12.0
>>>> [9] XML_3.2-0
>>>>
>>>> Song Li
>>>> --
>>>> Postdoctoral Associate
>>>> Institute for Genome Sciences and Policy
>>>> Duke University
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
>>
>
>
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
More information about the Bioconductor
mailing list