[BioC] Amplicon and exon level read counts and GC content
Yu Chuan Tai
yuchuan at stat.berkeley.edu
Wed May 30 21:24:59 CEST 2012
I see. Thanks again for your clarifications!
Best,
Yu Chuan
On Tue, 29 May 2012, Martin Morgan wrote:
> On 05/29/2012 11:12 PM, Yu Chuan Tai wrote:
>> Hi Martin,
>>
>> Thanks for your quick reply and great help! I will try what you
>> suggested below and let you know if I still have any question. A quick
>> question. Can RSamTools convert BAM to SAM?
>
> No; it will go the other way around (SAM --> BAM, see ?asBam) which is
> generally the right thing to do (smaller files, more easily 'queried' for
> regions of interest, etc, especially after indexing ?indexBam). Martin
>
>>
>> Best,
>> Yu Chuan
>>
>> On Tue, 29 May 2012, Martin Morgan wrote:
>>
>>> On 05/29/2012 10:44 PM, Martin Morgan wrote:
>>>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>>>>> Hi,
>>>>>
>>>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>>>>> sequencing, is there any package/function which can extract
>>>>> amplicon-level read counts and GC content from an aligned file of BAM
>>>>> format? The same question to exon-level read counts and GC content.
>>>>> More
>>>>> generally, given a genomic interval, how could I extract the read count
>>>>> and GC content for that interval?
>>>>
>>>> The Rsamtools package has scanBam for read input, also
>>>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
>>>> for higher-level read counting. The requirement is generally a
>>>> 'GRanges', which can be queried from TxDb packages (e.g.,
>>>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
>>>> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
>>>> rtracklayer, and Rsamtools packages; see
>>>>
>>>> http://bioconductor.org/packages/release/BiocViews.html#___Software
>>>
>>> More specifically, after
>>>
>>> library(Rsamtools)
>>> example(scanBam) # defines 'fl', a path to a bam file
>>>
>>> for a _single_ genomic range
>>>
>>> param = ScanBamParam(what="seq",
>>> which=GRanges("seq1", IRanges(100, 500)))
>>> dna = scanBam(fl, param=param)[[1]][["seq"]]
>>> length(dna) # 365 reads overlap region
>>> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>>>
>>> though you'd likely want to specify several regions (vector arguments
>>> to GRanges) and think about flags (scanBamFlag() and the flag argument
>>> to ScanBamParam), read mapping quality, reads overlapping more than
>>> one region, etc. (summarizeOverlaps implements several counting
>>> strategies, but it is 'easy' to implement arbitrary approaches).
>>>
>>>>
>>>> Martin
>>>>
>>>>>
>>>>> Thanks for any input!
>>>>>
>>>>> Best,
>>>>> Yu Chuan
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>
>>>
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>
>>> Location: M1-B861
>>> Telephone: 206 667-2793
>>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
More information about the Bioconductor
mailing list