[BioC] Amplicon and exon level read counts and GC content

Wed May 30 21:24:59 CEST 2012

I see. Thanks again for your clarifications!

Best,
Yu Chuan

On Tue, 29 May 2012, Martin Morgan wrote:

> On 05/29/2012 11:12 PM, Yu Chuan Tai wrote:
>> Hi Martin,
>> 
>> Thanks for your quick reply and great help! I will try what you
>> suggested below and let you know if I still have any question. A quick
>> question. Can RSamTools convert BAM to SAM?
>
> No; it will go the other way around (SAM --> BAM, see ?asBam) which is 
> generally the right thing to do (smaller files, more easily 'queried' for 
> regions of interest, etc, especially after indexing ?indexBam). Martin
>
>> 
>> Best,
>> Yu Chuan
>> 
>> On Tue, 29 May 2012, Martin Morgan wrote:
>> 
>>> On 05/29/2012 10:44 PM, Martin Morgan wrote:
>>>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>>>>> Hi,
>>>>> 
>>>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>>>>> sequencing, is there any package/function which can extract
>>>>> amplicon-level read counts and GC content from an aligned file of BAM
>>>>> format? The same question to exon-level read counts and GC content.
>>>>> More
>>>>> generally, given a genomic interval, how could I extract the read count
>>>>> and GC content for that interval?
>>>> 
>>>> The Rsamtools package has scanBam for read input, also
>>>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
>>>> for higher-level read counting. The requirement is generally a
>>>> 'GRanges', which can be queried from TxDb packages (e.g.,
>>>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
>>>> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
>>>> rtracklayer, and Rsamtools packages; see
>>>> 
>>>> http://bioconductor.org/packages/release/BiocViews.html#___Software
>>> 
>>> More specifically, after
>>> 
>>> library(Rsamtools)
>>> example(scanBam) # defines 'fl', a path to a bam file
>>> 
>>> for a _single_ genomic range
>>> 
>>> param = ScanBamParam(what="seq",
>>> which=GRanges("seq1", IRanges(100, 500)))
>>> dna = scanBam(fl, param=param)[[1]][["seq"]]
>>> length(dna) # 365 reads overlap region
>>> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>>> 
>>> though you'd likely want to specify several regions (vector arguments
>>> to GRanges) and think about flags (scanBamFlag() and the flag argument
>>> to ScanBamParam), read mapping quality, reads overlapping more than
>>> one region, etc. (summarizeOverlaps implements several counting
>>> strategies, but it is 'easy' to implement arbitrary approaches).
>>> 
>>>> 
>>>> Martin
>>>> 
>>>>> 
>>>>> Thanks for any input!
>>>>> 
>>>>> Best,
>>>>> Yu Chuan
>>>>> 
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>> 
>>> Location: M1-B861
>>> Telephone: 206 667-2793
>>> 
>
>
> -- 
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>