[BioC] Amplicon and exon level read counts and GC content
Yu Chuan Tai
yuchuan at stat.berkeley.edu
Wed May 30 08:12:00 CEST 2012
Hi Martin,
Thanks for your quick reply and great help! I will try what you suggested
below and let you know if I still have any question. A quick question. Can
RSamTools convert BAM to SAM?
Best,
Yu Chuan
On Tue, 29 May 2012, Martin Morgan wrote:
> On 05/29/2012 10:44 PM, Martin Morgan wrote:
>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>>> Hi,
>>>
>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>>> sequencing, is there any package/function which can extract
>>> amplicon-level read counts and GC content from an aligned file of BAM
>>> format? The same question to exon-level read counts and GC content. More
>>> generally, given a genomic interval, how could I extract the read count
>>> and GC content for that interval?
>>
>> The Rsamtools package has scanBam for read input, also
>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
>> for higher-level read counting. The requirement is generally a
>> 'GRanges', which can be queried from TxDb packages (e.g.,
>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
>> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
>> rtracklayer, and Rsamtools packages; see
>>
>> http://bioconductor.org/packages/release/BiocViews.html#___Software
>
> More specifically, after
>
> library(Rsamtools)
> example(scanBam) # defines 'fl', a path to a bam file
>
> for a _single_ genomic range
>
> param = ScanBamParam(what="seq",
> which=GRanges("seq1", IRanges(100, 500)))
> dna = scanBam(fl, param=param)[[1]][["seq"]]
> length(dna) # 365 reads overlap region
> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>
> though you'd likely want to specify several regions (vector arguments to
> GRanges) and think about flags (scanBamFlag() and the flag argument to
> ScanBamParam), read mapping quality, reads overlapping more than one region,
> etc. (summarizeOverlaps implements several counting strategies, but it is
> 'easy' to implement arbitrary approaches).
>
>>
>> Martin
>>
>>>
>>> Thanks for any input!
>>>
>>> Best,
>>> Yu Chuan
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
More information about the Bioconductor
mailing list