[BioC] Amplicon and exon level read counts and GC content

Yu Chuan Tai yuchuan at stat.berkeley.edu
Wed May 30 08:12:00 CEST 2012


Hi Martin,

Thanks for your quick reply and great help! I will try what you suggested 
below and let you know if I still have any question. A quick question. Can 
RSamTools convert BAM to SAM?

Best,
Yu Chuan

On Tue, 29 May 2012, Martin Morgan wrote:

> On 05/29/2012 10:44 PM, Martin Morgan wrote:
>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>>> Hi,
>>> 
>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>>> sequencing, is there any package/function which can extract
>>> amplicon-level read counts and GC content from an aligned file of BAM
>>> format? The same question to exon-level read counts and GC content. More
>>> generally, given a genomic interval, how could I extract the read count
>>> and GC content for that interval?
>> 
>> The Rsamtools package has scanBam for read input, also
>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
>> for higher-level read counting. The requirement is generally a
>> 'GRanges', which can be queried from TxDb packages (e.g.,
>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
>> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
>> rtracklayer, and Rsamtools packages; see
>> 
>> http://bioconductor.org/packages/release/BiocViews.html#___Software
>
> More specifically, after
>
>  library(Rsamtools)
>  example(scanBam)  # defines 'fl', a path to a bam file
>
> for a _single_ genomic range
>
>  param = ScanBamParam(what="seq",
>                       which=GRanges("seq1", IRanges(100, 500)))
>  dna = scanBam(fl, param=param)[[1]][["seq"]]
>  length(dna)  # 365 reads overlap region
>  alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>
> though you'd likely want to specify several regions (vector arguments to 
> GRanges) and think about flags (scanBamFlag() and the flag argument to 
> ScanBamParam), read mapping quality, reads overlapping more than one region, 
> etc. (summarizeOverlaps implements several counting strategies, but it is 
> 'easy' to implement arbitrary approaches).
>
>> 
>> Martin
>> 
>>> 
>>> Thanks for any input!
>>> 
>>> Best,
>>> Yu Chuan
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> 
>
>
> -- 
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>



More information about the Bioconductor mailing list