[BioC] Amplicon and exon level read counts and GC content

Martin Morgan mtmorgan at fhcrc.org
Wed May 30 07:59:56 CEST 2012


On 05/29/2012 10:44 PM, Martin Morgan wrote:
> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>> Hi,
>>
>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>> sequencing, is there any package/function which can extract
>> amplicon-level read counts and GC content from an aligned file of BAM
>> format? The same question to exon-level read counts and GC content. More
>> generally, given a genomic interval, how could I extract the read count
>> and GC content for that interval?
>
> The Rsamtools package has scanBam for read input, also
> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
> for higher-level read counting. The requirement is generally a
> 'GRanges', which can be queried from TxDb packages (e.g.,
> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
> rtracklayer, and Rsamtools packages; see
>
> http://bioconductor.org/packages/release/BiocViews.html#___Software

More specifically, after

   library(Rsamtools)
   example(scanBam)  # defines 'fl', a path to a bam file

for a _single_ genomic range

   param = ScanBamParam(what="seq",
                        which=GRanges("seq1", IRanges(100, 500)))
   dna = scanBam(fl, param=param)[[1]][["seq"]]
   length(dna)  # 365 reads overlap region
   alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC

though you'd likely want to specify several regions (vector arguments to 
GRanges) and think about flags (scanBamFlag() and the flag argument to 
ScanBamParam), read mapping quality, reads overlapping more than one 
region, etc. (summarizeOverlaps implements several counting strategies, 
but it is 'easy' to implement arbitrary approaches).

>
> Martin
>
>>
>> Thanks for any input!
>>
>> Best,
>> Yu Chuan
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list