[BioC] Amplicon and exon level read counts and GC content
Martin Morgan
mtmorgan at fhcrc.org
Wed May 30 07:59:56 CEST 2012
On 05/29/2012 10:44 PM, Martin Morgan wrote:
> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>> Hi,
>>
>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>> sequencing, is there any package/function which can extract
>> amplicon-level read counts and GC content from an aligned file of BAM
>> format? The same question to exon-level read counts and GC content. More
>> generally, given a genomic interval, how could I extract the read count
>> and GC content for that interval?
>
> The Rsamtools package has scanBam for read input, also
> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
> for higher-level read counting. The requirement is generally a
> 'GRanges', which can be queried from TxDb packages (e.g.,
> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
> rtracklayer, and Rsamtools packages; see
>
> http://bioconductor.org/packages/release/BiocViews.html#___Software
More specifically, after
library(Rsamtools)
example(scanBam) # defines 'fl', a path to a bam file
for a _single_ genomic range
param = ScanBamParam(what="seq",
which=GRanges("seq1", IRanges(100, 500)))
dna = scanBam(fl, param=param)[[1]][["seq"]]
length(dna) # 365 reads overlap region
alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
though you'd likely want to specify several regions (vector arguments to
GRanges) and think about flags (scanBamFlag() and the flag argument to
ScanBamParam), read mapping quality, reads overlapping more than one
region, etc. (summarizeOverlaps implements several counting strategies,
but it is 'easy' to implement arbitrary approaches).
>
> Martin
>
>>
>> Thanks for any input!
>>
>> Best,
>> Yu Chuan
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
More information about the Bioconductor
mailing list