[BioC] Amplicon and exon level read counts and GC content
Yu Chuan Tai
yuchuan at stat.berkeley.edu
Sat Jun 9 04:18:00 CEST 2012
Hi Martin,
A quick question. Does it matter if I don't have the strand info. for the
interval that I am interested in, when I specify the input arguments for
ScanBamParam()?
Best,
Yu Chuan
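
A minimal sketch of one way to check this, not a definitive answer. It assumes the behaviour described in the ScanBamParam documentation, where 'which' is reduced to plain ranges per sequence, so any strand set on the GRanges plays no part in selecting reads. It uses the example BAM file shipped with Rsamtools and the seq1:100-500 interval from Martin's example quoted below; the variable names are illustrative only.

```r
library(Rsamtools)

# Example BAM file shipped with Rsamtools (the file example(scanBam) uses)
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)

# Same interval, once without strand ("*" is the GRanges default) and once "+"
gr_any  <- GRanges("seq1", IRanges(100, 500))
gr_plus <- GRanges("seq1", IRanges(100, 500), strand = "+")

# Count records overlapping the interval under each specification
n_any  <- countBam(fl, param = ScanBamParam(which = gr_any))$records
n_plus <- countBam(fl, param = ScanBamParam(which = gr_plus))$records

# To actually restrict by strand, filter on the alignment flag instead
plus_only   <- ScanBamParam(which = gr_any,
                            flag = scanBamFlag(isMinusStrand = FALSE))
n_plus_flag <- countBam(fl, param = plus_only)$records
```

If the assumption holds, n_any and n_plus are equal, and only the flag-based filter changes the count.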
On Thu, 7 Jun 2012, Martin Morgan wrote:
> On 06/06/2012 09:53 PM, Yu Chuan Tai wrote:
>> Hi Martin,
>>
>> More questions on your approaches below. If my BAM files are
>> generated by Bowtie2 on paired-end fastq files, for scanBamFlag(),
>> should I set isPaired=TRUE? Do I need to worry about other input
>> arguments for scanBamFlag() or ScanBamParam(), if I want to
>> calculate coverage properly?
>
> It really depends on what you're interested in doing; see for instance the
> post by Herve the other day
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html
>
>>
>> Also, summarizeOverlaps() doesn't seem to handle paired-end reads.
>> How can I get around this, or does it not affect the coverage calculation?
>
> There is better support for paired-end reads in the 'devel' version of
> Bioconductor; see
>
> http://bioconductor.org/developers/useDevel/
>
> Whether and what aspects of paired-endedness are important depends on how you
> are using your coverage.
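
As one illustration of the flag choices discussed above, here is a hedged sketch of read-level coverage restricted to properly paired, mapped reads. It relies on the current GenomicAlignments package (readGAlignments() and coverage()), which postdates this thread, and on the example BAM file shipped with Rsamtools; whether these particular flags are appropriate depends on how the coverage will be used.

```r
library(Rsamtools)
library(GenomicAlignments)

# Example BAM file shipped with Rsamtools
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)

# Keep only mapped reads that belong to a properly aligned pair
flag  <- scanBamFlag(isPaired = TRUE, isProperPair = TRUE,
                     isUnmappedQuery = FALSE)
param <- ScanBamParam(flag = flag)

# Read-level coverage: each mate contributes independently,
# so the insert between two mates is not counted as covered
reads <- readGAlignments(fl, param = param)
cvg   <- coverage(reads)   # RleList, one Rle per reference sequence
```

Here cvg[["seq1"]] gives the per-base depth along seq1. Treating each pair as a single fragment would need a different approach, such as reading pairs with readGAlignmentPairs().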
>
>>
>> Finally, is there any way to calculate base-specific coverage at any
>> genomic locus or interval in Rsamtools? Thanks!
>
> I tried to answer this in your other post.
>
> Martin
>
>>
>> Best, Yu Chuan
>>
>>> More specifically, after
>>>
>>>   library(Rsamtools)
>>>   example(scanBam) # defines 'fl', a path to a bam file
>>>
>>> for a _single_ genomic range
>>>
>>>   param = ScanBamParam(what="seq", which=GRanges("seq1", IRanges(100, 500)))
>>>   dna = scanBam(fl, param=param)[[1]][["seq"]]
>>>   length(dna) # 365 reads overlap region
>>>   alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>>>
>>> though you'd likely want to specify several regions (vector
>>> arguments to GRanges) and think about flags (scanBamFlag() and the
>>> flag argument to ScanBamParam), read mapping quality, reads
>>> overlapping more than one region, etc. (summarizeOverlaps
>>> implements several counting strategies, but it is 'easy' to
>>> implement arbitrary approaches).
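
The extension to several regions with flag and mapping-quality filters might look like the sketch below. The second region (seq2:200-600), the mapqFilter threshold of 20, and the helper names are arbitrary choices for illustration, not values from the thread; mapqFilter is an argument of ScanBamParam in current Rsamtools.

```r
library(Rsamtools)

# Example BAM file shipped with Rsamtools
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)

# Several regions at once: vector arguments to GRanges
regions <- GRanges(c("seq1", "seq2"),
                   IRanges(start = c(100, 200), end = c(500, 600)))

# Drop unmapped reads and reads with mapping quality below 20
param <- ScanBamParam(what = "seq", which = regions,
                      flag = scanBamFlag(isUnmappedQuery = FALSE),
                      mapqFilter = 20)

# scanBam() returns one list element per region
res <- scanBam(fl, param = param)

# Number of reads overlapping each region
n_reads <- sapply(res, function(x) length(x[["seq"]]))

# GC fraction over all bases of the reads overlapping each region
gc_fraction <- sapply(res, function(x) {
  freq <- alphabetFrequency(x[["seq"]], collapse = TRUE, baseOnly = TRUE)
  sum(freq[c("G", "C")]) / sum(freq[c("A", "C", "G", "T")])
})
```

Two caveats: whole reads are tallied, including bases that extend beyond the region boundaries, and a read overlapping more than one region is counted in each.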
>>>
>>>>
>>>> Martin
>>>>
>>>>>
>>>>> Thanks for any input!
>>>>>
>>>>> Best, Yu Chuan
>>>>>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>