[BioC] Amplicon and exon level read counts and GC content

Martin Morgan mtmorgan at fhcrc.org
Sat Jun 9 04:25:34 CEST 2012


On 06/08/2012 07:18 PM, Yu Chuan Tai wrote:
> Hi Martin,
>
> A quick question. Does it matter if I don't have the strand info. for
> the interval that I am interested in, when I specify the input arguments
> for ScanBamParam()?

See

   ?ScanBamParam

which says


    which: A 'GRanges', 'RangesList', 'RangedData', or missing object,
           from which a 'IRangesList' instance will be constructed.
           Names of the 'IRangesList' correspond to reference sequences,
           and ranges to the regions on that reference sequence for
           which matches are desired. Because data types are coerced to
           'IRangesList', 'which' does _not_ include strand information
           (use the 'flag' argument instead). Only records with a read
           overlapping the specified ranges are returned. All ranges
           must have ends less than or equal to 536870912.

If you provide GRanges, with strand information, the strand information 
is ignored. If you want reads only on the plus strand, see (on the same 
help page)

isMinusStrand: A logical(1) indicating whether reads aligned to the
           plus (FALSE), minus (TRUE), or any (NA) strand should be
           returned.

so ScanBamParam(flag=scanBamFlag(isMinusStrand=FALSE))

Martin

>
> Best,
> Yu Chuan
>
> On Thu, 7 Jun 2012, Martin Morgan wrote:
>
>> On 06/06/2012 09:53 PM, Yu Chuan Tai wrote:
>>> Hi Martin,
>>>
>>> More questions on your approaches below. If my BAM files are
>>> generated by Bowtie2 on pair-end fastq files, for scanBamFlag(),
>>> should I set isPaired=TRUE? Do I need to worry about other input
>>> arguments for scanBamFlag() or ScanBamParam(), if I want to
>>> calculate coverage properly?
>>
>> It really depends on what you're interested in doing; see for instance
>> the post by Herve the other day
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2012-June/046052.html
>>
>>>
>>> Also, summarizeOverlaps() doesn't seem to handle paired-end reads.
>>> How to get around this, or it won't affect coverage calculation?
>>
>> There is better support for paired-end reads in the 'devel' version of
>> Biocondcutor; see
>>
>> http://bioconductor.org/developers/useDevel/
>>
>> whether and what aspects of paired-endedness are important depends on
>> how you are using your coverage.
>>
>>>
>>> Finally, is there any way to calculate base-specific coverage at any
>>> genomic locus or interval in Rsamtools? Thanks!
>>
>> I tried to answer this in your other post.
>>
>> Martin
>>
>>>
>>> Best, Yu Chuan
>>>
>>>> More specifically, after
>>>>
>>>> library(Rsamtools) example(scanBam) # defines 'fl', a path to a
>>>> bam file
>>>>
>>>> for a _single_ genomic range
>>>>
>>>> param = ScanBamParam(what="seq", which=GRanges("seq1",
>>>> IRanges(100, 500))) dna = scanBam(fl, param=param)[[1]][["seq"]]
>>>> length(dna) # 365 reads overlap region alphabetFrequency(dna,
>>>> collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>>>>
>>>> though you'd likely want to specify several regions (vector
>>>> arguments to GRanges) and think about flags (scanBamFlag() and the
>>>> flag argument to ScanBamParam), read mapping quality, reads
>>>> overlapping more than one region, etc. (summarizeOverlaps
>>>> implements several counting strategies, but it is 'easy' to
>>>> implement arbitrary approaches).
>>>>
>>>>>
>>>>> Martin
>>>>>
>>>>>>
>>>>>> Thanks for any input!
>>>>>>
>>>>>> Best, Yu Chuan
>>>>>>
>>>>>> _______________________________________________ Bioconductor
>>>>>> mailing list Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>>>>> archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>>>
>>>>>>
>> --
>>>> Computational Biology Fred Hutchinson Cancer Research Center 1100
>>>> Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: M1-B861 Telephone: 206 667-2793
>>>>
>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list