[BioC] Amplicon and exon level read counts and GC content
Wei Shi
shi at wehi.EDU.AU
Wed May 30 08:34:22 CEST 2012
You may use the following command to convert BAM to SAM if you got Samtools installed on a unix computer:
samtools view aligned.BAM > aligned.SAM
It will be nice if Rsamtools can support this conversion as Samtools supports this conversion. The reason why featureCounts in Rsubread does not support BAM files is because we run this function straight after running the align() function for the alignment. But I think we may need to add support for this ...
Wei
On May 30, 2012, at 4:18 PM, Martin Morgan wrote:
> On 05/29/2012 11:12 PM, Yu Chuan Tai wrote:
>> Hi Martin,
>>
>> Thanks for your quick reply and great help! I will try what you
>> suggested below and let you know if I still have any question. A quick
>> question. Can RSamTools convert BAM to SAM?
>
> No; it will go the other way around (SAM --> BAM, see ?asBam) which is generally the right thing to do (smaller files, more easily 'queried' for regions of interest, etc, especially after indexing ?indexBam). Martin
>
>>
>> Best,
>> Yu Chuan
>>
>> On Tue, 29 May 2012, Martin Morgan wrote:
>>
>>> On 05/29/2012 10:44 PM, Martin Morgan wrote:
>>>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>>>>> Hi,
>>>>>
>>>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>>>>> sequencing, is there any package/function which can extract
>>>>> amplicon-level read counts and GC content from an aligned file of BAM
>>>>> format? The same question to exon-level read counts and GC content.
>>>>> More
>>>>> generally, given a genomic interval, how could I extract the read count
>>>>> and GC content for that interval?
>>>>
>>>> The Rsamtools package has scanBam for read input, also
>>>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
>>>> for higher-level read counting. The requirement is generally a
>>>> 'GRanges', which can be queried from TxDb packages (e.g.,
>>>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
>>>> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
>>>> rtracklayer, and Rsamtools packages; see
>>>>
>>>> http://bioconductor.org/packages/release/BiocViews.html#___Software
>>>
>>> More specifically, after
>>>
>>> library(Rsamtools)
>>> example(scanBam) # defines 'fl', a path to a bam file
>>>
>>> for a _single_ genomic range
>>>
>>> param = ScanBamParam(what="seq",
>>> which=GRanges("seq1", IRanges(100, 500)))
>>> dna = scanBam(fl, param=param)[[1]][["seq"]]
>>> length(dna) # 365 reads overlap region
>>> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>>>
>>> though you'd likely want to specify several regions (vector arguments
>>> to GRanges) and think about flags (scanBamFlag() and the flag argument
>>> to ScanBamParam), read mapping quality, reads overlapping more than
>>> one region, etc. (summarizeOverlaps implements several counting
>>> strategies, but it is 'easy' to implement arbitrary approaches).
>>>
>>>>
>>>> Martin
>>>>
>>>>>
>>>>> Thanks for any input!
>>>>>
>>>>> Best,
>>>>> Yu Chuan
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>
>>>
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>
>>> Location: M1-B861
>>> Telephone: 206 667-2793
>>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}
More information about the Bioconductor
mailing list