[Bioc-devel] Rsamtools BAM > SAM?

Valerie Obenchain vobencha at fhcrc.org
Thu Dec 5 18:03:00 CET 2013


In addition to the resources Martin mentions,

     library(GenomicAlignments)
     browseVignettes("GenomicAlignments")

gets the 'Counting reads with summarizeOverlaps' vignette. (In the 
release branch this vignette lives in GenomicRanges.) The vignette 
covers the counting modes in detail and includes a workflow with the 
pasilla data.
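
As a rough illustration of the kind of call the vignette walks through
(this is not taken from the vignette), a minimal sketch might look like
the following. The file names are hypothetical placeholders, the GTF is
assumed to carry a gene_id attribute on its exon records, and
ignore.strand = TRUE is only an assumption about what an unstranded
htseq-count run would correspond to.

```r
library(GenomicAlignments)   # summarizeOverlaps() (devel branch)
library(Rsamtools)           # BamFileList()
library(rtracklayer)         # import() for GTF files

# Hypothetical inputs; replace with real paths.
bam_paths <- c("sample1.bam", "sample2.bam")
gtf_path  <- "genes.gtf"

# Build per-gene counting bins from the exon records of the GTF.
# Assumes the GTF has a 'gene_id' attribute on every exon line.
gtf   <- import(gtf_path)
exons <- gtf[gtf$type == "exon"]
genes <- split(exons, exons$gene_id)    # GRangesList, one element per gene

# yieldSize sets how many records are read per chunk, so each file is
# processed in pieces rather than loaded whole.
bams <- BamFileList(bam_paths, yieldSize = 1000000)

# "Union" is one of the built-in counting modes.
se <- summarizeOverlaps(features = genes, reads = bams,
                        mode = "Union", ignore.strand = TRUE)

head(assay(se))   # genes x samples matrix of counts
```

Whether these settings reproduce a given htseq-count run would need to
be checked on real data.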

Valerie


On 12/05/2013 08:19 AM, Martin Morgan wrote:
> On 12/05/2013 06:37 AM, Taku Tokuyasu wrote:
>> Martin, Michael:
>>
>> Thank you for the prompt replies!  Glad to know the functionality is
>> available.  The suggestions on alternatives are also quite helpful.
>> Amongst other reasons, we wish to support htseq-count because it is
>> part of a reference RNA-seq protocol
>> (http://www.ncbi.nlm.nih.gov/pubmed/23975260), and because I haven't
>> gotten around to comparing timing and functionality with
>> GenomicAlignments.  This is probably in a different thread, but e.g.
>> is it known that GenomicAlignments::summarizeOverlaps() produces
>> identical results to htseq-count with equivalent parameters?  What
>> about timing?
>
> Perhaps Valerie can chime in with specifics.
>
> It is very hard to say that two algorithms are the 'same'. In general,
> the functionality of htseq-count and summarizeOverlaps is comparable:
> both support multiple counting modes (including, in the case of
> summarizeOverlaps, user-specified modes), but it would not be completely
> surprising if there were cases where the behaviour differed.
>
> The parathyroidSE vignette
>
>    http://bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html
>
> details use of summarizeOverlaps for counting.
>
> The Rsubread manuscript Table 1
>
>    http://www.ncbi.nlm.nih.gov/pubmed/24227677
>
> contains a performance comparison, although the memory consumption of
> summarizeOverlaps as reported in that paper appears to rely on reading
> an entire file into memory. (Files are iterated in chunks when processed
> in the way described by the parathyroidSE vignette; this should have
> minimal cost for overall speed but provide more reasonable memory
> management.) Rsubread is not available on Windows.
>
> Martin
>
>>
>> _Taku
>>
>>
>>
>> On Thu, Dec 5, 2013 at 5:14 AM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>>
>>     It would be appreciated to know where the current counting methods
>>     fall short; i.e., why htseq-count is necessary.
>>
>>     Thanks,
>>     Michael
>>
>>
>>     On Thu, Dec 5, 2013 at 12:30 AM, Martin Morgan
>>     <mtmorgan at fhcrc.org> wrote:
>>
>>         On 12/04/2013 11:17 PM, Taku Tokuyasu wrote:
>>
>>             Hello,
>>
>>             We are trying to support NGS pipelines with SAM input to
>>             htseq-count, w/o requiring a samtools install.  Has
>>             Rsamtools implemented / planned any support for BAM to SAM
>>             conversion?  I notice this has been requested before (e.g.
>>             May 2012,
>>             http://comments.gmane.org/gmane.science.biology.informatics.conductor/41344),
>>             so I was wondering if there were any updates on this.
>>
>>
>>         I added this to Rsamtools 1.15.14; this requires other package
>>         dependencies to be current. I had delayed adding this because it
>>         seems a step backward, to a relatively inefficient representation.
>>
>>         Other counting alternatives not requiring coercion to SAM files
>>         are GenomicAlignments::summarizeOverlaps (acting on BamFile /
>>         BamViews / GAlignments / GAlignmentPairs, counting bins as
>>         GRanges / GRangesList from TxDb or standard GTF etc. files via
>>         rtracklayer::import) and Rsubread::featureCounts (SAM or BAM
>>         files, built-in or GTF or SAF annotation files).
>>
>>         Please let me know if there are issues with the implementation.
>>
>>         Martin
>>
>>
>>             Regards,
>>
>>             _Taku
>>
>>             Taku Tokuyasu
>>             Computational Biology Core
>>             UCSF Helen Diller Family Comprehensive Cancer Center
>>
>>
>>
>>
>>         --
>>         Computational Biology / Fred Hutchinson Cancer Research Center
>>         1100 Fairview Ave. N.
>>         PO Box 19024 Seattle, WA 98109
>>
>>         Location: Arnold Building M1 B861
>>         Phone: (206) 667-2793
>>
>>         _________________________________________________
>>         Bioc-devel at r-project.org mailing list
>>         https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>
>


