[Bioc-devel] Rsamtools BAM > SAM?
Martin Morgan
mtmorgan at fhcrc.org
Thu Dec 5 17:19:12 CET 2013
On 12/05/2013 06:37 AM, Taku Tokuyasu wrote:
> Martin, Michael:
>
> Thank you for the prompt replies! Glad to know the functionality is available.
> The suggestions on alternatives are also quite helpful. Amongst other
> reasons, we wish to support htseq-count because it is part of a reference
> RNA-seq protocol (http://www.ncbi.nlm.nih.gov/pubmed/23975260), and because I
> haven't gotten around to comparing timing and functionality with
> GenomicAlignments. This is probably in a different thread, but e.g. is it known
> that GenomicAlignments::summarizeOverlaps() produces identical results to
> htseq-count with equivalent parameters? What about timing?
Perhaps Valerie can chime in with specifics.
It is very hard to say that two algorithms are the 'same', in general the
functionality of ht-seq and summarizeOverlaps is comparable; both support
multiple counting modes (including in the case of summarizeOverlaps
user-specified modes) but it would not be completely surprising if there were
cases where the behaviour was different.
The parathyroidSE vignette
http://bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html
details use of summarizeOverlaps for counting.
The Rsubread manuscript Table 1
http://www.ncbi.nlm.nih.gov/pubmed/24227677
contains a performance comparison, although the memory consumption of of
summarizeOverlaps as reported in that paper appears to rely on reading an entire
file in to memory (files are iterated in chunks when processed in the way
described by the parathyroidSE vignette; this should have minimal cost for
overall speed but provide more reasonable memory management). Rsubread is not
available on Windows.
Martin
>
> _Taku
>
>
>
> On Thu, Dec 5, 2013 at 5:14 AM, Michael Lawrence <lawrence.michael at gene.com
> <mailto:lawrence.michael at gene.com>> wrote:
>
> It would be appreciated to know where the current counting methods fall
> short; i.e., why htseq-count is necessary.
>
> Thanks,
> Michael
>
>
> On Thu, Dec 5, 2013 at 12:30 AM, Martin Morgan <mtmorgan at fhcrc.org
> <mailto:mtmorgan at fhcrc.org>> wrote:
>
> On 12/04/2013 11:17 PM, Taku Tokuyasu wrote:
>
> Hello,
>
> We are trying to support NGS pipelines with SAM input to
> htseq-count, w/o
> requiring a samtools install. Has Rsamtools implemented / planned
> any support
> for BAM to SAM conversion? I notice this has been requested before
> (e.g. May
> 2012,
> http://comments.gmane.org/__gmane.science.biology.__informatics.conductor/41344
> <http://comments.gmane.org/gmane.science.biology.informatics.conductor/41344>),
> so
> I was wondering if there were any updates on this.
>
>
> I added this to Rsamtools 1.15.14; this requires other package
> dependencies to be current. I had delayed adding this because it seems a
> step backward, to a relatively inefficient representation.
>
> Other counting alternatives not requiring coercion to SAM files are
> GenomicAlignments::__summarizeOverlaps (acting on BamFile / BamViews /
> GAlignments / GAlignmentPairs, counting bins as GRanges / GRanges list
> from TxDb or standard GTF etc files via rtracklayer::import) and
> Rsubread::featureCount (SAM or BAM files, built-in or GTF or SAF
> annotation files).
>
> Please let me know if there are issues with the implementation.
>
> Martin
>
>
> Regards,
>
> _Taku
>
> Taku Tokuyasu
> Computational Biology Core
> UCSF Helen Diller Family Comprehensive Cancer Center
>
>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793 <tel:%28206%29%20667-2793>
>
> _________________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org> mailing list
> https://stat.ethz.ch/mailman/__listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioc-devel
mailing list