[BioC] CIGAR-aware coverage
Hervé Pagès
hpages at fhcrc.org
Fri Feb 28 20:40:59 CET 2014
On 02/28/2014 10:58 AM, Michael Lawrence wrote:
>
>
>
> On Fri, Feb 28, 2014 at 9:47 AM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
>
> Hi Chris,
>
>
> On 02/27/2014 11:00 PM, chris warth wrote:
>
> GenomicAlignments is mostly a reorganization of pre-existing
> functionality.
> You should be able to call coverage() directly on your
> GAlignments, or on
> the BamFile itself, with full support for cigar strings.
> Also, it is
> sufficient to call readGAlignments(), i.e., no need for
> "FromBam".
>
>
> Thanks for this, I tried calling coverage() on GAlignments
> before and
> it wouldn't work for me, but looking closer I now understand the
> error.
>
> > class(myalignments)
> [1] "GAlignments"
> attr(,"package")
> [1] "GenomicRanges"
>
> > coverage(myalignments)
> Error in coverage(grglist(x, drop.D.ranges = drop.D.ranges),
> shift = shift, :
> error in evaluating the argument 'x' in selecting a method for
> function 'coverage': Error in validObject(.Object) :
> invalid class "GRangesList" object: 'mcols(x)' cannot have
> columns
> named "seqnames", "ranges", "strand", "start", "end", "width", or
> "element"
>
>
> Eliminating the metadata (what = c("pos", "qwidth", "strand"))
> in the
> alignments allows coverage() to work now.
>
> Why in the world does coverage() care if the alignments have
> strand metadata?
>
>
> It's not coverage() that cares. It's the internal coercion to
> GRangesList that happens internally before the coverage is actually
> computed. This internal coercion tries to propagate the metadata
> columns, but, for reasons I never really understood, GRanges and
> GRangesList objects are not allowed to hold a metadata column named
> "strand".
>
>
> I think this is because there is otherwise ambiguity when coercing to a
> data.frame and also comes up now in the coercion of GRanges to an
> environment. We could just admit warnings in those cases though and
> rename with make.unique().
A warning would be good. I'm not a big fan of automatic renaming though.
Whatever we choose, we want as.data.frame() to handle the metadata cols
consistently. Right now it doesn't:
With a GAlignments:
> gal
GAlignments with 3 alignments and 2 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
[1] seq1 + 36M 36 1 36 36
[2] seq1 + 35M 35 3 37 35
[3] seq1 + 35M 35 5 39 35
njunc | flag strand
<integer> | <integer> <factor>
[1] 0 | 73 +
[2] 0 | 73 +
[3] 0 | 137 +
---
seqlengths:
seq1 seq2
1575 1584
> as.data.frame(gal)
seqnames strand cigar qwidth start end width njunc flag strand
1 seq1 + 36M 36 1 36 36 0 73 +
2 seq1 + 35M 35 3 37 35 0 73 +
3 seq1 + 35M 35 5 39 35 0 137 +
With a GRanges (I artificially created an invalid object):
> gr
GRanges with 4 ranges and 1 metadata column:
seqnames ranges strand | strand
<Rle> <IRanges> <Rle> | <character>
[1] seq1 [1, 36] + | -
[2] seq1 [3, 37] + | -
[3] seq1 [5, 39] + | -
[4] seq1 [6, 41] + | -
---
seqlengths:
seq1 seq2
1575 1584
> as.data.frame(gr)
seqnames start end width strand strand.1
1 seq1 1 36 36 + -
2 seq1 3 37 35 + -
3 seq1 5 39 35 + -
4 seq1 6 41 36 + -
H.
>
> The fix is easy: this internal coercion to GRangesList should
> not propagate the metadata columns (they're not needed for computing the
> coverage and they slow down the coercion). I'll fix this today.
>
> Thanks for the report.
> H.
>
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list