[Bioc-devel] cigarToRleList fails
Hervé Pagès
hpages at fhcrc.org
Fri Feb 21 20:17:34 CET 2014
Hi Gabriel,
On 02/20/2014 05:03 PM, Gabriel Becker wrote:
> Herve,
>
> The help is correct (though possibly a bit pedantic), there is no
> method for that signature.
But the dispatch mechanism is able to find one because
'import(path_to_gff)' *does* work. So the help maybe is correct
but that doesn't really help the naive user.
H.
>
> ?import("", "")
>
> works for me though
>
> ~G
>
>
> On Thu, Feb 20, 2014 at 4:51 PM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
>
> On 02/20/2014 04:16 PM, Michael Lawrence wrote:
>
> There's also "?coverage(ga)", so the user can see what happens
> specifically for their object, without worrying about typing out
> the class.
>
>
> I was never lucky with this:
>
> > library(rtracklayer)
> > path_to_gff <- system.file("tests", "v1.gff", package =
> "rtracklayer")
> > ?import(path_to_gff)
> Error in .helpForCall(topicExpr, parent.frame()) :
> no documentation for function ‘import’ and signature ‘con =
> "character", format = "ANY", text = "ANY"’
> In addition: Warning message:
> In .helpForCall(topicExpr, parent.frame()) :
> no method defined for function ‘import’ and signature ‘con =
> "character", format = "ANY", text = "ANY"’
>
> H.
>
>
> At some point it would be neat to generate S4 documentation at
> run-time.
> Just popup a browser and see every method that might be
> dispatched with
> a given object. In theory, the R help server would support this.
>
>
> On Thu, Feb 20, 2014 at 3:13 PM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>
> <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>> wrote:
>
>
>
> On 02/20/2014 02:55 PM, Martin Morgan wrote:
>
> On 02/20/2014 02:32 PM, Hervé Pagès wrote:
>
> Hi Jesper,
>
> On 02/20/2014 02:13 PM, Jesper Gådin wrote:
>
> Very true that it is quite difficult to find the
> documentation when one
> is not aware of its existence :P
>
>
> Yeah, this has been a source of frustration for
> many people. And
> always a source of embarrassment (for us) when
> teaching our
> software
> to beginners.
>
> I've started to change this. In the upcoming
> version of BioC
> (2.14,
> scheduled for mid-April), when you'll do ?coverage,
> you'll
> get to
> choose between the 3 man pages that document coverage
> methods (there
> is one in IRanges, one in GenomicRanges, and one in
> GenomicAlignments).
>
> I want to generalize this to other generics that have
> methods spread
> across several packages (e.g. findOverlaps, the
> intra- and
> inter-range
> methods, etc...).
>
> Having to choose between several man pages every
> time you do
> e.g.
> ?findOverlaps is a minor annoyance compared to not
> being able to
> find the man page at all. (Note that if you already
> know
> where is
> your favorite man page, you'll be able to direct
> access it with
> e.g. ?GenomicRanges::findOverlaps). Nobody will
> ever need to use
> the insane
>
> ?`findOverlaps,GenomicRanges,____GenomicRanges-method` to
>
>
>
> tab completion helps, so that you don't need to be totally
> insane, just
> insane enough to know to start with
>
> ?"cover<tab>
>
>
> and also insane enough to know which method you need to
> pick up amongst
> the 30+ methods listed by ?"findOverlaps<tab>
> Overwhelming!
>
> H.
>
>
>
> Martin
>
> open that man page again. Ever! (it will still work
> though...)
>
> Cheers,
> H.
>
>
> coverage() is fast and beautiful. Thanks!
>
> /Jesper
>
>
> On Wed, Feb 19, 2014 at 9:21 PM, Hervé Pagès
> <hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
> <mailto:hpages at fhcrc.org
> <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>>>> wrote:
>
> Hi Jesper,
>
>
> On 02/19/2014 08:44 AM, Michael Lawrence
> wrote:
>
> On Wed, Feb 19, 2014 at 8:39 AM,
> Jesper Gådin
> <jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>
> <mailto:jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>__>
> <mailto:jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>
> <mailto:jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>__>__>>wrote:
>
>
> Hi Michael,
>
> Herves suggestion will probably
> work for my
> use case, but if
> there are any
> smoother ways it would be preferable.
>
> The use case is as follows:
>
> 1) calculate strand specific
> coverage over
> a region from
> GAlignments object (or file)
>
> At the moment I read a file using
> readGAlignmentsFromBam()
> with tag="XS",
> then filter it on "flag" and
> "mapq". Then I
> subset the
> resulting GAlignments in a minus
> and a plus
> -strand object.
> Then I calculate coverage by my own
> coverage function which
> uses the cigar
> information in the GAlignments
> object. This
> function is the
> one using
> cigarToRleList() at the moment. As I
> understand the
> coverage() function
> from the GenomicAlignments/IRanges
> package
> does not take
> into account
> cigars, or does it?
>
>
> It does take the coverage into account;
> specifically to exclude
> the introns
> from coverage. I think there's also an
> option
> to exclude
> deletions.
>
>
> Unfortunately the man page is not easy to
> access
> (you need to do
> ?`coverage,GAlignments-method`______), but
> it says:
>
> The methods for GAlignments and
> GAlignmentPairs
> objects do:
>
> coverage(grglist(x), ...)
>
> And if you do grglist() on a GAlignments or
> GAlignmentPairs
> objects, the
> ranges you get in the returned GRangesList
> object
> are calculated
> based
> on the CIGAR.
>
> Trust but verify. Here is how you can actually
> verify that
> coverage()
> does take the CIGAR into account:
>
> library(RNAseqData.HNRNPC.bam.______chr14)
> gal <-
>
> readGAlignments(RNAseqData.______HNRNPC.bam.chr14_BAMFILES[1])
>
> cig_op_table <- cigarOpTable(cigar(gal))
>
> First we pick up an alignment with an N in its
> CIGAR and do
> coverage()
> on it:
>
> > gal_with_N <- gal[which(cig_op_table[
> , "N"]
> != 0)[1]]
>
> > gal_with_N
> GAlignments with 1 alignment and 0
> metadata columns:
> seqnames strand cigar qwidth
> start end
> width
> <Rle> <Rle> <character> <integer>
> <integer> <integer>
> <integer>
> [1] chr14 + 55M2117N17M 72
> 19650072 19652260
> 2189
> ngap
> <integer>
> [1] 1
> ---
> seqlengths:
> chr1
> chr10 ...
> chrY
> 249250621
> 135534747 ...
> 59373566
>
> > coverage(gal_with_N)$chr14
> integer-Rle of length 107349540 with 5 runs
> Lengths: 19650071 55 2117
> 17
> 87697280
> Values : 0 1 0
> 1
> 0
>
> Same thing with an alignment with an I in
> its CIGAR:
>
> > gal_with_I <- gal[which(cig_op_table[
> , "I"]
> != 0)[1]]
>
> > gal_with_I
> GAlignments with 1 alignment and 0
> metadata columns:
> seqnames strand cigar qwidth
> start end
> width
> <Rle> <Rle> <character> <integer>
> <integer> <integer>
> <integer>
> [1] chr14 - 64M1I7M 72
> 19411677 19411747
> 71
> ngap
> <integer>
> [1] 0
> ---
> seqlengths:
> chr1
> chr10 ...
> chrY
> 249250621
> 135534747 ...
> 59373566
>
> > coverage(gal_with_I)$chr14
> integer-Rle of length 107349540 with 3 runs
> Lengths: 19411676 71 87937793
> <tel:71%2087937793>
> Values : 0 1 0
>
> Same thing with an alignment with a D in
> its CIGAR:
>
> > gal_with_D <- gal[which(cig_op_table[
> , "D"]
> != 0)[1]]
>
> > gal_with_D
> GAlignments with 1 alignment and 0
> metadata columns:
> seqnames strand cigar qwidth
> start end
> width
> <Rle> <Rle> <character> <integer>
> <integer> <integer>
> <integer>
> [1] chr14 + 38M1D34M 72
> 19659063 19659135
> 73
> ngap
> <integer>
> [1] 0
> ---
> seqlengths:
> chr1
> chr10 ...
> chrY
> 249250621
> 135534747 ...
> 59373566
>
> > coverage(gal_with_D)$chr14
> integer-Rle of length 107349540 with 3 runs
> Lengths: 19659062 73 87690405
> Values : 0 1 0
>
> Seeing is believing,
>
> Cheers,
> H.
>
>
>
> I started to look at the
> applyPileups() in
> Rsamtools which I
> can get to
> calculate coverage using cigars,
> but not
> using the strand or
> flag
> information for filtering. That
> solution
> would start from a
> bam-file
> instead of a GAlignments object,
> and sure I
> can do the
> filtering outside R.
> But it would be very convenient to
> do it
> all from within R.
>
> If there are nice solutions
> starting from
> both a GAlignments
> and a
> bam-file it would be great! =)
>
> /Jesper
>
>
>
> On Tue, Feb 18, 2014 at 10:52 PM,
> Michael
> Lawrence <
> lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>
> <mailto:lawrence.michael at gene.__com
> <mailto:lawrence.michael at gene.com>>
> <mailto:lawrence.michael at gene.
> <mailto:lawrence.michael at gene.>____com
>
> <mailto:lawrence.michael at gene.__com
> <mailto:lawrence.michael at gene.com>>>> wrote:
>
> Hi Jesper,
>
> Would you be willing to
> volunteer your
> use case? As
> Herve hinted,
> cigarToRleList and friends are
> low-level helpers. There
> may be an easier
> way to achieve what you want,
> or an
> opportunity to
> improve things.
>
> Michael
>
>
> On Mon, Feb 17, 2014 at 1:10
> AM, Jesper
> Gådin
> <jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>
> <mailto:jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>__>
> <mailto:jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>
> <mailto:jesper.gadin at gmail.com
> <mailto:jesper.gadin at gmail.com>__>__>>wrote:
>
>
> Hi,
>
> Have come across a
> cigar-vector
> that is problematic
> to process.
>
> #load package
>
> library(GenomicAlignments)
>
>
> #load data (see attached file)
>
>
> load("2014-02-17-cigarExample.______rdata")
>
>
>
> #run function cigarToRleList
>
> cigarRle <-
> cigarToRleList(cigarExample)
>
> Error in
> .Call2("Rle_constructor",
> values, lengths,
> check, 0L, PACKAGE =
> "IRanges") :
> integer overflow while
> summing
> elements in
> 'lengths'
>
> sessionInfo()
>
> R Under development (unstable)
> (2013-11-13 r64209)
> Platform:
> x86_64-unknown-linux-gnu
> (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8
> LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8
> LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8
> LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8
> LC_NAME=C
> [9] LC_ADDRESS=C
> LC_TELEPHONE=C
> [11]
> LC_MEASUREMENT=en_US.UTF-8
> LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats
> graphics
> grDevices utils
> datasets methods
> [8] base
>
> other attached packages:
> [1] GenomicAlignments_0.99.18
> Rsamtools_1.15.26
> [3] Biostrings_2.31.12
> XVector_0.3.6
> [5] GenomicRanges_1.15.26
> IRanges_1.21.23
> [7] BiocGenerics_0.9.3
>
> loaded via a namespace
> (and not
> attached):
> [1] BatchJobs_1.1-1135
> BBmisc_1.5
> BiocParallel_0.5.8
> bitops_1.0-6
>
> [5] brew_1.0-6
> codetools_0.2-8
> DBI_0.2-7
> digest_0.6.4
>
> [9] fail_1.2
> foreach_1.4.1
> iterators_1.0.6 plyr_1.8
>
> [13] RSQLite_0.11.4
> sendmailR_1.1-2
> stats4_3.1.0
> tools_3.1.0
>
> [17] zlibbioc_1.9.0
>
>
>
> _____________________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
> <mailto:Bioc-devel at r-project.__org
> <mailto:Bioc-devel at r-project.org>>
>
> <mailto:Bioc-devel at r-project.
> <mailto:Bioc-devel at r-project.>____org
> <mailto:Bioc-devel at r-project.__org
> <mailto:Bioc-devel at r-project.org>>> mailing list
> https://stat.ethz.ch/mailman/______listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/____listinfo/bioc-devel>
>
> <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>
>
>
>
> <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>
> <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>>
>
>
>
>
>
> [[alternative HTML version
> deleted]]
>
>
>
>
>
> _____________________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
> <mailto:Bioc-devel at r-project.__org
> <mailto:Bioc-devel at r-project.org>>
> <mailto:Bioc-devel at r-project.
> <mailto:Bioc-devel at r-project.>____org
> <mailto:Bioc-devel at r-project.__org
> <mailto:Bioc-devel at r-project.org>>>
> mailing list
> https://stat.ethz.ch/mailman/______listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/____listinfo/bioc-devel>
>
> <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>
>
>
>
> <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>
> <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>>
> <mailto:hpages at fhcrc.org
> <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>>>
>
> Phone: (206) 667-5791
> <tel:%28206%29%20667-5791> <tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319
> <tel:%28206%29%20667-1319> <tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
>
>
>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
> _________________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org> mailing list
> https://stat.ethz.ch/mailman/__listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>
>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list