[Bioc-devel] cigarToRleList fails

Hervé Pagès hpages at fhcrc.org
Fri Feb 21 01:51:12 CET 2014


On 02/20/2014 04:16 PM, Michael Lawrence wrote:
> There's also "?coverage(ga)", so the user can see what happens
> specifically for their object, without worrying about typing out the class.

I was never lucky with this:

   > library(rtracklayer)
   > path_to_gff <- system.file("tests", "v1.gff", package = "rtracklayer")
   > ?import(path_to_gff)
   Error in .helpForCall(topicExpr, parent.frame()) :
     no documentation for function ‘import’ and signature ‘con = 
"character", format = "ANY", text = "ANY"’
   In addition: Warning message:
   In .helpForCall(topicExpr, parent.frame()) :
     no method defined for function ‘import’ and signature ‘con = 
"character", format = "ANY", text = "ANY"’

H.

>
> At some point it would be neat to generate S4 documentation at run-time.
> Just popup a browser and see every method that might be dispatched with
> a given object. In theory, the R help server would support this.
>
>
> On Thu, Feb 20, 2014 at 3:13 PM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
>
>
>
>     On 02/20/2014 02:55 PM, Martin Morgan wrote:
>
>         On 02/20/2014 02:32 PM, Hervé Pagès wrote:
>
>             Hi Jesper,
>
>             On 02/20/2014 02:13 PM, Jesper Gådin wrote:
>
>                 Very true that it is quite difficult to find the
>                 documentation when one
>                 is not aware of its existence :P
>
>
>             Yeah, this has been a source of frustration for many people. And
>             always a source of embarrassment (for us) when teaching our
>             software
>             to beginners.
>
>             I've started to change this. In the upcoming version of BioC
>             (2.14,
>             scheduled for mid-April), when you'll do ?coverage, you'll
>             get to
>             choose between the 3 man pages that document coverage
>             methods (there
>             is one in IRanges, one in GenomicRanges, and one in
>             GenomicAlignments).
>
>             I want to generalize this to other generics that have
>             methods spread
>             across several packages (e.g. findOverlaps, the intra- and
>             inter-range
>             methods, etc...).
>
>             Having to choose between several man pages every time you do
>             e.g.
>             ?findOverlaps is a minor annoyance compared to not being able to
>             find the man page at all. (Note that if you already know
>             where is
>             your favorite man page, you'll be able to direct access it with
>             e.g. ?GenomicRanges::findOverlaps). Nobody will ever need to use
>             the insane
>             ?`findOverlaps,GenomicRanges,__GenomicRanges-method` to
>
>
>         tab completion helps, so that you don't need to be totally
>         insane, just
>         insane enough to know to start with
>
>         ?"cover<tab>
>
>
>     and also insane enough to know which method you need to pick up amongst
>     the 30+ methods listed by ?"findOverlaps<tab>
>     Overwhelming!
>
>     H.
>
>
>
>         Martin
>
>             open that man page again. Ever! (it will still work though...)
>
>             Cheers,
>             H.
>
>
>                 coverage() is fast and beautiful. Thanks!
>
>                 /Jesper
>
>
>                 On Wed, Feb 19, 2014 at 9:21 PM, Hervé Pagès
>                 <hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                 <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>> wrote:
>
>                      Hi Jesper,
>
>
>                      On 02/19/2014 08:44 AM, Michael Lawrence wrote:
>
>                          On Wed, Feb 19, 2014 at 8:39 AM, Jesper Gådin
>                          <jesper.gadin at gmail.com
>                 <mailto:jesper.gadin at gmail.com>
>                 <mailto:jesper.gadin at gmail.com
>                 <mailto:jesper.gadin at gmail.com>__>>wrote:
>
>                              Hi Michael,
>
>                              Herves suggestion will probably work for my
>                 use case, but if
>                              there are any
>                              smoother ways it would be preferable.
>
>                              The use case is as follows:
>
>                              1) calculate strand specific coverage over
>                 a region from
>                              GAlignments object (or file)
>
>                              At the moment I read a file using
>                 readGAlignmentsFromBam()
>                              with tag="XS",
>                              then filter it on "flag" and "mapq". Then I
>                 subset the
>                              resulting GAlignments in a minus and a plus
>                 -strand object.
>                              Then I calculate coverage by my own
>                 coverage function which
>                              uses the cigar
>                              information in the GAlignments object. This
>                 function is the
>                              one using
>                              cigarToRleList() at the moment. As I
>                 understand the
>                              coverage() function
>                              from the GenomicAlignments/IRanges package
>                 does not take
>                              into account
>                              cigars, or does it?
>
>
>                          It does take the coverage into account;
>                 specifically to exclude
>                          the introns
>                          from coverage. I think there's also an option
>                 to exclude
>                 deletions.
>
>
>                      Unfortunately the man page is not easy to access
>                 (you need to do
>                      ?`coverage,GAlignments-method`____), but it says:
>
>                          The methods for GAlignments and GAlignmentPairs
>                 objects do:
>
>                            coverage(grglist(x), ...)
>
>                      And if you do grglist() on a GAlignments or
>                 GAlignmentPairs
>                 objects, the
>                      ranges you get in the returned GRangesList object
>                 are calculated
>                 based
>                      on the CIGAR.
>
>                      Trust but verify. Here is how you can actually
>                 verify that
>                 coverage()
>                      does take the CIGAR into account:
>
>                         library(RNAseqData.HNRNPC.bam.____chr14)
>                         gal <-
>                 readGAlignments(RNAseqData.____HNRNPC.bam.chr14_BAMFILES[1])
>                         cig_op_table <- cigarOpTable(cigar(gal))
>
>                      First we pick up an alignment with an N in its
>                 CIGAR and do
>                 coverage()
>                      on it:
>
>                         > gal_with_N <- gal[which(cig_op_table[ , "N"]
>                 != 0)[1]]
>
>                         > gal_with_N
>                         GAlignments with 1 alignment and 0 metadata columns:
>                               seqnames strand       cigar    qwidth
>                 start       end
>                         width
>                                  <Rle>  <Rle> <character> <integer>
>                 <integer> <integer>
>                      <integer>
>                           [1]    chr14      + 55M2117N17M        72
>                   19650072  19652260
>                          2189
>                                    ngap
>                               <integer>
>                           [1]         1
>                           ---
>                           seqlengths:
>                                           chr1                 chr10 ...
>                        chrY
>                                      249250621             135534747 ...
>                        59373566
>
>                         > coverage(gal_with_N)$chr14
>                         integer-Rle of length 107349540 with 5 runs
>                           Lengths: 19650071       55     2117       17
>                 87697280
>                           Values :        0        1        0        1
>                       0
>
>                      Same thing with an alignment with an I in its CIGAR:
>
>                         > gal_with_I <- gal[which(cig_op_table[ , "I"]
>                 != 0)[1]]
>
>                         > gal_with_I
>                         GAlignments with 1 alignment and 0 metadata columns:
>                               seqnames strand       cigar    qwidth
>                 start       end
>                         width
>                                  <Rle>  <Rle> <character> <integer>
>                 <integer> <integer>
>                      <integer>
>                           [1]    chr14      -     64M1I7M        72
>                   19411677  19411747
>                            71
>                                    ngap
>                               <integer>
>                           [1]         0
>                           ---
>                           seqlengths:
>                                           chr1                 chr10 ...
>                        chrY
>                                      249250621             135534747 ...
>                        59373566
>
>                         > coverage(gal_with_I)$chr14
>                         integer-Rle of length 107349540 with 3 runs
>                           Lengths: 19411676 71 87937793 <tel:71%2087937793>
>                           Values :        0        1        0
>
>                      Same thing with an alignment with a D in its CIGAR:
>
>                         > gal_with_D <- gal[which(cig_op_table[ , "D"]
>                 != 0)[1]]
>
>                         > gal_with_D
>                         GAlignments with 1 alignment and 0 metadata columns:
>                               seqnames strand       cigar    qwidth
>                 start       end
>                         width
>                                  <Rle>  <Rle> <character> <integer>
>                 <integer> <integer>
>                      <integer>
>                           [1]    chr14      +    38M1D34M        72
>                   19659063  19659135
>                            73
>                                    ngap
>                               <integer>
>                           [1]         0
>                           ---
>                           seqlengths:
>                                           chr1                 chr10 ...
>                        chrY
>                                      249250621             135534747 ...
>                        59373566
>
>                         > coverage(gal_with_D)$chr14
>                         integer-Rle of length 107349540 with 3 runs
>                           Lengths: 19659062       73 87690405
>                           Values :        0        1        0
>
>                      Seeing is believing,
>
>                      Cheers,
>                      H.
>
>
>
>                              I started to look at the applyPileups() in
>                 Rsamtools which I
>                              can get to
>                              calculate coverage using cigars, but not
>                 using the strand or
>                              flag
>                              information for filtering. That solution
>                 would start from a
>                              bam-file
>                              instead of a GAlignments object, and sure I
>                 can do the
>                              filtering outside R.
>                              But it would be very convenient to do it
>                 all from within R.
>
>                              If there are nice solutions starting from
>                 both a GAlignments
>                              and a
>                              bam-file it would be great! =)
>
>                              /Jesper
>
>
>
>                              On Tue, Feb 18, 2014 at 10:52 PM, Michael
>                 Lawrence <
>                 lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>
>                              <mailto:lawrence.michael at gene.__com
>                 <mailto:lawrence.michael at gene.com>>> wrote:
>
>                                  Hi Jesper,
>
>                                  Would you be willing to volunteer your
>                 use case? As
>                                  Herve hinted,
>                                  cigarToRleList and friends are
>                 low-level helpers. There
>                                  may be an easier
>                                  way to achieve what you want, or an
>                 opportunity to
>                                  improve things.
>
>                                  Michael
>
>
>                                  On Mon, Feb 17, 2014 at 1:10 AM, Jesper
>                 Gådin
>                                  <jesper.gadin at gmail.com
>                 <mailto:jesper.gadin at gmail.com>
>                                  <mailto:jesper.gadin at gmail.com
>                 <mailto:jesper.gadin at gmail.com>__>>wrote:
>
>                                      Hi,
>
>                                      Have come across a cigar-vector
>                 that is problematic
>                                      to process.
>
>                                      #load package
>
>                                          library(GenomicAlignments)
>
>
>                                      #load data (see attached file)
>
>
>                 load("2014-02-17-cigarExample.____rdata")
>
>
>                                      #run function cigarToRleList
>
>                                          cigarRle <-
>                 cigarToRleList(cigarExample)
>
>                                      Error in .Call2("Rle_constructor",
>                 values, lengths,
>                                      check, 0L, PACKAGE =
>                                      "IRanges") :
>                                          integer overflow while summing
>                 elements in
>                 'lengths'
>
>                                          sessionInfo()
>
>                                      R Under development (unstable)
>                 (2013-11-13 r64209)
>                                      Platform: x86_64-unknown-linux-gnu
>                 (64-bit)
>
>                                      locale:
>                                         [1] LC_CTYPE=en_US.UTF-8
>                 LC_NUMERIC=C
>                                         [3] LC_TIME=en_US.UTF-8
>                 LC_COLLATE=en_US.UTF-8
>                                         [5] LC_MONETARY=en_US.UTF-8
>                                        LC_MESSAGES=en_US.UTF-8
>                                         [7] LC_PAPER=en_US.UTF-8
>                 LC_NAME=C
>                                         [9] LC_ADDRESS=C
>                 LC_TELEPHONE=C
>                                      [11] LC_MEASUREMENT=en_US.UTF-8
>                 LC_IDENTIFICATION=C
>
>                                      attached base packages:
>                                      [1] parallel  stats     graphics
>                   grDevices utils
>                                         datasets  methods
>                                      [8] base
>
>                                      other attached packages:
>                                      [1] GenomicAlignments_0.99.18
>                 Rsamtools_1.15.26
>                                      [3] Biostrings_2.31.12
>                   XVector_0.3.6
>                                      [5] GenomicRanges_1.15.26
>                 IRanges_1.21.23
>                                      [7] BiocGenerics_0.9.3
>
>                                      loaded via a namespace (and not
>                 attached):
>                                         [1] BatchJobs_1.1-1135 BBmisc_1.5
>                                      BiocParallel_0.5.8
>                                      bitops_1.0-6
>
>                                         [5] brew_1.0-6
>                 codetools_0.2-8
>                 DBI_0.2-7
>                                         digest_0.6.4
>
>                                         [9] fail_1.2           foreach_1.4.1
>                                        iterators_1.0.6    plyr_1.8
>
>                                      [13] RSQLite_0.11.4     sendmailR_1.1-2
>                                        stats4_3.1.0       tools_3.1.0
>
>                                      [17] zlibbioc_1.9.0
>
>
>                 ___________________________________________________
>                 Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>                                      <mailto:Bioc-devel at r-project.__org
>                 <mailto:Bioc-devel at r-project.org>> mailing list
>                 https://stat.ethz.ch/mailman/____listinfo/bioc-devel
>                 <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>
>                 <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>                 <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>
>
>
>
>
>                                   [[alternative HTML version deleted]]
>
>
>
>
>                          ___________________________________________________
>                 Bioc-devel at r-project.org
>                 <mailto:Bioc-devel at r-project.org>
>                 <mailto:Bioc-devel at r-project.__org
>                 <mailto:Bioc-devel at r-project.org>>
>                          mailing list
>                 https://stat.ethz.ch/mailman/____listinfo/bioc-devel
>                 <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>
>                 <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>                 <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>
>
>                      --
>                      Hervé Pagès
>
>                      Program in Computational Biology
>                      Division of Public Health Sciences
>                      Fred Hutchinson Cancer Research Center
>                      1100 Fairview Ave. N, M1-B514
>                      P.O. Box 19024
>                      Seattle, WA 98109-1024
>
>                      E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                 <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>                      Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>                 <tel:%28206%29%20667-5791>
>                      Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>                 <tel:%28206%29%20667-1319>
>
>
>
>
>
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>     Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>     Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list