[Bioc-devel] cigarToRleList fails
Martin Morgan
mtmorgan at fhcrc.org
Thu Feb 20 23:55:59 CET 2014
On 02/20/2014 02:32 PM, Hervé Pagès wrote:
> Hi Jesper,
>
> On 02/20/2014 02:13 PM, Jesper Gådin wrote:
>> Very true that it is quite difficult to find the documentation when one
>> is not aware of its existence :P
>
> Yeah, this has been a source of frustration for many people. And
> always a source of embarrassment (for us) when teaching our software
> to beginners.
>
> I've started to change this. In the upcoming version of BioC (2.14,
> scheduled for mid-April), when you'll do ?coverage, you'll get to
> choose between the 3 man pages that document coverage methods (there
> is one in IRanges, one in GenomicRanges, and one in GenomicAlignments).
>
> I want to generalize this to other generics that have methods spread
> across several packages (e.g. findOverlaps, the intra- and inter-range
> methods, etc...).
>
> Having to choose between several man pages every time you do e.g.
> ?findOverlaps is a minor annoyance compared to not being able to
> find the man page at all. (Note that if you already know where is
> your favorite man page, you'll be able to direct access it with
> e.g. ?GenomicRanges::findOverlaps). Nobody will ever need to use
> the insane ?`findOverlaps,GenomicRanges,GenomicRanges-method` to
tab completion helps, so that you don't need to be totally insane, just insane
enough to know to start with
?"cover<tab>
Martin
> open that man page again. Ever! (it will still work though...)
>
> Cheers,
> H.
>
>>
>> coverage() is fast and beautiful. Thanks!
>>
>> /Jesper
>>
>>
>> On Wed, Feb 19, 2014 at 9:21 PM, Hervé Pagès <hpages at fhcrc.org
>> <mailto:hpages at fhcrc.org>> wrote:
>>
>> Hi Jesper,
>>
>>
>> On 02/19/2014 08:44 AM, Michael Lawrence wrote:
>>
>> On Wed, Feb 19, 2014 at 8:39 AM, Jesper Gådin
>> <jesper.gadin at gmail.com <mailto:jesper.gadin at gmail.com>>wrote:
>>
>> Hi Michael,
>>
>> Herves suggestion will probably work for my use case, but if
>> there are any
>> smoother ways it would be preferable.
>>
>> The use case is as follows:
>>
>> 1) calculate strand specific coverage over a region from
>> GAlignments object (or file)
>>
>> At the moment I read a file using readGAlignmentsFromBam()
>> with tag="XS",
>> then filter it on "flag" and "mapq". Then I subset the
>> resulting GAlignments in a minus and a plus -strand object.
>> Then I calculate coverage by my own coverage function which
>> uses the cigar
>> information in the GAlignments object. This function is the
>> one using
>> cigarToRleList() at the moment. As I understand the
>> coverage() function
>> from the GenomicAlignments/IRanges package does not take
>> into account
>> cigars, or does it?
>>
>>
>> It does take the coverage into account; specifically to exclude
>> the introns
>> from coverage. I think there's also an option to exclude deletions.
>>
>>
>> Unfortunately the man page is not easy to access (you need to do
>> ?`coverage,GAlignments-method`__), but it says:
>>
>> The methods for GAlignments and GAlignmentPairs objects do:
>>
>> coverage(grglist(x), ...)
>>
>> And if you do grglist() on a GAlignments or GAlignmentPairs objects, the
>> ranges you get in the returned GRangesList object are calculated based
>> on the CIGAR.
>>
>> Trust but verify. Here is how you can actually verify that coverage()
>> does take the CIGAR into account:
>>
>> library(RNAseqData.HNRNPC.bam.__chr14)
>> gal <- readGAlignments(RNAseqData.__HNRNPC.bam.chr14_BAMFILES[1])
>> cig_op_table <- cigarOpTable(cigar(gal))
>>
>> First we pick up an alignment with an N in its CIGAR and do coverage()
>> on it:
>>
>> > gal_with_N <- gal[which(cig_op_table[ , "N"] != 0)[1]]
>>
>> > gal_with_N
>> GAlignments with 1 alignment and 0 metadata columns:
>> seqnames strand cigar qwidth start end
>> width
>> <Rle> <Rle> <character> <integer> <integer> <integer>
>> <integer>
>> [1] chr14 + 55M2117N17M 72 19650072 19652260
>> 2189
>> ngap
>> <integer>
>> [1] 1
>> ---
>> seqlengths:
>> chr1 chr10 ...
>> chrY
>> 249250621 135534747 ...
>> 59373566
>>
>> > coverage(gal_with_N)$chr14
>> integer-Rle of length 107349540 with 5 runs
>> Lengths: 19650071 55 2117 17 87697280
>> Values : 0 1 0 1 0
>>
>> Same thing with an alignment with an I in its CIGAR:
>>
>> > gal_with_I <- gal[which(cig_op_table[ , "I"] != 0)[1]]
>>
>> > gal_with_I
>> GAlignments with 1 alignment and 0 metadata columns:
>> seqnames strand cigar qwidth start end
>> width
>> <Rle> <Rle> <character> <integer> <integer> <integer>
>> <integer>
>> [1] chr14 - 64M1I7M 72 19411677 19411747
>> 71
>> ngap
>> <integer>
>> [1] 0
>> ---
>> seqlengths:
>> chr1 chr10 ...
>> chrY
>> 249250621 135534747 ...
>> 59373566
>>
>> > coverage(gal_with_I)$chr14
>> integer-Rle of length 107349540 with 3 runs
>> Lengths: 19411676 71 87937793 <tel:71%2087937793>
>> Values : 0 1 0
>>
>> Same thing with an alignment with a D in its CIGAR:
>>
>> > gal_with_D <- gal[which(cig_op_table[ , "D"] != 0)[1]]
>>
>> > gal_with_D
>> GAlignments with 1 alignment and 0 metadata columns:
>> seqnames strand cigar qwidth start end
>> width
>> <Rle> <Rle> <character> <integer> <integer> <integer>
>> <integer>
>> [1] chr14 + 38M1D34M 72 19659063 19659135
>> 73
>> ngap
>> <integer>
>> [1] 0
>> ---
>> seqlengths:
>> chr1 chr10 ...
>> chrY
>> 249250621 135534747 ...
>> 59373566
>>
>> > coverage(gal_with_D)$chr14
>> integer-Rle of length 107349540 with 3 runs
>> Lengths: 19659062 73 87690405
>> Values : 0 1 0
>>
>> Seeing is believing,
>>
>> Cheers,
>> H.
>>
>>
>>
>> I started to look at the applyPileups() in Rsamtools which I
>> can get to
>> calculate coverage using cigars, but not using the strand or
>> flag
>> information for filtering. That solution would start from a
>> bam-file
>> instead of a GAlignments object, and sure I can do the
>> filtering outside R.
>> But it would be very convenient to do it all from within R.
>>
>> If there are nice solutions starting from both a GAlignments
>> and a
>> bam-file it would be great! =)
>>
>> /Jesper
>>
>>
>>
>> On Tue, Feb 18, 2014 at 10:52 PM, Michael Lawrence <
>> lawrence.michael at gene.com
>> <mailto:lawrence.michael at gene.com>> wrote:
>>
>> Hi Jesper,
>>
>> Would you be willing to volunteer your use case? As
>> Herve hinted,
>> cigarToRleList and friends are low-level helpers. There
>> may be an easier
>> way to achieve what you want, or an opportunity to
>> improve things.
>>
>> Michael
>>
>>
>> On Mon, Feb 17, 2014 at 1:10 AM, Jesper Gådin
>> <jesper.gadin at gmail.com
>> <mailto:jesper.gadin at gmail.com>>wrote:
>>
>> Hi,
>>
>> Have come across a cigar-vector that is problematic
>> to process.
>>
>> #load package
>>
>> library(GenomicAlignments)
>>
>>
>> #load data (see attached file)
>>
>> load("2014-02-17-cigarExample.__rdata")
>>
>>
>> #run function cigarToRleList
>>
>> cigarRle <- cigarToRleList(cigarExample)
>>
>> Error in .Call2("Rle_constructor", values, lengths,
>> check, 0L, PACKAGE =
>> "IRanges") :
>> integer overflow while summing elements in 'lengths'
>>
>> sessionInfo()
>>
>> R Under development (unstable) (2013-11-13 r64209)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8
>> LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils
>> datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] GenomicAlignments_0.99.18 Rsamtools_1.15.26
>> [3] Biostrings_2.31.12 XVector_0.3.6
>> [5] GenomicRanges_1.15.26 IRanges_1.21.23
>> [7] BiocGenerics_0.9.3
>>
>> loaded via a namespace (and not attached):
>> [1] BatchJobs_1.1-1135 BBmisc_1.5
>> BiocParallel_0.5.8
>> bitops_1.0-6
>>
>> [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7
>> digest_0.6.4
>>
>> [9] fail_1.2 foreach_1.4.1
>> iterators_1.0.6 plyr_1.8
>>
>> [13] RSQLite_0.11.4 sendmailR_1.1-2
>> stats4_3.1.0 tools_3.1.0
>>
>> [17] zlibbioc_1.9.0
>>
>> _________________________________________________
>> Bioc-devel at r-project.org
>> <mailto:Bioc-devel at r-project.org> mailing list
>> https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>>
>> _________________________________________________
>> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>> mailing list
>> https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>>
>>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioc-devel
mailing list