[BioC] shift function (GenomicRanges, IRanges), issue with negative start values and circular DNA

Hervé Pagès hpages at fhcrc.org
Mon Jan 23 01:04:17 CET 2012


Hi Robert,

Sorry that this completely went off my radar but this week someone
kindly sent me a reminder.

So after re-reading your email I agree that shift() should behave
consistently on both ends of the chromosome but I don't like the idea
that it automatically trims the ranges for you. I think we should
shift without trimming, but with a warning that some ranges are going
off limits.
Then it would be very easy for the user to trim those ranges if
s/he wanted, using trim() (there is currently no "trim" method for
GRanges objects but I'll add one). Note that for ranges that are
completely off limits (e.g. start=-20, end=-4), it's not clear
what trim() should do: (a) fail, (b) produce a 0-width range located
on the edge (e.g. start=1, end=0), with or (c) without warning.
This is one of the reasons I think this task is better done separately.
The other reason to not do the trimming in shift() is to maintain
the nice property that shift(shift(x, N), -N) should always be a no-op.

When the chromosome is circular, I think we should do the same, except
that no warning should be issued when a range goes off limits.

Does that sound reasonable?

Cheers,
H.


On 09/01/2011 08:23 AM, Ivanek, Robert wrote:
> Hi Michael,
>
> is there any plan to fixed these issues?
>
> Best Regards
>
> Robert
>
> On 09/01/2011 03:37 PM, Michael Lawrence wrote:
>> Nevermind, I misunderstood what you were trying to do.
>>
>> On Thu, Sep 1, 2011 at 6:36 AM, Michael Lawrence <michafla at gene.com
>> <mailto:michafla at gene.com>> wrote:
>>
>> As a sort of side-comment, you might be better off doing this with
>> resize(), i.e.,
>>
>> resize(gr, width(gr) - 50, fix = "end")
>>
>> Michael
>>
>> On Thu, Sep 1, 2011 at 5:27 AM, Ivanek, Robert <robert.ivanek at fmi.ch
>> <mailto:robert.ivanek at fmi.ch>> wrote:
>>
>> Dear Bioc developers,
>>
>> I am analysing the ChIP-Seq data using the GenomicRanges
>> package. I am usually shifting the reads to the middle of the
>> fragment (i.e. by 50 bp toward 3' end of the fragment according
>> the strand of the read), using the shift function (from
>> IRanges), which works in most cases.
>>
>> shift(gr, shift=50 * as.integer(ifelse(strand(gr)==__"-", -1, 1)))
>>
>> However sometimes the result does not look like as I would expect:
>>
>> 1. If the read is on the minus strand and the start value is
>> smaller than the shift value, it produces an error:
>>
>> ## Error in validObject(x) :
>> ## invalid class “IRanges” object: 'widths(x)' cannot contain
>> negative values
>>
>> If the read is on the plus strand at the end of the chromosome
>> it returns the size of the chromosome as a new end value. Would
>> be possible to apply similar strategy for the beginning of the
>> chromosome (It would return 1 as a start value)?
>>
>>
>>
>> 2. The shift function does not take into account information
>> about circularity of the chromosome. Is there any plan to
>> include it?
>>
>>
>> Please see below reproducible examples
>>
>> Best Regards
>>
>> Robert
>>
>> --
>> Robert Ivanek
>> Postdoctoral Fellow Schuebeler Group
>> Friedrich Miescher Institute
>> Maulbeerstrasse 66
>> 4058 Basel / Switzerland
>> Office phone: +41 61 697 6100 <tel:%2B41%2061%20697%206100>
>>
>>
>>
>>
>> EXAMPLES:
>>
>>
>> ##
>> ##############################__##############################__#################
>>
>> ##
>> library("GenomicRanges")
>> ## Loading required package: IRanges
>> ## Attaching package: ‘IRanges’
>> ## The following object(s) are masked from ‘package:base’:
>> ## cbind, eval, intersect, Map, mapply, order, paste, pmax,
>> pmax.int <http://pmax.int>, pmin, pmin.int <http://pmin.int>,
>> rbind, rep.int <http://rep.int>, setdiff, table, union
>>
>> ##
>> ##
>> ##############################__##############################__#################
>>
>> ##
>> chr1.gr <http://chr1.gr> <- GRanges(seqnames=Rle("chr1", 4),
>> ranges=IRanges(start=rep(c(1,__197195432),each=2), width=1),
>> strand=rep(c("+","-"),2),
>> seqlengths=c(chr1=197195432))
>> ##
>>
>> isCircular(chr1.gr <http://chr1.gr>) <- FALSE
>> ##
>>
>> chr1.gr <http://chr1.gr>
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <Rle> <IRanges> <Rle>
>> ## [1] chr1 [ 1, 1] +
>> ## [2] chr1 [ 1, 1] -
>> ## [3] chr1 [197195432, 197195432] +
>> ## [4] chr1 [197195432, 197195432] -
>> ## ---
>> ## seqlengths:
>> ## chr1
>> ## 197195432
>> ##
>>
>> seqinfo(chr1.gr <http://chr1.gr>)
>> ## Seqinfo of length 1
>> ## seqnames seqlengths isCircular
>> ## chr1 197195432 FALSE
>> ##
>>
>> shift(chr1.gr <http://chr1.gr>, shift=50)
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <Rle> <IRanges> <Rle>
>> ## [1] chr1 [ 51, 51] +
>> ## [2] chr1 [ 51, 51] -
>> ## [3] chr1 [197195432, 197195432] +
>> ## [4] chr1 [197195432, 197195432] -
>> ## ---
>> ## seqlengths:
>> ## chr1
>> ## 197195432
>> ## Warning message:
>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
>> 197195482L, :
>> ## trimmed end values to be <= seqlengths
>> ##
>>
>> shift(chr1.gr <http://chr1.gr>, shift=50 *
>> as.integer(ifelse(strand(chr1.__gr <http://chr1.gr>)=="-", -1, 1)))
>> ## Error in validObject(x) :
>> ## invalid class “IRanges” object: 'widths(x)' cannot contain
>> negative values
>> ## In addition: Warning messages:
>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
>> 197195482L, :
>> ## trimmed end values to be <= seqlengths
>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L,
>> 197195382L :
>> ## trimmed start values to be positive
>> ##
>> ##
>> ##############################__##############################__#################
>>
>> ##
>> chrM.gr <- GRanges(seqnames=Rle("chrM", 4),
>> ranges=IRanges(start=rep(c(1,__16299),each=2), width=1),
>> strand=rep(c("+","-"),2),
>> seqlengths=c(chrM=16299))
>> ##
>>
>> isCircular(chrM.gr) <- TRUE
>> ##
>>
>> chrM.gr
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <Rle> <IRanges> <Rle>
>> ## [1] chrM [ 1, 1] +
>> ## [2] chrM [ 1, 1] -
>> ## [3] chrM [16299, 16299] +
>> ## [4] chrM [16299, 16299] -
>> ## ---
>> ## seqlengths:
>> ## chrM
>> ## 16299
>> ##
>>
>> seqinfo(chrM.gr)
>> ## Seqinfo of length 1
>> ## seqnames seqlengths isCircular
>> ## chrM 16299 TRUE
>> ##
>>
>> shift(chrM.gr, shift=50)
>> ## GRanges with 4 ranges and 0 elementMetadata values:
>> ## seqnames ranges strand
>> ## <Rle> <IRanges> <Rle>
>> ## [1] chrM [ 51, 51] +
>> ## [2] chrM [ 51, 51] -
>> ## [3] chrM [16299, 16299] +
>> ## [4] chrM [16299, 16299] -
>> ## ---
>> ## seqlengths:
>> ## chrM
>> ## 16299
>> ## Warning message:
>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
>> 16349L, 16349L :
>> ## trimmed end values to be <= seqlengths
>> ##
>>
>> shift(chrM.gr, shift=50 *
>> as.integer(ifelse(strand(chrM.__gr)=="-", -1, 1)))
>> ## Error in validObject(x) :
>> ## invalid class “IRanges” object: 'widths(x)' cannot contain
>> negative values
>> ## In addition: Warning messages:
>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
>> 16349L, :
>> ## trimmed end values to be <= seqlengths
>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) :
>> ## trimmed start values to be positive
>> ##
>> ##
>> ##
>> ##############################__##############################__#################
>>
>> ##
>> sessionInfo()
>> ## R Under development (unstable) (2011-08-29 r56824)
>> ## Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> ## locale:
>> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
>> LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
>> ## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>> LC_TELEPHONE=C
>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> ## attached base packages:
>> ## [1] stats graphics grDevices utils datasets methods base
>>
>> ## other attached packages:
>> ## [1] GenomicRanges_1.5.31 IRanges_1.11.26
>> ##
>> ##
>> ##############################__##############################__#################
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list