[BioC] shift function (GenomicRanges, IRanges), issue with negative start values and circular DNA
Ivanek, Robert
robert.ivanek at fmi.ch
Thu Sep 1 17:23:11 CEST 2011
Hi Michael,
is there any plan to fixed these issues?
Best Regards
Robert
On 09/01/2011 03:37 PM, Michael Lawrence wrote:
> Nevermind, I misunderstood what you were trying to do.
>
> On Thu, Sep 1, 2011 at 6:36 AM, Michael Lawrence <michafla at gene.com
> <mailto:michafla at gene.com>> wrote:
>
> As a sort of side-comment, you might be better off doing this with
> resize(), i.e.,
>
> resize(gr, width(gr) - 50, fix = "end")
>
> Michael
>
> On Thu, Sep 1, 2011 at 5:27 AM, Ivanek, Robert <robert.ivanek at fmi.ch
> <mailto:robert.ivanek at fmi.ch>> wrote:
>
> Dear Bioc developers,
>
> I am analysing the ChIP-Seq data using the GenomicRanges
> package. I am usually shifting the reads to the middle of the
> fragment (i.e. by 50 bp toward 3' end of the fragment according
> the strand of the read), using the shift function (from
> IRanges), which works in most cases.
>
> shift(gr, shift=50 * as.integer(ifelse(strand(gr)==__"-", -1, 1)))
>
> However sometimes the result does not look like as I would expect:
>
> 1. If the read is on the minus strand and the start value is
> smaller than the shift value, it produces an error:
>
> ## Error in validObject(x) :
> ## invalid class “IRanges” object: 'widths(x)' cannot contain
> negative values
>
> If the read is on the plus strand at the end of the chromosome
> it returns the size of the chromosome as a new end value. Would
> be possible to apply similar strategy for the beginning of the
> chromosome (It would return 1 as a start value)?
>
>
>
> 2. The shift function does not take into account information
> about circularity of the chromosome. Is there any plan to
> include it?
>
>
> Please see below reproducible examples
>
> Best Regards
>
> Robert
>
> --
> Robert Ivanek
> Postdoctoral Fellow Schuebeler Group
> Friedrich Miescher Institute
> Maulbeerstrasse 66
> 4058 Basel / Switzerland
> Office phone: +41 61 697 6100 <tel:%2B41%2061%20697%206100>
>
>
>
>
> EXAMPLES:
>
>
> ##
> ##############################__##############################__#################
> ##
> library("GenomicRanges")
> ## Loading required package: IRanges
> ## Attaching package: ‘IRanges’
> ## The following object(s) are masked from ‘package:base’:
> ## cbind, eval, intersect, Map, mapply, order, paste, pmax,
> pmax.int <http://pmax.int>, pmin, pmin.int <http://pmin.int>,
> rbind, rep.int <http://rep.int>, setdiff, table, union
>
> ##
> ##
> ##############################__##############################__#################
> ##
> chr1.gr <http://chr1.gr> <- GRanges(seqnames=Rle("chr1", 4),
> ranges=IRanges(start=rep(c(1,__197195432),each=2), width=1),
> strand=rep(c("+","-"),2),
> seqlengths=c(chr1=197195432))
> ##
>
> isCircular(chr1.gr <http://chr1.gr>) <- FALSE
> ##
>
> chr1.gr <http://chr1.gr>
> ## GRanges with 4 ranges and 0 elementMetadata values:
> ## seqnames ranges strand
> ## <Rle> <IRanges> <Rle>
> ## [1] chr1 [ 1, 1] +
> ## [2] chr1 [ 1, 1] -
> ## [3] chr1 [197195432, 197195432] +
> ## [4] chr1 [197195432, 197195432] -
> ## ---
> ## seqlengths:
> ## chr1
> ## 197195432
> ##
>
> seqinfo(chr1.gr <http://chr1.gr>)
> ## Seqinfo of length 1
> ## seqnames seqlengths isCircular
> ## chr1 197195432 FALSE
> ##
>
> shift(chr1.gr <http://chr1.gr>, shift=50)
> ## GRanges with 4 ranges and 0 elementMetadata values:
> ## seqnames ranges strand
> ## <Rle> <IRanges> <Rle>
> ## [1] chr1 [ 51, 51] +
> ## [2] chr1 [ 51, 51] -
> ## [3] chr1 [197195432, 197195432] +
> ## [4] chr1 [197195432, 197195432] -
> ## ---
> ## seqlengths:
> ## chr1
> ## 197195432
> ## Warning message:
> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
> 197195482L, :
> ## trimmed end values to be <= seqlengths
> ##
>
> shift(chr1.gr <http://chr1.gr>, shift=50 *
> as.integer(ifelse(strand(chr1.__gr <http://chr1.gr>)=="-", -1, 1)))
> ## Error in validObject(x) :
> ## invalid class “IRanges” object: 'widths(x)' cannot contain
> negative values
> ## In addition: Warning messages:
> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
> 197195482L, :
> ## trimmed end values to be <= seqlengths
> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L,
> 197195382L :
> ## trimmed start values to be positive
> ##
> ##
> ##############################__##############################__#################
> ##
> chrM.gr <- GRanges(seqnames=Rle("chrM", 4),
> ranges=IRanges(start=rep(c(1,__16299),each=2), width=1),
> strand=rep(c("+","-"),2),
> seqlengths=c(chrM=16299))
> ##
>
> isCircular(chrM.gr) <- TRUE
> ##
>
> chrM.gr
> ## GRanges with 4 ranges and 0 elementMetadata values:
> ## seqnames ranges strand
> ## <Rle> <IRanges> <Rle>
> ## [1] chrM [ 1, 1] +
> ## [2] chrM [ 1, 1] -
> ## [3] chrM [16299, 16299] +
> ## [4] chrM [16299, 16299] -
> ## ---
> ## seqlengths:
> ## chrM
> ## 16299
> ##
>
> seqinfo(chrM.gr)
> ## Seqinfo of length 1
> ## seqnames seqlengths isCircular
> ## chrM 16299 TRUE
> ##
>
> shift(chrM.gr, shift=50)
> ## GRanges with 4 ranges and 0 elementMetadata values:
> ## seqnames ranges strand
> ## <Rle> <IRanges> <Rle>
> ## [1] chrM [ 51, 51] +
> ## [2] chrM [ 51, 51] -
> ## [3] chrM [16299, 16299] +
> ## [4] chrM [16299, 16299] -
> ## ---
> ## seqlengths:
> ## chrM
> ## 16299
> ## Warning message:
> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
> 16349L, 16349L :
> ## trimmed end values to be <= seqlengths
> ##
>
> shift(chrM.gr, shift=50 *
> as.integer(ifelse(strand(chrM.__gr)=="-", -1, 1)))
> ## Error in validObject(x) :
> ## invalid class “IRanges” object: 'widths(x)' cannot contain
> negative values
> ## In addition: Warning messages:
> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
> 16349L, :
> ## trimmed end values to be <= seqlengths
> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) :
> ## trimmed start values to be positive
> ##
> ##
> ##
> ##############################__##############################__#################
> ##
> sessionInfo()
> ## R Under development (unstable) (2011-08-29 r56824)
> ## Platform: x86_64-unknown-linux-gnu (64-bit)
>
> ## locale:
> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
> LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
> ## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
> LC_TELEPHONE=C
> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> ## attached base packages:
> ## [1] stats graphics grDevices utils datasets methods base
>
> ## other attached packages:
> ## [1] GenomicRanges_1.5.31 IRanges_1.11.26
> ##
> ##
> ##############################__##############################__#################
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
--
Robert Ivanek
Postdoctoral Fellow Schuebeler Group
Friedrich Miescher Institute
Maulbeerstrasse 66
4058 Basel / Switzerland
Office phone: +41 61 697 6100
More information about the Bioconductor
mailing list