[BioC] shift function (GenomicRanges, IRanges), issue with negative start values and circular DNA
Ivanek, Robert
robert.ivanek at fmi.ch
Thu Sep 1 14:27:24 CEST 2011
Dear Bioc developers,
I am analysing the ChIP-Seq data using the GenomicRanges package. I am
usually shifting the reads to the middle of the fragment (i.e. by 50 bp
toward 3' end of the fragment according the strand of the read), using
the shift function (from IRanges), which works in most cases.
shift(gr, shift=50 * as.integer(ifelse(strand(gr)=="-", -1, 1)))
However sometimes the result does not look like as I would expect:
1. If the read is on the minus strand and the start value is smaller
than the shift value, it produces an error:
## Error in validObject(x) :
## invalid class “IRanges” object: 'widths(x)' cannot contain negative
values
If the read is on the plus strand at the end of the chromosome it
returns the size of the chromosome as a new end value. Would be possible
to apply similar strategy for the beginning of the chromosome (It would
return 1 as a start value)?
2. The shift function does not take into account information about
circularity of the chromosome. Is there any plan to include it?
Please see below reproducible examples
Best Regards
Robert
--
Robert Ivanek
Postdoctoral Fellow Schuebeler Group
Friedrich Miescher Institute
Maulbeerstrasse 66
4058 Basel / Switzerland
Office phone: +41 61 697 6100
EXAMPLES:
##
#############################################################################
##
library("GenomicRanges")
## Loading required package: IRanges
## Attaching package: ‘IRanges’
## The following object(s) are masked from ‘package:base’:
## cbind, eval, intersect, Map, mapply, order, paste, pmax,
pmax.int, pmin, pmin.int, rbind, rep.int, setdiff, table, union
##
##
#############################################################################
##
chr1.gr <- GRanges(seqnames=Rle("chr1", 4),
ranges=IRanges(start=rep(c(1,197195432),each=2), width=1),
strand=rep(c("+","-"),2),
seqlengths=c(chr1=197195432))
##
isCircular(chr1.gr) <- FALSE
##
chr1.gr
## GRanges with 4 ranges and 0 elementMetadata values:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 [ 1, 1] +
## [2] chr1 [ 1, 1] -
## [3] chr1 [197195432, 197195432] +
## [4] chr1 [197195432, 197195432] -
## ---
## seqlengths:
## chr1
## 197195432
##
seqinfo(chr1.gr)
## Seqinfo of length 1
## seqnames seqlengths isCircular
## chr1 197195432 FALSE
##
shift(chr1.gr, shift=50)
## GRanges with 4 ranges and 0 elementMetadata values:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 [ 51, 51] +
## [2] chr1 [ 51, 51] -
## [3] chr1 [197195432, 197195432] +
## [4] chr1 [197195432, 197195432] -
## ---
## seqlengths:
## chr1
## 197195432
## Warning message:
## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, 197195482L, :
## trimmed end values to be <= seqlengths
##
shift(chr1.gr, shift=50 * as.integer(ifelse(strand(chr1.gr)=="-", -1, 1)))
## Error in validObject(x) :
## invalid class “IRanges” object: 'widths(x)' cannot contain negative
values
## In addition: Warning messages:
## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, 197195482L, :
## trimmed end values to be <= seqlengths
## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L, 197195382L :
## trimmed start values to be positive
##
##
#############################################################################
##
chrM.gr <- GRanges(seqnames=Rle("chrM", 4),
ranges=IRanges(start=rep(c(1,16299),each=2), width=1),
strand=rep(c("+","-"),2),
seqlengths=c(chrM=16299))
##
isCircular(chrM.gr) <- TRUE
##
chrM.gr
## GRanges with 4 ranges and 0 elementMetadata values:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chrM [ 1, 1] +
## [2] chrM [ 1, 1] -
## [3] chrM [16299, 16299] +
## [4] chrM [16299, 16299] -
## ---
## seqlengths:
## chrM
## 16299
##
seqinfo(chrM.gr)
## Seqinfo of length 1
## seqnames seqlengths isCircular
## chrM 16299 TRUE
##
shift(chrM.gr, shift=50)
## GRanges with 4 ranges and 0 elementMetadata values:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chrM [ 51, 51] +
## [2] chrM [ 51, 51] -
## [3] chrM [16299, 16299] +
## [4] chrM [16299, 16299] -
## ---
## seqlengths:
## chrM
## 16299
## Warning message:
## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, 16349L, 16349L :
## trimmed end values to be <= seqlengths
##
shift(chrM.gr, shift=50 * as.integer(ifelse(strand(chrM.gr)=="-", -1, 1)))
## Error in validObject(x) :
## invalid class “IRanges” object: 'widths(x)' cannot contain negative
values
## In addition: Warning messages:
## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, 16349L, :
## trimmed end values to be <= seqlengths
## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) :
## trimmed start values to be positive
##
##
##
#############################################################################
##
sessionInfo()
## R Under development (unstable) (2011-08-29 r56824)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
## other attached packages:
## [1] GenomicRanges_1.5.31 IRanges_1.11.26
##
##
#############################################################################
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: shift.output.R
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20110901/764b0963/attachment.pl>
More information about the Bioconductor
mailing list