[BioC] finding neigbouring intervals for 2 GRange objects
Valerie Obenchain
vobencha at fhcrc.org
Mon Mar 4 18:17:42 CET 2013
Hi Hermann, Harris,
I wanted to give some background on the behavior of these functions for
GRanges vs IRanges.
> gr <- GRanges("chr1", IRanges(c(1,2,3), width=1))
> gr
GRanges with 3 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [1, 1] *
[2] chr1 [2, 2] *
[3] chr1 [3, 3] *
-----------------------------
"+" strand behavior:
-----------------------------
IRanges objects have no strand so you can expect them to behave as "+"
strand GRanges do. If this is ever not the case please let me know. That
would be a bug.
> strand(gr) <- "+"
> precede(gr)
[1] 2 3 NA
> precede(ranges(gr))
[1] 2 3 NA
-----------------------------
"-" strand behavior:
-----------------------------
I think it makes intuitive sense that precede() and follow() on "-"
strand GRanges will give the opposite results from precede() and
follow() on "+" strand GRanges.
> strand(gr) <- "-"
> precede(gr)
[1] NA 1 2
-----------------------------
"*" strand behavior:
-----------------------------
For "*" strand GRanges maybe we tried to be too clever. When
ignore.strand=TRUE strand is treated as "+":
> strand(gr) <- "*"
> precede(gr, ignore.strand=TRUE)
[1] 2 3 NA
When ignore.strand=FALSE for a "*" strand range, results are computed
for both "+' and "-". If one result is NA, the other is taken. This is
how we end up with a '2' in positions 1 and 3 (position 3 was NA for "+"
and position 1 was NA for "-"). If there is a tie, the first in order is
taken. This is how we get the '1' in position 2 (tie was between '1' and
'3' and '1' comes before '3' in order).
> precede(gr, ignore.strand=FALSE)
[1] 2 1 2
It was a challenge to decide how "*" strand with ignore.strand=FALSE
should behave. If others have opinions on this I'd be interested in
hearing them.
Valerie
On 03/03/13 16:38, Harris A. Jaffee wrote:
> You want precede() and follow(), and then distance(), but tread somewhat carefully with strands of "*":
>
>> x=IRanges(2, 2)
>> x
> IRanges of length 1
> start end width
> [1] 2 2 1
>
>> y=IRanges(c(1,3), c(1,3))
>> y
> IRanges of length 2
> start end width
> [1] 1 1 1
> [2] 3 3 1
>
>> precede(x, y)
> [1] 2
>> follow(x, y)
> [1] 1
>
>> X = GRanges(ranges=x, seqnames="chr1")
>> Y = GRanges(ranges=y, seqnames="chr1")
>> X
> GRanges with 1 range and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] chr1 [2, 2] *
> ---
> seqlengths:
> chr1
> NA
>> Y
> GRanges with 2 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] chr1 [1, 1] *
> [2] chr1 [3, 3] *
> ---
> seqlengths:
> chr1
> NA
>
>> precede(X, Y)
> [1] 1
>> follow(X, Y)
> [1] 1
>
>> precede(X, Y, ignore.strand=TRUE)
> [1] 2
>> follow(X, Y, ignore.strand=TRUE)
> [1] 1
>
>
> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote:
>
>> Hello,
>>
>> I have to grange-objects - gr1 and gr2. And I wish to know:
>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and
>> downstream?
>> 2) What is the distance in bp between the neighbouring intervals?
>>
>> Could anybody give me a hint how this is done?
>> Thanks
>> Hermann
>>
>>> dput (gr1)
>> new("GRanges"
>> , seqnames = new("Rle"
>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>> "chr13",
>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>> , lengths = 4L
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , ranges = new("IRanges"
>> , start = c(540823L, 752809L, 771015L, 773361L)
>> , width = c(221L, 230L, 520L, 247L)
>> , NAMES = NULL
>> , elementType = "integer"
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , strand = new("Rle"
>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>> , lengths = 4L
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , elementMetadata = new("DataFrame"
>> , rownames = NULL
>> , nrows = 4L
>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak
>> = c(111L,
>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity",
>> "epeak", "over"))
>> , elementType = "ANY"
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , seqinfo = new("Seqinfo"
>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>> "chr15",
>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>> "chrX", "chrY")
>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>> , genome = c(NA_character_, NA_character_, NA_character_,
>> NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>> )
>> )
>> , metadata = list()
>> )
>>> dput (gr1)
>> new("GRanges"
>> , seqnames = new("Rle"
>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>> "chr13",
>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>> , lengths = 4L
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , ranges = new("IRanges"
>> , start = c(540823L, 752809L, 771015L, 773361L)
>> , width = c(221L, 230L, 520L, 247L)
>> , NAMES = NULL
>> , elementType = "integer"
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , strand = new("Rle"
>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>> , lengths = 4L
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , elementMetadata = new("DataFrame"
>> , rownames = NULL
>> , nrows = 4L
>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak
>> = c(111L,
>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity",
>> "epeak", "over"))
>> , elementType = "ANY"
>> , elementMetadata = NULL
>> , metadata = list()
>> )
>> , seqinfo = new("Seqinfo"
>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>> "chr15",
>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>> "chrX", "chrY")
>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>> , genome = c(NA_character_, NA_character_, NA_character_,
>> NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>> )
>> )
>> , metadata = list()
>> )
>>
>>> gr1
>> GRanges with 4 ranges and 3 metadata columns:
>> seqnames ranges strand | edensity epeak over
>> <Rle> <IRanges> <Rle> | <integer> <integer> <integer>
>> [1] chr1 [540823, 541043] * | 589 111 0
>> [2] chr1 [752809, 753038] * | 574 89 1
>> [3] chr1 [771015, 771534] * | 578 234 1
>> [4] chr1 [773361, 773607] * | 571 136 1
>> ---
>> seqlengths:
>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX
>> chrY
>> NA NA NA NA NA NA ... NA NA NA NA NA
>> NA
>>> gr2
>> GRanges with 3 ranges and 3 metadata columns:
>> seqnames ranges strand | edensity epeak over
>> <Rle> <IRanges> <Rle> | <integer> <integer> <integer>
>> [1] chr1 [ 53141, 53685] * | 601 212 0
>> [2] chr1 [521536, 521622] * | 568 37 1
>> [3] chr1 [805002, 805650] * | 1000 326 1
>> ---
>> seqlengths:
>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX
>> chrY
>> NA NA NA NA NA NA ... NA NA NA NA NA
>> NA
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:http://news.gmane.org/gmane.science.biology.informatics.conductor
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list