[BioC] finding neigbouring intervals for 2 GRange objects

Mon Mar 4 18:17:42 CET 2013

Hi Hermann, Harris,

I wanted to give some background on the behavior of these functions for 
GRanges vs IRanges.

 > gr <- GRanges("chr1", IRanges(c(1,2,3), width=1))
 > gr
GRanges with 3 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr1    [1, 1]      *
   [2]     chr1    [2, 2]      *
   [3]     chr1    [3, 3]      *

-----------------------------
"+" strand behavior:
-----------------------------
IRanges objects have no strand so you can expect them to behave as "+" 
strand GRanges do. If this is ever not the case please let me know. That 
would be a bug.

 > strand(gr) <- "+"
 > precede(gr)
[1]  2  3 NA
 > precede(ranges(gr))
[1]  2  3 NA

-----------------------------
"-" strand behavior:
-----------------------------
I think it makes intuitive sense that precede() and follow() on "-" 
strand GRanges will give the opposite results from precede() and 
follow() on "+" strand GRanges.

 > strand(gr) <- "-"
 > precede(gr)
[1] NA  1  2

-----------------------------
"*" strand behavior:
-----------------------------
For "*" strand GRanges maybe we tried to be too clever.  When 
ignore.strand=TRUE strand is treated as "+":
 > strand(gr) <- "*"
 > precede(gr, ignore.strand=TRUE)
[1]  2  3 NA

When ignore.strand=FALSE for a "*" strand range, results are computed 
for both "+' and "-". If one result is NA, the other is taken. This is 
how we end up with a '2' in positions 1 and 3 (position 3 was NA for "+" 
and position 1 was NA for "-"). If there is a tie, the first in order is 
taken. This is how we get the '1' in position 2 (tie was between '1' and 
'3' and '1' comes before '3' in order).
 > precede(gr, ignore.strand=FALSE)
[1] 2 1 2

It was a challenge to decide how "*" strand with ignore.strand=FALSE 
should behave. If others have opinions on this I'd be interested in 
hearing them.

Valerie

On 03/03/13 16:38, Harris A. Jaffee wrote:
> You want precede() and follow(), and then distance(), but tread somewhat carefully with strands of "*":
>
>> x=IRanges(2, 2)
>> x
> IRanges of length 1
>      start end width
> [1]     2   2     1
>
>> y=IRanges(c(1,3), c(1,3))
>> y
> IRanges of length 2
>      start end width
> [1]     1   1     1
> [2]     3   3     1
>
>> precede(x, y)
> [1] 2
>> follow(x, y)
> [1] 1
>
>> X = GRanges(ranges=x, seqnames="chr1")
>> Y = GRanges(ranges=y, seqnames="chr1")
>> X
> GRanges with 1 range and 0 metadata columns:
>        seqnames    ranges strand
>           <Rle> <IRanges>  <Rle>
>    [1]     chr1    [2, 2]      *
>    ---
>    seqlengths:
>     chr1
>       NA
>> Y
> GRanges with 2 ranges and 0 metadata columns:
>        seqnames    ranges strand
>           <Rle> <IRanges>  <Rle>
>    [1]     chr1    [1, 1]      *
>    [2]     chr1    [3, 3]      *
>    ---
>    seqlengths:
>     chr1
>       NA
>
>> precede(X, Y)
> [1] 1
>> follow(X, Y)
> [1] 1
>
>> precede(X, Y, ignore.strand=TRUE)
> [1] 2
>> follow(X, Y, ignore.strand=TRUE)
> [1] 1
>
>
> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote:
>
>> Hello,
>>
>> I have to grange-objects - gr1 and gr2. And I wish to know:
>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and
>> downstream?
>> 2) What is the distance in bp between the neighbouring intervals?
>>
>> Could anybody give me a hint how this is done?
>> Thanks
>> Hermann
>>
>>> dput (gr1)
>> new("GRanges"
>>     , seqnames = new("Rle"
>>     , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>> "chr13",
>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>>     , lengths = 4L
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , ranges = new("IRanges"
>>     , start = c(540823L, 752809L, 771015L, 773361L)
>>     , width = c(221L, 230L, 520L, 247L)
>>     , NAMES = NULL
>>     , elementType = "integer"
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , strand = new("Rle"
>>     , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>>     , lengths = 4L
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , elementMetadata = new("DataFrame"
>>     , rownames = NULL
>>     , nrows = 4L
>>     , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak
>> = c(111L,
>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity",
>> "epeak", "over"))
>>     , elementType = "ANY"
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , seqinfo = new("Seqinfo"
>>     , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>> "chr15",
>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>> "chrX", "chrY")
>>     , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>>     , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>>     , genome = c(NA_character_, NA_character_, NA_character_,
>> NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>> )
>> )
>>     , metadata = list()
>> )
>>> dput (gr1)
>> new("GRanges"
>>     , seqnames = new("Rle"
>>     , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>> "chr13",
>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>>     , lengths = 4L
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , ranges = new("IRanges"
>>     , start = c(540823L, 752809L, 771015L, 773361L)
>>     , width = c(221L, 230L, 520L, 247L)
>>     , NAMES = NULL
>>     , elementType = "integer"
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , strand = new("Rle"
>>     , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>>     , lengths = 4L
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , elementMetadata = new("DataFrame"
>>     , rownames = NULL
>>     , nrows = 4L
>>     , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak
>> = c(111L,
>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity",
>> "epeak", "over"))
>>     , elementType = "ANY"
>>     , elementMetadata = NULL
>>     , metadata = list()
>> )
>>     , seqinfo = new("Seqinfo"
>>     , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>> "chr15",
>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>> "chrX", "chrY")
>>     , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>>     , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>>     , genome = c(NA_character_, NA_character_, NA_character_,
>> NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>> )
>> )
>>     , metadata = list()
>> )
>>
>>> gr1
>> GRanges with 4 ranges and 3 metadata columns:
>>       seqnames           ranges strand |  edensity     epeak      over
>>          <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>>   [1]     chr1 [540823, 541043]      * |       589       111         0
>>   [2]     chr1 [752809, 753038]      * |       574        89         1
>>   [3]     chr1 [771015, 771534]      * |       578       234         1
>>   [4]     chr1 [773361, 773607]      * |       571       136         1
>>   ---
>>   seqlengths:
>>     chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX
>> chrY
>>       NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA
>> NA
>>> gr2
>> GRanges with 3 ranges and 3 metadata columns:
>>       seqnames           ranges strand |  edensity     epeak      over
>>          <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>>   [1]     chr1 [ 53141,  53685]      * |       601       212         0
>>   [2]     chr1 [521536, 521622]      * |       568        37         1
>>   [3]     chr1 [805002, 805650]      * |      1000       326         1
>>   ---
>>   seqlengths:
>>     chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX
>> chrY
>>       NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA
>> NA
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:http://news.gmane.org/gmane.science.biology.informatics.conductor
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:http://news.gmane.org/gmane.science.biology.informatics.conductor