[BioC] IRanges: Trying to cut overlapping intervals into pieces

Fri Jan 30 02:53:02 CET 2009

Hi Elizabeth,

I've added the punion(), pintersect() and psetdiff() functions (better
naming suggested by Michael, the "p" stands for "parallel") to the IRanges
package. Make sure you update your installation in about 24 hours and
let me know if you have any question.

Cheers,
H.

Hervé Pagès wrote:
> Hi Elizabeth,
> 
> Elizabeth Purdom wrote:
>> Thanks for the help. It's much neater than what I had (and I do have 
>> gaps). If you are thinking about adding this kind of functionality, 
>> I'd be glad to tell you more specifically what I am doing if it helps 
>> to have a more general picture.
>>
>> I kept thinking the intersect or setdiff would help for different 
>> things, but it wound up not doing what I needed. For example, it 
>> doesn't do pairwise, but across the entire set, right? I often had a 
>> set of intervals that I wanted to 'subtract' pairwise from another set 
>> of intervals. Just to know, is there a function that does that?
> 
> The union, setdiff or substraction of 2 ranges can not always be 
> represented
> by a single range so we need to be careful when performing those operations
> pairwise.
> 
> For the pairwise intersection, no problem, the intersection of 2 ranges
> can always be represented by a single range (eventually empty):
> 
>   pairwiseIntersect <- function(ir1, ir2)
>   {
>     if (length(ir1) != length(ir2))
>         stop("'ir1' and 'ir2' must have the same length")
>     ans_start <- pmax.int(start(ir1), start(ir2))
>     ans_end <- pmin.int(end(ir1), end(ir2))
>     ans_width <- ans_end - ans_start + 1L
>     ans_width[ans_width < 0L] <- 0L
>     IRanges(start=ans_start, width=ans_width)
>   }
> 
>   > ir1 <- IRanges(c(1, 5, -2, 0, 14), c(10, 9, 3, 11, 17))
>   > ir2 <- IRanges(c(14, 0, -5, 6, 18), c(20, 2, 2, 8, 20))
>   > ir1
>   IRanges object:
>     start end width
>   1     1  10    10
>   2     5   9     5
>   3    -2   3     6
>   4     0  11    12
>   5    14  17     4
>   > ir2
>   IRanges object:
>     start end width
>   1    14  20     7
>   2     0   2     3
>   3    -5   2     8
>   4     6   8     3
>   5    18  20     3
>   > pairwiseIntersect(ir1, ir2)
>   IRanges object:
>     start end width
>   1    14  13     0  <-- note the empty ranges when the intersection is 
> empty
>   2     5   4     0
>   3    -2   2     5
>   4     6   8     3
>   5    18  17     0
> 
> The union of 2 ranges will produce 1 range if there is no gap between
> the 2 ranges (i.e. they are overlapping or adjacent). So, in the general
> case, it's not possible to return the result of the pairwise union in an
> IRanges object of the same length as the input.
> One solution is to add the 'fill.gap' argument to let the user control
> whether s/he wants to enforce the union even when there is a gap
> between the 2 ranges (in that case, the union is obtained by filling
> this gap):
> 
>   pairwiseUnion <- function(ir1, ir2, fill.gap=FALSE)
>   {
>     if (length(ir1) != length(ir2))
>         stop("'ir1' and 'ir2' must have the same length")
>     if (!fill.gap) {
>         gap <- pmax.int(start(ir1), start(ir2)) -
>                pmin.int(end(ir1), end(ir2)) - 1L
>         if (any(gap > 0L))
>             stop("some pair of ranges have a gap within ",
>                  "the 2 members of the pair.\n",
>                  "  Use 'fill.gap=TRUE' to enforce union ",
>                  "by filling the gap.")
>     }
>     ans_start <- pmin.int(start(ir1), start(ir2))
>     ans_end <- pmax.int(end(ir1), end(ir2))
>     IRanges(start=ans_start, end=ans_end)
>   }
> 
>   > pairwiseUnion(ir1, ir2)
>   Error in pairwiseUnion(ir1, ir2) :
>     some pair of ranges have a gap within the 2 members of the pair.
>     Use 'fill.gap=TRUE' to enforce union by filling the gap.
>   > pairwiseUnion(ir1[3:5], ir2[3:5])
>   IRanges object:
>     start end width
>   1    -5   3     9
>   2     0  11    12
>   3    14  20     7
>   > pairwiseUnion(ir1, ir2, fill.gap=TRUE)
>   IRanges object:
>     start end width
>   1     1  20    20  <- gap was filled
>   2     0   9    10  <- gap was filled
>   3    -5   3     9
>   4     0  11    12
>   5    14  20     7
> 
> Subtracting range two from range one will produce 1 range except when
> range two has its end points strictly inside range one. What can we do
> in such situation? The following function will return an error when this
> happens:
> 
>   pairwiseSubtract <- function(ir1, ir2)
>   {
>     if (length(ir1) != length(ir2))
>         stop("'ir1' and 'ir2' must have the same length")
>     ans_start <- start(ir1)
>     ans_end <- end(ir1)
>     if (any((start(ir2) > ans_start) & (end(ir2) < ans_end)))
>         stop("some ranges in 'ir2' have their end points strictly 
> inside\n",
>              "  the range in 'ir1' that they need to be subtracted from")
>     start2 <- pmax.int(ans_start, start(ir2))
>     end2 <- pmin.int(ans_end, end(ir2))
>     ii <- start2 <= end2
>     jj <- end2 == ans_end
>     kk <- ii & jj
>     ans_end[kk] <- start2[kk] - 1L
>     kk <- ii & (!jj)
>     ans_start[kk] <- end2[kk] + 1L
>     IRanges(start=ans_start, end=ans_end)
>   }
> 
>   > pairwiseSubtract(ir1, ir2)
>   Error in pairwiseSubtract(ir1, ir2) :
>     some ranges in 'ir2' have their end points strictly inside
>     the range in 'ir1' that they need to be subtracted from
>   > start(ir2)[4] <- -99
>   > end(ir2)[4] <- 99
>   > pairwiseSubtract(ir1, ir2)
>   IRanges object:
>     start end width
>   1     1  10    10
>   2     5   9     5
>   3     3   3     1
>   4     0  -1     0  <- nothing left in this range
>   5    14  17     4
> 
> I will add these 3 functions to the IRanges package and will let you know
> when this is done.
> 
> Cheers,
> 
> H.
> 
>> I thought maybe 'narrow', but sometimes subtracting an interval would 
>> cut an existing interval into pieces, and narrow doesn't seem to do this.
>>
>> Best,
>> Elizabeth
>>
>> Michael Lawrence wrote:
>>>
>>>
>>> On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom 
>>> <epurdom at stat.berkeley.edu <mailto:epurdom at stat.berkeley.edu>> wrote:
>>>
>>>     Hi,
>>>
>>>     I am trying to take overlapping intervals and return a set of
>>>     intervals that are not overlapping but cover all of the region (and
>>>     mantain the intervals that don't overlap). In particular, I don't
>>>     want to merge intervals that overlap together (i.e. the reduce
>>>     function in IRanges)-- I want to cut them up into distinct regions.
>>>     For example, if I have intervals:
>>>     [1,6], [4,8], [7,10]
>>>     I want to get back the set of adjacent intervals:
>>>     [1,3],[4,6],[7,8],[9,10]
>>>
>>>
>>> Well that's a fun one.
>>>
>>> ir <- IRanges(c(1, 4, 7), c(6, 8, 10))
>>> adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), 
>>> sort(unique(c(end(ir), tail(start(ir),-1)-1))))
>>>
>>> ... is a not so nice one, but pretty fast..
>>>  
>>> But if you had a gap in those ranges, like:
>>>
>>> ir <- IRanges(c(1, 4, 10), c(6, 8, 10))
>>>
>>> So there's a gap at position 9, you would need an additional 
>>> filtering step:
>>>
>>> adj[adj %in% ir]
>>>
>>> This last step requires the devel version of IRanges, but can be 
>>> emulated using !is.na <http://is.na>(overlap(ir, adj, multiple=FALSE)).
>>>
>>>
>>>     The options I find that look like they perhaps do this (intersect or
>>>     setdiff?) seem to be related to the 'normal' ranges class; but this
>>>     class requires a gap between intervals -- no adjacent intervals --
>>>     which is not what I want. Is there a nice way to do this with
>>>     IRanges (or a not so nice one, but fast)?
>>>
>>>
>>> The intersect and setdiff functions are for any Ranges, normal or 
>>> not. They return normal IRanges though. Perhaps the documentation 
>>> does not make this clear. They probably aren't very useful functions.
>>>  
>>>
>>>
>>>     Similarly, is there a 'reduce' version that doesn't merge adjacent
>>>     intervals but only truly overlapping ones? There are a lot of
>>>     annotation examples where you wouldn't not want to merge adjacent
>>>     intervals (e.g. UTRs)
>>>
>>>
>>> Try a trick like this:
>>>
>>> ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9))
>>> width(ir2) <- width(ir2) - 1
>>> rir2 <- reduce(ir2)
>>> width(rir2) <- width(rir2) + 1
>>>
>>> Or find the overlap, reduce those that did overlap and combine that 
>>> result with those that did not overlap.
>>>  
>>>
>>>
>>>     Thanks for any assistance!
>>>
>>>
>>> Thanks for providing more use cases. We'll consider adding 
>>> functionality along these lines to the base package (actually the 
>>> reduce one has been on the TODO list for many months).
>>>  
>>>
>>>
>>>     Elizabeth Purdom
>>>     Division of Biostatistics
>>>     UC, Berkeley
>>>
>>>     _______________________________________________
>>>     Bioconductor mailing list
>>>     Bioconductor at stat.math.ethz.ch 
>>> <mailto:Bioconductor at stat.math.ethz.ch>
>>>     https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>     Search the archives:
>>>     http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319