[BioC] IRanges: Trying to cut overlapping intervals into pieces
Elizabeth Purdom
epurdom at stat.berkeley.edu
Sun Jan 25 06:45:30 CET 2009
Thanks for the help. It's much neater than what I had (and I do have
gaps). If you are thinking about adding this kind of functionality, I'd
be glad to tell you more specifically what I am doing if it helps to
have a more general picture.
I kept thinking that intersect or setdiff would help for different
things, but they wound up not doing what I needed. For example, they don't
work pairwise, but across the entire set, right? I often had a set of
intervals that I wanted to 'subtract' pairwise from another set of
intervals. Just to know, is there a function that does that? I thought
maybe 'narrow', but sometimes subtracting an interval would cut an
existing interval into pieces, and narrow doesn't seem to do this.
Best,
Elizabeth
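
For illustration, the pairwise 'subtract' asked about above could be
sketched in base R roughly as follows. This assumes closed integer
intervals given as start/end vectors; subtract_one and subtract_pairwise
are hypothetical helper names, not functions from IRanges.

```r
# Hypothetical helper (not part of IRanges): remove the closed interval
# [s2, e2] from [s1, e1]. Returns a matrix with zero, one, or two rows,
# because the subtraction can cut the first interval into two pieces.
subtract_one <- function(s1, e1, s2, e2) {
  pieces <- rbind(
    c(start = s1, end = min(e1, s2 - 1)),  # part left of the removed interval
    c(start = max(s1, e2 + 1), end = e1)   # part right of the removed interval
  )
  # Drop pieces that turned out empty (start past end).
  pieces[pieces[, "start"] <= pieces[, "end"], , drop = FALSE]
}

# Hypothetical helper: subtract the i-th interval of the second set from
# the i-th interval of the first set; returns one matrix per pair.
subtract_pairwise <- function(starts1, ends1, starts2, ends2) {
  stopifnot(length(starts1) == length(ends1),
            length(starts2) == length(ends2),
            length(starts1) == length(starts2))
  Map(subtract_one, starts1, ends1, starts2, ends2)
}

res <- subtract_pairwise(c(1, 10, 20), c(8, 15, 25),
                         c(3, 1, 22), c(5, 12, 30))
# res[[1]] holds two pieces, [1,2] and [6,8], because [3,5] lies inside [1,8].
```

Returning a list with one matrix per pair keeps track of which original
interval each remaining piece came from.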
Michael Lawrence wrote:
>
>
> On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom
> <epurdom at stat.berkeley.edu> wrote:
>
> Hi,
>
> I am trying to take overlapping intervals and return a set of
> intervals that are not overlapping but cover all of the region (and
> maintain the intervals that don't overlap). In particular, I don't
> want to merge intervals that overlap together (i.e. the reduce
> function in IRanges)-- I want to cut them up into distinct regions.
> For example, if I have intervals:
> [1,6], [4,8], [7,10]
> I want to get back the set of adjacent intervals:
> [1,3],[4,6],[7,8],[9,10]
>
>
> Well that's a fun one.
>
> ir <- IRanges(c(1, 4, 7), c(6, 8, 10))
> adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))),
> sort(unique(c(end(ir), tail(start(ir),-1)-1))))
>
> ... is a not so nice one, but pretty fast..
>
> But if you had a gap in those ranges, like:
>
> ir <- IRanges(c(1, 4, 10), c(6, 8, 10))
>
> So there's a gap at position 9, you would need an additional filtering step:
>
> adj[adj %in% ir]
>
> This last step requires the devel version of IRanges, but can be
> emulated using !is.na(overlap(ir, adj, multiple=FALSE)).
>
>
> The options I find that look like they perhaps do this (intersect or
> setdiff?) seem to be related to the 'normal' ranges class; but this
> class requires a gap between intervals -- no adjacent intervals --
> which is not what I want. Is there a nice way to do this with
> IRanges (or a not so nice one, but fast)?
>
>
> The intersect and setdiff functions are for any Ranges, normal or not.
> They return normal IRanges though. Perhaps the documentation does not
> make this clear. They probably aren't very useful functions.
>
>
>
> Similarly, is there a 'reduce' version that doesn't merge adjacent
> intervals but only truly overlapping ones? There are a lot of
> annotation examples where you would not want to merge adjacent
> intervals (e.g. UTRs).
>
>
> Try a trick like this:
>
> ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9))
> width(ir2) <- width(ir2) - 1
> rir2 <- reduce(ir2)
> width(rir2) <- width(rir2) + 1
>
> Or find the overlap, reduce those that did overlap and combine that
> result with those that did not overlap.
>
>
>
> Thanks for any assistance!
>
>
> Thanks for providing more use cases. We'll consider adding functionality
> along these lines to the base package (actually the reduce one has been
> on the TODO list for many months).
>
>
>
> Elizabeth Purdom
> Division of Biostatistics
> UC, Berkeley
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
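
The cutting step from the quoted reply, including the filtering needed
when the input has gaps, can also be sketched without IRanges. The
version below is only a base R illustration that assumes closed integer
intervals; disjoin_closed is a hypothetical name, and the coverage check
stands in for the adj[adj %in% ir] filter rather than reproducing it.

```r
# Hypothetical helper (not part of IRanges): cut possibly overlapping
# closed integer intervals into non-overlapping pieces that cover exactly
# the same positions.
disjoin_closed <- function(starts, ends) {
  stopifnot(length(starts) >= 1,
            length(starts) == length(ends),
            all(starts <= ends))
  # A new piece begins at every input start and just past every input end.
  breaks <- sort(unique(c(starts, ends + 1)))
  piece_start <- head(breaks, -1)
  piece_end <- tail(breaks, -1) - 1
  # Keep only pieces lying inside at least one input interval, so that
  # gaps between the inputs are dropped.
  covered <- vapply(seq_along(piece_start), function(i) {
    any(starts <= piece_start[i] & ends >= piece_end[i])
  }, logical(1))
  cbind(start = piece_start[covered], end = piece_end[covered])
}

no_gap <- disjoin_closed(c(1, 4, 7), c(6, 8, 10))
# pieces: [1,3], [4,6], [7,8], [9,10]

with_gap <- disjoin_closed(c(1, 4, 10), c(6, 8, 10))
# pieces: [1,3], [4,6], [7,8], [10,10]; the uncovered position 9 is dropped
```

Because every input boundary is a break point, each candidate piece is
either entirely inside or entirely outside any given input interval,
which is why a simple containment test is enough for the filter.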
>
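
The 'reduce' variant asked about in the quoted thread, which merges
truly overlapping intervals but leaves adjacent ones alone, can be
sketched in base R as well. This is an illustration under the assumption
of closed integer intervals, not the width trick from the reply;
merge_overlapping_only is a hypothetical name.

```r
# Hypothetical helper (not part of IRanges): merge intervals that share
# at least one position, but keep merely adjacent intervals separate.
merge_overlapping_only <- function(starts, ends) {
  stopifnot(length(starts) >= 1,
            length(starts) == length(ends),
            all(starts <= ends))
  ord <- order(starts, ends)
  starts <- starts[ord]
  ends <- ends[ord]
  out_start <- starts[1]
  out_end <- ends[1]
  for (i in seq_along(starts)[-1]) {
    k <- length(out_end)
    if (starts[i] <= out_end[k]) {
      # Shares a position with the current merged interval: extend it.
      out_end[k] <- max(out_end[k], ends[i])
    } else {
      # Disjoint or only adjacent: start a new output interval.
      out_start <- c(out_start, starts[i])
      out_end <- c(out_end, ends[i])
    }
  }
  cbind(start = out_start, end = out_end)
}

adjacent <- merge_overlapping_only(c(1, 5, 7), c(4, 6, 9))
# adjacent intervals stay separate: [1,4], [5,6], [7,9]

overlapping <- merge_overlapping_only(c(1, 4, 7), c(4, 6, 9))
# [1,4] and [4,6] share position 4, so they merge: [1,6], [7,9]
```

The only difference from an ordinary merge is the comparison
starts[i] <= out_end[k]; using out_end[k] + 1 there instead would also
merge adjacent intervals.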