[BioC] identify non-overlapping regions
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Feb 7 15:58:56 CET 2011
Hi,
On Mon, Feb 7, 2011 at 9:34 AM, Chee Lee <cheelee at umich.edu> wrote:
> Hi all,
> Is there a Bioconductor package that, given a data frame with the 'start' and
> 'end' of each sequence region, can parse it so that the output is the set of
> regions that do not overlap any other region? For example:
Your example didn't come through clearly at all. Next time, you can
set up your data objects in R and use "dput" to paste them into an
email in a form that we can then recover in our own R sessions.
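For example, assuming your regions were in a data frame with (hypothetical) `start` and `end` columns, this base-R snippet sketches the round trip: dput() emits text that rebuilds the object, and evaluating that pasted text gives back an identical copy.

```r
# Hypothetical example data: one row per sequence region
regions <- data.frame(start = c(1L, 8L, 13L), end = c(10L, 20L, 34L))

# dput() writes an R expression that rebuilds the object;
# this text is what you would paste into the email
txt <- capture.output(dput(regions))
cat(txt, sep = "\n")

# Whoever reads the email can evaluate that text to recover the object
recovered <- eval(parse(text = txt))
stopifnot(identical(regions, recovered))
```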
For instance, I think you wanted to start with an example like this:
R> library(IRanges)
R> i <- IRanges(c(1, 8, 13), c(10, 20, 34))
Which looks like:
R> i
IRanges of length 3
    start end width
[1]     1  10    10
[2]     8  20    13
[3]    13  34    22
The output of dput looks like:
R> dput(i)
new("IRanges"
, start = c(1L, 8L, 13L)
, width = c(10L, 13L, 22L)
, NAMES = NULL
, elementMetadata = NULL
, elementType = "integer"
, metadata = list()
)
Which is something we can copy and paste into R to recover the
original IRanges object.
Anyway, you were right to start by looking at IRanges, but you were
wrong to give up so soon :-) IRanges is an insanely "deep" library, so
you should take time to read through its vignettes, and even look
through its function list, which you can reach via the "index" link
at the bottom of any of the IRanges-specific help pages.
There are several ways to solve this problem; I'll show you one:
R> slice(coverage(i), upper=1, rangesOnly=TRUE)
IRanges of length 3
    start end width
[1]     1   7     7
[2]    11  12     2
[3]    21  34    14
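Here coverage(i) returns a run-length encoded vector (an Rle) giving,
for each position, how many ranges cover it, and slice(..., upper=1)
pulls out the runs where that count is at most 1. Look at the help
pages for ?slice and ?coverage.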
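To make the coverage/slice idea concrete without needing Bioconductor installed, here is a rough base-R sketch (not how IRanges implements it) that counts coverage per position and keeps the runs covered by exactly one range. It assumes closed, 1-based integer ranges, and the helper names `coverage_counts()` and `single_coverage_runs()` are made up for this illustration.

```r
# Hypothetical helper: for each position 1..max(end), count how many
# ranges cover it (same idea as IRanges::coverage, as a plain vector)
coverage_counts <- function(start, end) {
  counts <- integer(max(end))
  for (k in seq_along(start)) {
    idx <- start[k]:end[k]
    counts[idx] <- counts[idx] + 1L
  }
  counts
}

# Hypothetical helper: maximal runs of positions covered by exactly
# one range, found by run-length encoding the "count == 1" test
single_coverage_runs <- function(start, end) {
  counts <- coverage_counts(start, end)
  r <- rle(counts == 1L)
  run_end <- cumsum(r$lengths)
  run_start <- run_end - r$lengths + 1L
  keep <- r$values  # TRUE for runs where coverage is exactly 1
  data.frame(start = run_start[keep], end = run_end[keep],
             width = r$lengths[keep])
}

# Should give the same three runs as the slice() call: 1-7, 11-12, 21-34
res <- single_coverage_runs(c(1L, 8L, 13L), c(10L, 20L, 34L))
print(res)
```

Testing `== 1` rather than `<= 1` also excludes positions that no range covers at all. The example here has no such gaps, but if your data does, you probably want `lower=1` in the slice() call as well; ?slice will confirm.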
The "disjoin" function also gets you close to what you want: it splits
the ranges into non-overlapping pieces at every start and end.
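To turn disjoin's output into the answer, one would keep only the pieces that fall inside exactly one original range; with IRanges loaded that is probably something like `d <- disjoin(i); d[countOverlaps(d, i) == 1L]` (untested here, so check ?disjoin and ?countOverlaps). Below is the same idea as a dependency-free base-R sketch, using a hypothetical helper `disjoin_unique()` that assumes closed, 1-based integer ranges.

```r
# Hypothetical helper: split ranges into disjoint pieces at every
# endpoint (the idea behind IRanges::disjoin), then keep only the
# pieces that lie inside exactly one input range
disjoin_unique <- function(start, end) {
  # Each piece begins at a range start or just after a range end
  breaks <- sort(unique(c(start, end + 1L)))
  piece_start <- head(breaks, -1L)
  piece_end <- tail(breaks, -1L) - 1L
  # Pieces are elementary, so each one is either fully inside or
  # fully outside every input range; count the containing ranges
  hits <- vapply(seq_along(piece_start), function(p) {
    sum(start <= piece_start[p] & end >= piece_end[p])
  }, integer(1))
  # Pieces with 0 hits are gaps between ranges; drop them as well
  keep <- hits == 1L
  data.frame(start = piece_start[keep], end = piece_end[keep])
}

res <- disjoin_unique(c(1L, 8L, 13L), c(10L, 20L, 34L))
print(res)
```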
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact