[R] range segment exclusion using range endpoints

Mon May 14 21:06:34 CEST 2012

Hi all,

Nice code samples presented all around.

Just wanted to point out that I think the stuff found in the
`intervals` package might also be helpful:

http://cran.at.r-project.org/web/packages/intervals/index.html
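
For anyone who wants to see the idea without an extra package, here is a
minimal base-R sketch of the set difference discussed below. The function
name range_setdiff and its arguments are made up for illustration; they come
from neither the thread nor any package. It assumes numeric endpoints and
takes the s ranges as a list, as suggested further down. With the x_rng and
s ranges from Ben's second example it should give the same three pieces that
setdiffRanges(x, s) shows in Bill's message.

```r
# Hypothetical helper, not from the thread or from any package.
# Assumes numeric endpoints, each range given as c(bottom, top), bottom <= top.
range_setdiff <- function(x, s_list) {
  # One row per s range: column 1 = bottom, column 2 = top.
  s <- matrix(as.numeric(unlist(s_list)), ncol = 2, byrow = TRUE)
  # Clamp every endpoint into [x[1], x[2]] so parts of s outside x are ignored.
  bottoms <- pmin(pmax(s[, 1], x[1]), x[2])
  tops    <- pmin(pmax(s[, 2], x[1]), x[2])
  # Sort by bottom; the running maximum of the tops tracks how far s covers x.
  o <- order(bottoms)
  covered <- cummax(tops[o])
  # Each candidate gap runs from the coverage so far to the next bottom
  # (or to the top of x for the last one).
  starts <- c(x[1], covered)
  ends   <- c(bottoms[o], x[2])
  keep <- ends > starts  # strict comparison: zero-length pieces are dropped
  cbind(bottoms = starts[keep], tops = ends[keep])
}

x_rng  <- c(-100, 100)
s_rngs <- list(c(-250.5, 30), c(0.77, 10), c(25, 35), c(70, 80.3), c(90, 95))
res <- range_setdiff(x_rng, s_rngs)
print(res)
```

Because of the strict comparison, a fully covered x range (Ben's Ex 4 and
Ex 5) comes back as a matrix with zero rows rather than NA or zero-length
pieces; whether that suits the original use case is a judgment call.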

HTH,
-steve

On Mon, May 14, 2012 at 2:54 PM, Ben quant <ccquant at gmail.com> wrote:
> Yes, it is. I'm looking into understanding this now...
>
> thanks!
> Ben
>
> On Mon, May 14, 2012 at 12:38 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
>> To the list of functions I sent, add another that converts a list of
>> intervals into a Ranges object:
>>  as.Ranges.list <- function (x, ...) {
>>      stopifnot(nargs() == 1, all(vapply(x, length, 0) == 2))
>>      # use c() instead of unlist() because c() doesn't mangle POSIXct and
>>      # Date objects
>>      x <- unname(do.call(c, x))
>>      odd <- seq(from = 1, to = length(x), by = 2)
>>      as.Ranges(bottoms = x[odd], tops = x[odd + 1])
>>  }
>> Then stop using get() and assign() all over the place and instead make
>> lists of related intervals and convert them to Ranges objects:
>>  > x <- as.Ranges(list(x_rng))
>>  > s <- as.Ranges(list(s1_rng, s2_rng, s3_rng, s4_rng, s5_rng))
>>  > x
>>    bottoms tops
>>  1    -100  100
>>  > s
>>    bottoms tops
>>  1 -250.50 30.0
>>  2    0.77 10.0
>>  3   25.00 35.0
>>  4   70.00 80.3
>>  5   90.00 95.0
>> and then compute the difference between the sets x and s (i.e., describe
>> the points in x but not s as a union of intervals):
>>  > setdiffRanges(x, s)
>>    bottoms tops
>>  1    35.0   70
>>  2    80.3   90
>>  3    95.0  100
>> and for a graphical check do
>>  > plot(x, s, setdiffRanges(x, s))
>> Are those the numbers you want?
>>
>> I find it easier to use standard functions and data structures for this
>> than to adapt the cumsum/order idiom to different situations.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ben quant
>> > Sent: Monday, May 14, 2012 11:07 AM
>> > To: jim holtman
>> > Cc: r-help at r-project.org
>> > Subject: Re: [R] range segment exclusion using range endpoints
>> >
>> > Turns out this solution doesn't work if an s range extends outside the
>> > x range. I didn't include that in my examples, but it is something I
>> > have to deal with quite often.
>> >
>> > For example s1_rng below causes an issue:
>> >
>> > x_rng = c(-100,100)
>> > s1_rng = c(-250.5,30)
>> > s2_rng = c(0.77,10)
>> > s3_rng = c(25,35)
>> > s4_rng = c(70,80.3)
>> > s5_rng = c(90,95)
>> >
>> > sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
>> > queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
>> > for (i in sNames){
>> >   queue <- rbind(queue
>> >                  , c(get(i)[1], 1)  # enter queue
>> >                  , c(get(i)[2], -1)  # exit queue
>> >                  )
>> > }
>> > queue <- queue[order(queue[, 1]), ]  # sort
>> > queue <- cbind(queue, cumsum(queue[, 2]))  # number of people in the queue
>> > for (i in which(queue[, 3] == 1)){
>> >   cat("start:", queue[i, 1L], '  end:', queue[i + 1L, 1L], "\n")
>> > }
>> >
>> > Regards,
>> >
>> > ben
>> > On Sat, May 12, 2012 at 12:50 PM, jim holtman <jholtman at gmail.com> wrote:
>> >
>> > > Here is an example of how you might do it.  It uses a technique of
>> > > counting how many items are in a queue based on their arrival times;
>> > > it can be used to also find areas of overlap.
>> > >
>> > > Note that it would be best to use a list for the 's' end points
>> > >
>> > > ================================
>> > > > # note the next statement removes names of the format 's[0-9]+_rng'
>> > > > # it would be best to create a list with the 's' endpoints, but this is
>> > > > # what the OP specified
>> > > >
>> > > > rm(list = grep('s[0-9]+_rng', ls(), value = TRUE))  # Danger Will Robinson!!
>> > > >
>> > > > # ex 1
>> > > > x_rng = c(-100,100)
>> > > >
>> > > > s1_rng = c(-25.5,30)
>> > > > s2_rng = c(0.77,10)
>> > > > s3_rng = c(25,35)
>> > > > s4_rng = c(70,80.3)
>> > > > s5_rng = c(90,95)
>> > > >
>> > > > # ex 2
>> > > > # x_rng = c(-50.5,100)
>> > > >
>> > > > # s1_rng = c(-75.3,30)
>> > > >
>> > > > # ex 3
>> > > > # x_rng = c(-75.3,30)
>> > > >
>> > > > # s1_rng = c(-50.5,100)
>> > > >
>> > > > # ex 4
>> > > > # x_rng = c(-100,100)
>> > > >
>> > > > # s1_rng = c(-105,105)
>> > > >
>> > > > # find all the names -- USE A LIST NEXT TIME
>> > > > sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
>> > > >
>> > > > # initial matrix with the 'x' endpoints
>> > > > queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
>> > > >
>> > > > # add the 's' end points to the list
>> > > > # this will be used to determine how many things are in a queue
>> > > > # (or areas that overlap)
>> > > > for (i in sNames){
>> > > +     queue <- rbind(queue
>> > > +                 , c(get(i)[1], 1)  # enter queue
>> > > +                 , c(get(i)[2], -1)  # exit queue
>> > > +                 )
>> > > + }
>> > > > queue <- queue[order(queue[, 1]), ]  # sort
>> > > > queue <- cbind(queue, cumsum(queue[, 2]))  # number of people in the queue
>> > > > print(queue)
>> > >         [,1] [,2] [,3]
>> > >  [1,] -100.00    1    1
>> > >  [2,]  -25.50    1    2
>> > >  [3,]    0.77    1    3
>> > >  [4,]   10.00   -1    2
>> > >  [5,]   25.00    1    3
>> > >  [6,]   30.00   -1    2
>> > >  [7,]   35.00   -1    1
>> > >  [8,]   70.00    1    2
>> > >  [9,]   80.30   -1    1
>> > > [10,]   90.00    1    2
>> > > [11,]   95.00   -1    1
>> > > [12,]  100.00    1    2
>> > > >
>> > > > # print out values where the last column is 1
>> > > > for (i in which(queue[, 3] == 1)){
>> > > +     cat("start:", queue[i, 1L], '  end:', queue[i + 1L, 1L], "\n")
>> > > + }
>> > > start: -100   end: -25.5
>> > > start: 35   end: 70
>> > > start: 80.3   end: 90
>> > > start: 95   end: 100
>> > > >
>> > > >
>> > > =========================================
>> > >
>> > > On Sat, May 12, 2012 at 1:54 PM, Ben quant <ccquant at gmail.com> wrote:
>> > > > Hello,
>> > > >
>> > > > I'm posting this again (with some small edits). I didn't get any replies
>> > > > last time...hoping for some this time. :)
>> > > >
>> > > > Currently I'm only coming up with brute force solutions to this issue
>> > > > (loops). I'm wondering if anyone has a better way to do this. Thank you
>> > > > for your help in advance!
>> > > >
>> > > > The problem: I have endpoints of one x range (x_rng) and an unknown number
>> > > > of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to
>> > > > remove the parts of the x range that overlap with the s ranges. The
>> > > > examples below demonstrate what I mean.
>> > > >
>> > > > What is the best way to do this?
>> > > >
>> > > > Ex 1.
>> > > > For:
>> > > > x_rng = c(-100,100)
>> > > >
>> > > > s1_rng = c(-25.5,30)
>> > > > s2_rng = c(0.77,10)
>> > > > s3_rng = c(25,35)
>> > > > s4_rng = c(70,80.3)
>> > > > s5_rng = c(90,95)
>> > > >
>> > > > I would get:
>> > > > -100,-25.5
>> > > > 35,70
>> > > > 80.3,90
>> > > > 95,100
>> > > >
>> > > > Ex 2.
>> > > > For:
>> > > > x_rng = c(-50.5,100)
>> > > >
>> > > > s1_rng = c(-75.3,30)
>> > > >
>> > > > I would get:
>> > > > 30,100
>> > > >
>> > > > Ex 3.
>> > > > For:
>> > > > x_rng = c(-75.3,30)
>> > > >
>> > > > s1_rng = c(-50.5,100)
>> > > >
>> > > > I would get:
>> > > > -75.3,-50.5
>> > > >
>> > > > Ex 4.
>> > > > For:
>> > > > x_rng = c(-100,100)
>> > > >
>> > > > s1_rng = c(-105,105)
>> > > >
>> > > > I would get something like:
>> > > > NA,NA
>> > > > or...
>> > > > NA
>> > > >
>> > > > Ex 5.
>> > > > For:
>> > > > x_rng = c(-100,100)
>> > > >
>> > > > s1_rng = c(-100,100)
>> > > >
>> > > > I would get something like:
>> > > > -100,-100
>> > > > 100,100
>> > > > or just...
>> > > > -100
>> > > >  100
>> > > >
>> > > > PS - You may have noticed that in all of the examples I am including the
>> > > > s range endpoints in the desired results, which I can deal with later in
>> > > > my program so it's not a problem...  I think leaving in the s range
>> > > > endpoints simplifies the problem.
>> > > >
>> > > > Thanks!
>> > > > Ben
>> > > >
>> > > >        [[alternative HTML version deleted]]
>> > > >
>> > > > ______________________________________________
>> > > > R-help at r-project.org mailing list
>> > > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > > > and provide commented, minimal, self-contained, reproducible code.
>> > >
>> > >
>> > >
>> > > --
>> > > Jim Holtman
>> > > Data Munger Guru
>> > >
>> > > What is the problem that you are trying to solve?
>> > > Tell me what you want to do, not how you want to do it.
>> > >
>> >
>>
>

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact