[R] overlapping intervals

Charles C. Berry cberry at tajo.ucsd.edu
Mon Oct 16 19:10:28 CEST 2006


If speed is an issue, as in large-scale (e.g. genomic) problems, then
findInterval is very helpful. See

 	http://finzi.psych.upenn.edu/R/Rhelp02a/archive/60815.html

for an example.
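
As a rough sketch of how findInterval could be applied to the two series
quoted below (this is not the code from the linked post): the helper name
shared_intervals is made up for illustration, and it assumes closed
intervals that are sorted by Start and do not overlap within a series.

```r
# Hypothetical helper: pairwise overlaps between two interval series.
# Assumes each series is sorted by Start and has no overlaps within itself.
shared_intervals <- function(a, b) {
  stopifnot(!is.unsorted(b[, 1]), !is.unsorted(b[, 2]))
  # index of the last interval in b that starts at or before each End of a
  hi <- findInterval(a[, 2], b[, 1])
  # index of the first interval in b that ends at or after each Start of a
  # (left.open = TRUE counts the Ends of b strictly below each Start of a)
  lo <- findInterval(a[, 1], b[, 2], left.open = TRUE) + 1L
  n  <- pmax(hi - lo + 1L, 0L)            # number of b intervals hit by each row of a
  ia <- rep(seq_len(nrow(a)), n)          # row of a for each overlapping pair
  ib <- sequence(n) + rep(lo, n) - 1L     # row of b for each overlapping pair
  cbind(Start = pmax(a[ia, 1], b[ib, 1]),
        End   = pmin(a[ia, 2], b[ib, 2]))
}

series1 <- cbind(Start = c(10, 21, 40, 300), End = c(20, 26, 70, 350))
series2 <- cbind(Start = c(25, 60, 210, 500), End = c(40, 100, 400, 1000))
z <- shared_intervals(series1, series2)
z                                  # includes the single-point overlap 40-40
z[z[, 1] != z[, 2], , drop = FALSE]  # drop single-point overlaps if unwanted
```

This avoids allocating one element per integer position, so it should
also work for non-integer endpoints. Note that the left.open argument is
only available in newer versions of R (3.3.0 onward, I believe).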

On Sun, 15 Oct 2006, jim holtman wrote:

> This is not the most efficient approach, and it requires integer
> values (maybe less than 1M). My results show an additional overlap
> at 40, where start and end are the same. Does this count? If not,
> just delete the rows that are the same in both columns.
>
>
>> series1<-cbind(Start=c(10,21,40,300),End=c(20,26,70,350))
>> series2<-cbind(Start=c(25,60,210,500),End=c(40,100,400,1000))
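>> x1 <- x2 <- logical(max(series1, series2))  # all FALSE, one element per integer position
>> x1[unlist(mapply(seq, series1[,1], series1[,2]))] <- TRUE  # positions covered by series1
>> x2[unlist(mapply(seq, series2[,1], series2[,2]))] <- TRUE  # positions covered by series2
>> r <- rle(x1 & x2)  # runs of TRUE are the overlaps
>> offset <- cumsum(r$lengths)  # end position of each run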
>> (z <- cbind(offset[r$values] - r$lengths[r$values] + 1, offset[r$values]))
>     [,1] [,2]
> [1,]   25   26
> [2,]   40   40
> [3,]   60   70
> [4,]  300  350
>> # if you don't like dups for overlaps (@40)
>> z[z[,1] != z[,2],]
>     [,1] [,2]
> [1,]   25   26
> [2,]   60   70
> [3,]  300  350
>
> On 10/15/06, Giovanni Coppola <gcoppola at ucla.edu> wrote:
>> Hello everybody,
>>
>> I have two series of intervals, and I'd like to output the shared
>> regions.
>> For example:
>> series1<-cbind(Start=c(10,21,40,300),End=c(20,26,70,350))
>> series2<-cbind(Start=c(25,60,210,500),End=c(40,100,400,1000))
>>
>> > series1
>>      Start End
>> [1,]    10  20
>> [2,]    21  26
>> [3,]    40  70
>> [4,]   300 350
>> > series2
>>      Start  End
>> [1,]    25   40
>> [2,]    60  100
>> [3,]   210  400
>> [4,]   500 1000
>>
>> I'd like to have something like this as result:
>> > shared
>>      Start End
>> [1,]    25  26
>> [2,]    60  70
>> [3,]   300 350
>>
>> I found this post, but the solution finds the regions shared across
>> all the intervals.
>> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/59594.html
>> Can anybody help me with this?
>> Thanks
>> Giovanni
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>

Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717


