[R] overlapping intervals
Charles C. Berry
cberry at tajo.ucsd.edu
Mon Oct 16 19:10:28 CEST 2006
If speed is an issue, as in large-scale (e.g. genomic) problems, then
findInterval() is very helpful. See
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/60815.html
for an example.
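
As a rough sketch of how findInterval() could be applied to the two series
quoted below (this is an illustration, not the code from the linked post):
it assumes closed intervals and that each series is sorted by Start with no
overlaps inside the series, so that both the Start and End columns are
non-decreasing. The function name shared_intervals is made up for this
example, and the left.open argument needs a reasonably recent R (3.3.0 or
later, I believe).

```r
# Hypothetical helper: overlaps between two sorted, internally disjoint
# series of closed intervals, each a two-column matrix (Start, End).
shared_intervals <- function(a, b) {
  # first interval of b whose End is >= the Start of each interval of a
  lo <- findInterval(a[, 1], b[, 2], left.open = TRUE) + 1L
  # last interval of b whose Start is <= the End of each interval of a
  hi <- findInterval(a[, 2], b[, 1])
  n  <- pmax(hi - lo + 1L, 0L)           # number of partners per row of a
  i  <- rep(seq_len(nrow(a)), n)         # row index into a
  j  <- sequence(n) + rep(lo, n) - 1L    # row index into b
  cbind(Start = pmax(a[i, 1], b[j, 1]),
        End   = pmin(a[i, 2], b[j, 2]))
}

series1 <- cbind(Start = c(10, 21, 40, 300), End = c(20, 26, 70, 350))
series2 <- cbind(Start = c(25, 60, 210, 500), End = c(40, 100, 400, 1000))
shared <- shared_intervals(series1, series2)
shared   # rows: (25,26) (40,40) (60,70) (300,350)
```

Because the intervals are treated as closed, the zero-width overlap at 40
is reported here too; filter on Start < End if it should not count.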
On Sun, 15 Oct 2006, jim holtman wrote:
> This is not the most efficient approach, and it requires integer
> endpoints (preferably below about 1M, since it allocates a logical
> vector as long as the largest endpoint). My results show an additional
> overlap at 40, where start and end are the same. Does this count? If
> not, just delete the rows that are the same in both columns.
>
>
>> series1<-cbind(Start=c(10,21,40,300),End=c(20,26,70,350))
>> series2<-cbind(Start=c(25,60,210,500),End=c(40,100,400,1000))
>> x1 <- x2 <- logical(max(series1, series2)) # vector FALSE
>> x1[unlist(mapply(seq, series1[,1], series1[,2]))] <- TRUE
>> x2[unlist(mapply(seq, series2[,1], series2[,2]))] <- TRUE
>> r <- rle(x1 & x2) # determine overlaps
>> offset <- cumsum(r$lengths)
>> (z <- cbind(offset[r$values] - r$lengths[r$values] + 1, offset[r$values]))
>      [,1] [,2]
> [1,]   25   26
> [2,]   40   40
> [3,]   60   70
> [4,]  300  350
>> # drop zero-width overlaps (start == end), such as the one at 40;
>> # drop = FALSE keeps a matrix even if only one row remains
>> z[z[,1] != z[,2], , drop = FALSE]
>      [,1] [,2]
> [1,]   25   26
> [2,]   60   70
> [3,]  300  350
>
> On 10/15/06, Giovanni Coppola <gcoppola at ucla.edu> wrote:
>> Hello everybody,
>>
>> I have two series of intervals, and I'd like to output the shared
>> regions.
>> For example:
>> series1<-cbind(Start=c(10,21,40,300),End=c(20,26,70,350))
>> series2<-cbind(Start=c(25,60,210,500),End=c(40,100,400,1000))
>>
>> > series1
>>      Start End
>> [1,]    10  20
>> [2,]    21  26
>> [3,]    40  70
>> [4,]   300 350
>> > series2
>>      Start  End
>> [1,]    25   40
>> [2,]    60  100
>> [3,]   210  400
>> [4,]   500 1000
>>
>> I'd like to have something like this as result:
>> > shared
>>      Start End
>> [1,]    25  26
>> [2,]    60  70
>> [3,]   300 350
>>
>> I found this post, but the solution finds the regions shared across
>> all the intervals.
>> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/59594.html
>> Can anybody help me with this?
>> Thanks
>> Giovanni
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
>
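
For endpoints that are not integers, the rle() approach quoted above does
not apply directly. One possible alternative for small problems, sketched
here under my own assumptions, compares every interval of one series with
every interval of the other using outer(). It needs memory proportional to
nrow(a) * nrow(b), so it is not meant for large inputs. The function name
pairwise_overlaps is made up for this example, and the strict comparison
deliberately drops zero-width overlaps such as the one at 40.

```r
# Hypothetical helper: all pairwise overlaps of positive width between two
# series of intervals, each a two-column matrix (Start, End).
pairwise_overlaps <- function(a, b) {
  start <- outer(a[, 1], b[, 1], pmax)   # later of the two starts
  end   <- outer(a[, 2], b[, 2], pmin)   # earlier of the two ends
  keep  <- start < end                   # TRUE where the pair really overlaps
  cbind(Start = start[keep], End = end[keep])
}

series1 <- cbind(Start = c(10, 21, 40, 300), End = c(20, 26, 70, 350))
series2 <- cbind(Start = c(25, 60, 210, 500), End = c(40, 100, 400, 1000))
overlaps <- pairwise_overlaps(series1, series2)
overlaps   # rows: (25,26) (60,70) (300,350)
```

Changing the comparison to start <= end would keep the zero-width overlap
at 40 as well.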
Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717