[R] Finding overlaps in vector

Fri Dec 21 16:32:24 CET 2007

Here is a modification of the algorithm to use a specified value for
the overlap:

> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
> # the following uses 0.5 as the overlap window -- can be changed
> x <- rbind(cbind(value=vector, oper=1, id=seq_along(vector)),
+            cbind(value=vector+0.5, oper=-1, id=seq_along(vector)))
> x <- x[order(x[,'value'], -x[, 'oper']),]
> # determine which ones overlap
> x <- cbind(x, over=cumsum(x[, 'oper']))
> # now partition into groups and only use groups with at least 3 rows
> # determine where the breaks are (rows where 'over' is 0)
> x <- cbind(x, breaks=cumsum(x[, 'over'] == 0))
> # delete entries with 'over' == 0
> x <- x[x[, 'over'] != 0,]
> # split into groups
> x.groups <- split(x[, 'id'], x[, 'breaks'])
> # keep groups of at least 2 values: n values leave 2n-1 rows, so length >= 3
> x.subsets <- x.groups[sapply(x.groups, length) >= 3]
> # print out the subsets
> invisible(lapply(x.subsets, function(a) print(vector[unique(a)])))
[1] 0.00 0.45
[1] 3.00 3.25 3.33 3.75 4.10
[1] 6.00 6.45
[1] 7.0 7.1
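
The steps above can be wrapped in a small function so the overlap value becomes an argument. This is only a sketch under stated assumptions: the name find_clusters and the arguments width and min.size are illustrative and do not come from the thread. As in the transcript, each window runs from a value up to value + width.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread).
# 'width' may be a single number or one window width per element of 'v'.
find_clusters <- function(v, width = 0.5, min.size = 2) {
  id <- seq_along(v)
  # one "open" event (+1) at each value, one "close" event (-1) at value + width
  ev <- rbind(cbind(value = v,         oper =  1, id = id),
              cbind(value = v + width, oper = -1, id = id))
  # sort by position; on ties put opens before closes so touching windows merge
  ev <- ev[order(ev[, "value"], -ev[, "oper"]), , drop = FALSE]
  over <- cumsum(ev[, "oper"])   # number of windows currently open
  grp  <- cumsum(over == 0)      # counter advances each time all windows close
  keep <- over != 0
  groups <- split(ev[keep, "id"], grp[keep])
  # map event ids back to the original values, once each
  groups <- lapply(groups, function(a) v[unique(a)])
  unname(groups[lengths(groups) >= min.size])
}

v <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
clusters <- find_clusters(v, width = 0.5)
# four clusters: (0, 0.45), (3, 3.25, 3.33, 3.75, 4.1), (6, 6.45), (7, 7.1)
invisible(lapply(clusters, print))
```

Because width is simply added to v, it can also be a vector of per-value widths (for example a fixed fraction of each value), which is one way to approximate the ppm-style windows mentioned in the quoted message below.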

On Dec 21, 2007 4:56 AM, Johannes Graumann <johannes_graumann at web.de> wrote:
> <posted & mailed>
>
> Dear all,
>
> I'm trying to solve the problem of how to find clusters of values in a
> vector that are closer than a given value. Illustrated, this might look as
> follows:
>
> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
>
> When using '0.5' as the proximity requirement, the following groups would
> result:
> 0,0.45
> 3,3.25,3.33,3.75,4.1
> 6,6.45
> 7,7.1
>
> Jim Holtman proposed a very elegant solution in
> http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have
> modified and perused since he wrote it to me. The beauty of this approach
> is that it will not only work for constant proximity requirements as above,
> but also for overlap-windows defined in terms of ppm around each value.
> Now I have an additional need and have found no way (short of iteratively
> stepping through all the groups returned) to meet it with
> Jim's approach: how to figure out that 6,6.45 and 7,7.1 are separate
> clusters?
>
> Thanks for any hints, Joh

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?