[R] complex search between dataframes

Marc Schwartz marc_schwartz at me.com
Thu Jun 2 23:44:36 CEST 2011


On Jun 2, 2011, at 1:42 PM, Filippo Beleggia wrote:

> Hi!
> 
> I am very new to R, I hope someone can help me.
> 
> I have two dataframes:
> 
> data1<-data.frame(from=c(1,12,16,40,55,81,101),to=c(10,13,23,45,67,99,123))
> data2<-data.frame(name=c(1,2,3,4,5,6,7,8,9),position=c(2,14,20,50,150,2000,2001,2002,85))
> 
> 
> I want to know which of the entries in "position" of data2 are included between 
> any "from" and the corresponding "to" of data1.
> 
> So in this case I would need to somehow be able to extract 2, 20 and 85, 
> corresponding to the "name"s 1, 3 and 9.
> 
> Thank you very much!
> Filippo


See ?findInterval

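Transpose data1 into a matrix with t(). R stores matrices column by column, so when findInterval() treats the matrix as a plain vector, the interval boundaries come out in increasing order as from, to, from, to, ... (e.g. c(1, 10, 12, 13, ...)). This relies on the intervals being sorted and non-overlapping, as they are here: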

> t(data1)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
from    1   12   16   40   55   81  101
to     10   13   23   45   67   99  123
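
To make the flattening explicit, here is a minimal sketch of the boundary vector that findInterval() ends up seeing. The name "breaks" is only illustrative, and the sortedness check is an added safeguard, not part of the original suggestion:

```r
data1 <- data.frame(from = c(1, 12, 16, 40, 55, 81, 101),
                    to   = c(10, 13, 23, 45, 67, 99, 123))

# t() turns the data frame into a 2 x 7 matrix; as.vector() reads it
# column by column, giving from1, to1, from2, to2, ...
breaks <- as.vector(t(data1))

# findInterval() needs non-decreasing break points, so stop early if the
# intervals overlap or are not sorted by 'from'.
stopifnot(!is.unsorted(breaks))

breaks
```

If the check fails, sorting data1 by "from" first may be enough, but overlapping intervals would need a different approach.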


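findInterval() returns, for each data2$position value, the index of the interval it falls in within that sorted boundary vector. The odd-numbered intervals (1, 3, ..., 13) run from a "from" value to its "to" value, while the even-numbered ones are the gaps between your ranges (14 means beyond the last "to", and 0 would mean below the first "from"). Since your ranges are not contiguous, you only want the values that land in the odd intervals, which is where %in% seq(1, 13, 2) comes in. Note that by default the intervals are closed on the left and open on the right, so a position exactly equal to a "to" value would not be counted. Without the %in% step, findInterval() returns: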

> findInterval(data2$position, t(data1))
[1]  1  4  5  8 14 14 14 14 11

With it:

> findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
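
The hard-coded seq(1, 13, 2) is tied to there being exactly 7 intervals. As a sketch of one way to avoid that, an odd/even test on the index should give the same result for any number of rows in data1 ("breaks", "idx" and "inside" are illustrative names):

```r
data1 <- data.frame(from = c(1, 12, 16, 40, 55, 81, 101),
                    to   = c(10, 13, 23, 45, 67, 99, 123))
data2 <- data.frame(name = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                    position = c(2, 14, 20, 50, 150, 2000, 2001, 2002, 85))

breaks <- as.vector(t(data1))               # from1, to1, from2, to2, ...
idx    <- findInterval(data2$position, breaks)

# Odd indices are [from, to) ranges; even indices (including 0, for values
# below the first 'from') are gaps, so they drop out automatically.
inside <- idx %% 2 == 1

data2$name[inside]
```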

Now you can use the TRUE values to index data2$name:

> data2$name[findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)]
[1] 1 3 9

or data2$position:

> data2$position[findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)]
[1]  2 20 85
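
If positions that sit exactly on a "to" value should also count, a direct comparison is a possible alternative. This is only a sketch: in_any_interval() is a hypothetical helper, not a base R function, and it trades the speed of findInterval() for simplicity (it also works when intervals overlap or are unsorted):

```r
data1 <- data.frame(from = c(1, 12, 16, 40, 55, 81, 101),
                    to   = c(10, 13, 23, 45, 67, 99, 123))
data2 <- data.frame(name = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                    position = c(2, 14, 20, 50, 150, 2000, 2001, 2002, 85))

# Hypothetical helper: TRUE where x lies in at least one closed
# interval [from, to].
in_any_interval <- function(x, from, to) {
  vapply(x, function(p) any(p >= from & p <= to), logical(1))
}

hit <- in_any_interval(data2$position, data1$from, data1$to)
data2[hit, ]

# Boundary case: 10 is the 'to' of the first interval.
in_any_interval(10, data1$from, data1$to)   # counted as inside
findInterval(10, as.vector(t(data1)))       # even index, i.e. treated as a gap
```

For the data in this thread both approaches select the same rows; they differ only for positions equal to a "to" value.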



HTH,

Marc Schwartz


