[R] complex search between dataframes
Marc Schwartz
marc_schwartz at me.com
Thu Jun 2 23:44:36 CEST 2011
On Jun 2, 2011, at 1:42 PM, Filippo Beleggia wrote:
> Hi!
>
> I am very new to R, I hope someone can help me.
>
> I have two dataframes:
>
> data1<-data.frame(from=c(1,12,16,40,55,81,101),to=c(10,13,23,45,67,99,123))
> data2<-data.frame(name=c(1,2,3,4,5,6,7,8,9),position=c(2,14,20,50,150,2000,2001,2002,85))
>
>
> I want to know which of the entries in "position" of data2 are included between
> any "from" and the corresponding "to" of data1.
>
> So in this case I would need to somehow be able to extract 2,20 and 85,
> corresponding to the "name"s 1,3 and 9.
>
> Thank you very much!
> Filippo
See ?findInterval
Transpose data1 into a matrix so that, read column by column, the interval boundaries are in increasing order. findInterval() then uses that matrix as a plain vector (e.g. c(1, 10, 12, ...)):
> t(data1)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
from    1   12   16   40   55   81  101
to     10   13   23   45   67   99  123
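To see the vector that findInterval() ends up working with, a small sketch along these lines may help. The name "breaks" is only for illustration, and the is.unsorted() check is an extra precaution I am adding here, since findInterval() needs its second argument sorted in non-decreasing order:

```r
data1 <- data.frame(from = c(1, 12, 16, 40, 55, 81, 101),
                    to   = c(10, 13, 23, 45, 67, 99, 123))

# Flatten the transposed data frame column by column:
# from[1], to[1], from[2], to[2], ...
breaks <- as.vector(t(data1))
breaks

# FALSE here means the boundaries are already in increasing order,
# i.e. the intervals are sorted and do not overlap
is.unsorted(breaks)
```

If is.unsorted() returned TRUE (overlapping or unsorted intervals), this approach would presumably need the intervals sorted or merged first.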
findInterval() returns, for each data2$position value, the index of the interval it falls into within the sorted boundaries. Since your actual intervals are not contiguous, you only want the values that fall in the odd-numbered intervals (those running from a "from" to its "to"; the even-numbered ones are the gaps in between), which is where the use of %in% seq(1, 13, 2) comes in. Prior to that, findInterval() returns:
> findInterval(data2$position, t(data1))
 [1]  1  4  5  8 14 14 14 14 11
With it:
> findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
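Note that seq(1, 13, 2) is tied to data1 having seven rows. As a rough sketch, assuming the same data, the odd indices can be derived from nrow(data1), or the test can be written as a parity check. The names idx, odd_seq and inside are just illustrative:

```r
data1 <- data.frame(from = c(1, 12, 16, 40, 55, 81, 101),
                    to   = c(10, 13, 23, 45, 67, 99, 123))
data2 <- data.frame(name = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                    position = c(2, 14, 20, 50, 150, 2000, 2001, 2002, 85))

idx <- findInterval(data2$position, as.vector(t(data1)))

# Odd indices computed from the number of intervals rather than hard-coded
odd_seq <- seq(1, 2 * nrow(data1) - 1, by = 2)

# Equivalent parity test: an odd index means "between a from and its to";
# 0 (below the first from) and even indices (gaps, or above the last to) fail it
inside <- idx %% 2 == 1

identical(idx %in% odd_seq, inside)
```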
Now you can use the TRUE values to index data2$name:
> data2$name[findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)]
[1] 1 3 9
or data2$position:
> data2$position[findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)]
[1] 2 20 85
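One caveat: findInterval() treats each interval as closed on the left and open on the right, so a position exactly equal to a "to" value (say 10) lands in the following gap and is not reported. If both endpoints should count, a direct comparison is an alternative. This is a different technique from the findInterval() approach above, sketched here with a hypothetical helper named in_any_interval(); it compares every position against every interval, so it may be slow for very large inputs:

```r
data1 <- data.frame(from = c(1, 12, 16, 40, 55, 81, 101),
                    to   = c(10, 13, 23, 45, 67, 99, 123))
data2 <- data.frame(name = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                    position = c(2, 14, 20, 50, 150, 2000, 2001, 2002, 85))

# Hypothetical helper: TRUE where x lies in at least one [from, to],
# with both endpoints included
in_any_interval <- function(x, from, to) {
  vapply(x, function(p) any(p >= from & p <= to), logical(1))
}

hits <- in_any_interval(data2$position, data1$from, data1$to)
data2[hits, ]

# The boundary case: 10 is the "to" of the first interval
in_any_interval(10, data1$from, data1$to)       # both endpoints included
findInterval(10, as.vector(t(data1))) %% 2 == 1 # right endpoint excluded
```

For the example data both approaches pick out the same rows; they only differ for positions that sit exactly on a "to" value.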
HTH,
Marc Schwartz