[R] findInterval and data resolution

Bryan Hanson hanson at depauw.edu
Mon Jul 12 23:54:23 CEST 2010


Thanks Duncan... More appended at the bottom...


On 7/12/10 5:38 PM, "Duncan Murdoch" <murdoch.duncan at gmail.com> wrote:

> On 12/07/2010 5:25 PM, Bryan Hanson wrote:
>> Hello Wise Ones...
>> 
>> I need a clever way around a problem with findInterval.  Consider:
>> 
>> vec1 <- 1:10
>> vec2 <- seq(1, 10, by = 0.1)
>> 
>> x1 <- c(2:3)
>> 
>> a1 <- findInterval(x1, vec1); a1 # example 1
>> a2 <- findInterval(x1, vec2); a2  # example 2
>> 
>> In the problem I'm working on, vec* may be either integer or numeric, like
>> vec1 and vec2.  I need to remove one or more sections of this vector; for
>> instance, if I ask to remove values 2:3, I want to remove all values between
>> 2 and 3 regardless of the resolution of the data (as I think of it, vec2 is
>> denser than vec1, i.e. it has finer resolution).  Example 1 above works fine
>> because 2 and 3 are the end points of a range with no values in between, so
>> a1 (2 3) already covers everything.  In example 2 the answer is, correctly,
>> also just the end points (a2 is 11 21), but now there are values in between,
>> and a2 doesn't include their indices.
>> 
>> I have looked at cut, but it doesn't quite behave the way I want: if I set
>> x1 <- c(2:4), I get more intervals than I really want, and cleaning them up
>> will be laborious.  I think I can construct the full set of indices I want
>> with a2[1]:a2[2], but is there a more clever way to do this?  I'm thinking
>> there might be a function out there that I am not aware of.
> 
> I'm not sure I understand what you want.  If you know x1 will always be
> an increasing vector, you could use something like a2[1]:a2[length(a2)]
> to select the full range of indices that it covers.  If x1 is not
> necessarily in increasing order, you'll have to do min(a2):max(a2)
> (which might be clearer in any case).
> 
> If you're more interested in the range of values in vec*, maybe
> 
> range(vec2[min(a2):max(a2)])
> 
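Duncan's min()/max() suggestion, run on the vectors from the original example, looks roughly like this (a minimal sketch; the name `idx` is only illustrative):

```r
vec2 <- seq(1, 10, by = 0.1)
x1 <- 2:3

a2 <- findInterval(x1, vec2)  # indices of the end points: 11 21
idx <- min(a2):max(a2)        # every index from the first end point to the last
vec2[idx]                     # the 11 values from 2 to 3
range(vec2[idx])              # 2 3
```

Using min() and max() rather than a2[1] and a2[2] means the result does not depend on x1 being sorted.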

min(a2):max(a2) is very helpful, as it fixes another problem that I did not
post about.  More generally, I want to pass a vector of pairs of values to
be removed, like this:

x1 <- c(2:3, 8:9); a3 <- findInterval(x1, vec2)
a3 # which turns out to be 11 21 71 81

I want my function to remove all values between 2 and 3, and between 8 and
9, regardless of how many values lie between these end points.  So in the
a3 example, I want to remove everything between indices 11 and 21, and
everything between indices 71 and 81, keeping everything else.
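One alternative, not the findInterval approach discussed in the thread, is to skip index bookkeeping and compare the values directly against each (lower, upper) pair. The helper `drop_ranges` below is hypothetical; it assumes x1 holds consecutive pairs and that the end points themselves should also be removed:

```r
# Hypothetical helper: remove every element of vec lying in any closed
# interval [lo, hi], where the intervals are given as consecutive pairs.
drop_ranges <- function(vec, pairs) {
  stopifnot(length(pairs) %% 2 == 0)
  bounds <- matrix(pairs, ncol = 2, byrow = TRUE)  # one row per (lo, hi) pair
  inside <- rep(FALSE, length(vec))
  for (i in seq_len(nrow(bounds))) {
    # mark elements that fall inside the i-th interval
    inside <- inside | (vec >= bounds[i, 1] & vec <= bounds[i, 2])
  }
  vec[!inside]
}

vec2 <- seq(1, 10, by = 0.1)
kept <- drop_ranges(vec2, c(2:3, 8:9))  # 91 values in, 69 kept
```

Because this works on values rather than positions, the resolution of vec does not matter. Values produced by seq(by = 0.1) are not always exact binary fractions, though, so if an end point must match a generated value exactly, a small tolerance on the comparisons may be needed.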

I think I can put together a function pretty quickly that takes x1 in
sequential pairs and returns all the intervening indices, which can then be
used to clean up the original vector.
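The pairwise function described here might be sketched as follows, combining findInterval() with Duncan's min():max() idea for each pair. The name `pair_indices` is hypothetical, and the sketch assumes x1 always has an even length:

```r
# Hypothetical helper: for each consecutive (lo, hi) pair in `pairs`, take the
# indices findInterval() reports for the two end points and expand them into
# the full run of indices between them.
pair_indices <- function(vec, pairs) {
  stopifnot(length(pairs) %% 2 == 0)
  ends <- matrix(findInterval(pairs, vec), ncol = 2, byrow = TRUE)
  unlist(lapply(seq_len(nrow(ends)), function(i) {
    min(ends[i, ]):max(ends[i, ])
  }))
}

vec2 <- seq(1, 10, by = 0.1)
idx <- pair_indices(vec2, c(2:3, 8:9))          # 11:21 followed by 71:81
cleaned <- if (length(idx)) vec2[-idx] else vec2  # guard: vec2[-integer(0)] would drop everything
```

findInterval() returns the index of the last element at or below each value, so if a lower bound is not itself present in vec, the element just below it is included in the run as well.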

Thanks again, and if anyone has another idea, do tell!  Bryan



More information about the R-help mailing list