[R] Classifying values by interval

Ted Harding ted.harding at wlandres.net
Wed Aug 31 10:39:21 CEST 2011


On 31-Aug-11 08:25:15, Jim Lemon wrote:
> On 08/31/2011 06:00 PM, Ted Harding wrote:
>> Greetings All!
>> As is often the case on this list, the answer may well
>> be under my nose but I can't see it!
>>
>> I am looking for a "smart" way to do the following.
>>
>> Say I have a vector of values, X. I set up "bins" for X,
>> say with breaks at B = c(b1,b2,...,b11) covering the
>> range of X, i.e. bins numbered 1:10. The value x is in
>> bin i if B[i] < x <= B[i+1]
>>
>> What I seek is a vector, of the same length as X, which
>> for each x in X gives the number of the bin that x is in.
>>
>> Clearly this can be done in an "unsmart" way by looping
>> through all of X along with something like
>>
>>    which( (B[1:10] < X[j]) & (X[j] <= B[2:11]) )
>>
>> However, I feel that this naturally occurring task must
>> have received a smarter solution! The hist() function
>> already does this implicitly, since it has to decide
>> which bin a value in X should be counted in. But it
>> apparently then discards this information, since there
>> is nothing relevant in the return values from hist().
>>
>> So is there a "smart" function somewhere for this?
>>
>> The motivation here is that I have multivariate data,
>> (X,Y,Z,...) and I wish to study how it behaves in each
>> different bin for X. So the "bin index", ixB say, derived
>> for X can be applied to select corresponding subsets of
>> the other variables. Rather than doing it the clumsy
>> way each time, e.g. according to
>>
>>    Y[(B[j] < X) & (X <= B[j+1])]
>>
>> I would like to have the bin index permanently available
>> -- for example it allows easy logical combinations of
>> bins, such as Y[(ixB==j1) | (ixB==j2)], or Y[(ixB %in% ixB0)].
>>
> Hi Ted,
> Are you looking for something like this?
> 
> x<-sample(1:10,20,TRUE)
> x
>   [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8
> binx<-cut(x,breaks=0:10)
> as.numeric(binx)
>   [1]  5 10 10  9  1  1  1  7  2  1  2  1  1  1  9  7  8  5  6  8
> 
> As binx is a factor, coercing it to numeric should return the bin
> number for each value.
> 
> Jim
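
To illustrate the cut() suggestion with breaks that are not simply
the integers themselves, here is a minimal sketch. The breaks B and the
values X are made-up illustration data, not from the thread. It relies
on cut() using right-closed intervals by default, which matches the
rule B[i] < x <= B[i+1] in the original question.

```r
# Hypothetical data: 11 breaks define 10 bins (B[i], B[i+1]]
B <- seq(0, 100, by = 10)
X <- c(3, 10, 10.5, 47, 99.9, 100, 0)

# right = TRUE (the default) makes each bin open on the left
# and closed on the right
ixB <- as.integer(cut(X, breaks = B, right = TRUE))
ixB
# [1]  1  1  2  5 10 10 NA

# A value equal to the lowest break falls in no bin (NA)
# unless include.lowest = TRUE is given
ixB.low <- as.integer(cut(X, breaks = B, include.lowest = TRUE))
ixB.low
# [1]  1  1  2  5 10 10  1
```

Note that values outside the range of the breaks come back as NA, so
it may be worth checking any(is.na(ixB)) before using the index.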

Thanks, Jim! That looks neat too. According to ?cut, findInterval
(as suggested by Dimitris) may be more efficient, but I'll have
to look more closely into all these possibilities.
Ted.
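
For comparison, a minimal sketch of the findInterval() route, followed
by the kind of subsetting described in the question. B, X and Y are
made-up illustration data. By default findInterval() uses left-closed
intervals, so the sketch assumes an R version recent enough to provide
the left.open argument, which switches to B[i] < x <= B[i+1].

```r
# Hypothetical data: 11 breaks define 10 bins (B[i], B[i+1]]
B <- seq(0, 100, by = 10)
X <- c(3, 10, 10.5, 47, 99.9, 100, 0)
Y <- c("a", "b", "c", "d", "e", "f", "g")

# left.open = TRUE gives bins that are open on the left and
# closed on the right; values at or below B[1] get index 0
ixB <- findInterval(X, B, left.open = TRUE)
ixB
# [1]  1  1  2  5 10 10  0

# The bin index can then select subsets of the other variables
Y[ixB == 1]           # "a" "b"
Y[ixB %in% c(2, 5)]   # "c" "d"

# Or split a variable into a list with one element per occupied bin
byBin <- split(Y, ixB)
byBin[["10"]]         # "e" "f"
```

Unlike cut(), findInterval() returns 0 (or length(B) for values above
the last break) rather than NA for out-of-range values, so such values
need to be filtered explicitly if they should not form bins of their
own.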

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-11                                       Time: 09:39:17
------------------------------ XFMail ------------------------------


