[R] merging-binning data
Boris Steipe
boris.steipe at utoronto.ca
Wed Nov 4 13:00:45 CET 2015
I would transform the original numbers into integers that you can use as group labels. The labels stay in the same order as your values, so the position of each label is the index of the corresponding value.
Example: assume your input vector is dBin
nGroups <- 5 # number of groups
groups <- (dBin - min(dBin)) / (max(dBin) - min(dBin)) # rescale to the range [0,1]
groups <- pmin(floor(groups * nGroups) + 1, nGroups) # discretize to integers 1..nGroups; pmin() keeps max(dBin) in group nGroups instead of nGroups + 1
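For comparison, base R's cut() can produce the same kind of integer labels directly. The sketch below assumes equal-width bins and uses a few values copied from binDistance as hypothetical sample data. Note that cut() uses right-closed intervals (a,b] by default, whereas the floor() formula puts a value lying exactly on an interior break into the upper group, so the two can disagree for such boundary values.

```r
# Hypothetical sample values taken from the binDistance data in the thread.
dBin <- c(238.95162, 143.08590, 88.50923, 177.67884, 13.69540, 521.20323)
nGroups <- 5  # number of groups

# With a single number as 'breaks', cut() builds nGroups equal-width
# intervals over range(dBin); the two outer limits are widened by 0.1%
# of the range so that the minimum and the maximum are both included.
# labels = FALSE returns integer codes 1..nGroups instead of a factor.
groups <- cut(dBin, breaks = nGroups, labels = FALSE)
groups
```

Because the widened outer limits already include the maximum, no capping step is needed here.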
Now you can, e.g., get the indices of the values in group 2:
which(groups == 2) # dBin[groups == 2] returns the values themselves
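To get the index vectors for all groups at once, one option is to split the positions rather than the values. This is a sketch on hypothetical sample data (the first six binDistance values). The same idea should work with the factor from the quoted cut() call, i.e. split(seq_along(binDistance), test).

```r
# Hypothetical sample data: the first six values of binDistance.
dBin <- c(238.95162, 143.08590, 88.50923, 177.67884, 277.54116, 342.94689)
nGroups <- 5

# Integer group label (1..nGroups, equal-width bins) for each value.
groups <- cut(dBin, breaks = nGroups, labels = FALSE)

# Positions (indexes) of the values that fall into group 2.
idx2 <- which(groups == 2)
dBin[idx2]  # the corresponding values

# Index vectors for every group: split the positions by group label.
# Using factor(..., levels = 1:nGroups) keeps empty groups as empty entries.
idxByGroup <- split(seq_along(dBin), factor(groups, levels = seq_len(nGroups)))
idxByGroup[["2"]]  # same positions as idx2
```

Each element of idxByGroup is named after its group number, which answers both "which values" and "which group" in one structure.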
Depending on the nature of your input data, it may be better to keep these groups in a column next to your values rather than in a separate vector. Better still, wrap the approach above in a function and calculate the groups on the fly in your downstream analysis instead of storing them at all. These are simple vectorized operations that should not add perceptibly to execution time.
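Both options could look roughly like this. binLabels() is a hypothetical helper name, the input is assumed to contain no NA values, and the values are a hypothetical subset of binDistance.

```r
# Hypothetical helper: equal-width group labels 1..nGroups.
# Assumes x has no NA values. pmin() keeps max(x) in group nGroups.
binLabels <- function(x, nGroups = 5) {
  r <- range(x)
  if (r[1] == r[2]) return(rep(1L, length(x)))  # all values equal: one group
  g <- floor((x - r[1]) / (r[2] - r[1]) * nGroups) + 1
  as.integer(pmin(g, nGroups))
}

# Option 1: keep the labels in a column next to the values.
df <- data.frame(value = c(13.69540, 88.50923, 238.95162, 366.02996, 521.20323))
df$group <- binLabels(df$value)

# Option 2: compute the labels on the fly, only when they are needed.
which(binLabels(df$value) == 5)
```

Option 2 avoids keeping labels in sync with the data if the values or the number of groups change later.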
Cheers,
Boris
On Nov 4, 2015, at 6:40 AM, Alaios via R-help <r-help at r-project.org> wrote:
> Thanks for the answer. split() does not give me the indexes, though, only which group the values fall into. I also need the index of each group: is it the first, the second, ... group?
> Alex
>
> On Tuesday, November 3, 2015 5:05 PM, Ista Zahn <istazahn at gmail.com> wrote:
>
>
> Probably
>
> split(binDistance, test).
>
> Best,
> Ista
>
> On Tue, Nov 3, 2015 at 10:47 AM, Alaios via R-help <r-help at r-project.org> wrote:
>> Dear all,
>> I am not exactly sure what the proper name is for what I am trying to do.
>> I have a vector that looks like this:
>> binDistance
>> [,1]
>> [1,] 238.95162
>> [2,] 143.08590
>> [3,] 88.50923
>> [4,] 177.67884
>> [5,] 277.54116
>> [6,] 342.94689
>> [7,] 241.60905
>> [8,] 177.81969
>> [9,] 211.25559
>> [10,] 279.72702
>> [11,] 381.95738
>> [12,] 483.76363
>> [13,] 480.98841
>> [14,] 369.75241
>> [15,] 267.73650
>> [16,] 138.55959
>> [17,] 137.93181
>> [18,] 184.75200
>> [19,] 254.64359
>> [20,] 328.87785
>> [21,] 273.15577
>> [22,] 252.52830
>> [23,] 252.52830
>> [24,] 252.52830
>> [25,] 262.20084
>> [26,] 314.93064
>> [27,] 366.02996
>> [28,] 442.77467
>> [29,] 521.20323
>> [30,] 465.33071
>> [31,] 366.60582
>> [32,] 13.69540
>> So the numbers range from about 13 up to a maximum of about 522 (I also have many other, similar sets). I want to put these numbers into 5 categories, so I have tried cut():
>>
>>
>> Browse[2]> test<-cut(binDistance,seq(min(binDistance)-0.00001,max(binDistance),length.out=scaleLength+1))
>> Browse[2]> test
>> [1] (217,318] (115,217] (13.7,115] (115,217] (217,318] (318,420]
>> [7] (217,318] (115,217] (115,217] (217,318] (318,420] (420,521]
>> [13] (420,521] (318,420] (217,318] (115,217] (115,217] (115,217]
>> [19] (217,318] (318,420] (217,318] (217,318] (217,318] (217,318]
>> [25] (217,318] (217,318] (318,420] (420,521] (420,521] (420,521]
>> [31] (318,420] (13.7,115]
>> Levels: (13.7,115] (115,217] (217,318] (318,420] (420,521]
>>
>>
>> Then I want the numbers of my initial vector that fall within the same "category", let's say (318,420], to be collected in a vector. To rephrase: I want the indexes of the elements of my initial vector whose values lie between 318 and 420 to be put into one vector that I can then process as I want.
>> How can I do that efficiently in R?
>> I would like to thank you for your reply.
>> Regards,
>> Alex
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.