[R] merging-binning data

Boris Steipe boris.steipe at utoronto.ca
Wed Nov 4 15:06:43 CET 2015


The breaks are just the min() and max() in your groups. Something like

  sprintf("[%5.2f,%5.2f]", min(dBin[groups==2]), max(dBin[groups==2]))

... should achieve what you need.
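
A minimal sketch of how this could be put together for all groups at once, assuming dBin holds the values from the original post and the groups are computed by rescaling as suggested earlier in the thread. The names groupLabels, breaks and elementLabels are made up for illustration, and the pmin() call is an added safeguard so that max(dBin) lands in the last group rather than in an extra one.

```r
# Values from the original post.
dBin <- c(238.95162, 143.08590, 88.50923, 177.67884, 277.54116, 342.94689,
          241.60905, 177.81969, 211.25559, 279.72702, 381.95738, 483.76363,
          480.98841, 369.75241, 267.73650, 138.55959, 137.93181, 184.75200,
          254.64359, 328.87785, 273.15577, 252.52830, 252.52830, 252.52830,
          262.20084, 314.93064, 366.02996, 442.77467, 521.20323, 465.33071,
          366.60582, 13.69540)

nGroups <- 5

# Rescale to [0,1], then discretize to integers 1..nGroups.
# pmin() (an assumption added here) keeps the maximum in the last group.
scaled <- (dBin - min(dBin)) / (max(dBin) - min(dBin))
groups <- pmin(floor(scaled * nGroups) + 1, nGroups)

# Observed range of each non-empty group, formatted as a label.
# The result is a character vector named by group number ("1", "2", ...).
groupLabels <- sapply(split(dBin, groups), function(x) {
  sprintf("[%.2f,%.2f]", min(x), max(x))
})

# Nominal equal-width breaks that define the groups, if those are preferred
# over the observed minima and maxima.
breaks <- seq(min(dBin), max(dBin), length.out = nGroups + 1)

# One label per element of dBin, e.g. for annotating plots.
elementLabels <- groupLabels[as.character(groups)]
```

Indexing groupLabels by as.character(groups) keeps the labels aligned with the data even if some group happens to be empty, since the lookup is by name rather than by position.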


B.



On Nov 4, 2015, at 8:45 AM, Alaios <alaios at yahoo.com> wrote:

> you are right.
> by labels I mean the "categories", "breaks" that my data fall in.
> To be part of group 2, for example, a value has to be in the range [110,223). I need to keep those ranges for my plots.
> 
> Did I describe it more precisely now?
> Alex
> 
> 
> 
> On Wednesday, November 4, 2015 2:09 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> 
> 
> I don't understand: 
> - where does the "label" come from? (It's not an element of your data that I see.)
> - what do you want to do with this "label" i.e. how does it need to be associated with the data?
> 
> 
> B.
> 
> 
> 
> On Nov 4, 2015, at 7:57 AM, Alaios <alaios at yahoo.com> wrote:
> 
> > Thanks, it works great and gives me the group numbers as integers, so I can use which() to select the elements of a group as needed (which(groups == 2)).
> > 
> > The question, though, is how to also keep the labels for each group, for example that my first group is [13,206).
> > 
> > Regards
> > Alex
> > 
> > 
> > 
> > On Wednesday, November 4, 2015 1:00 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> > 
> > 
> > I would transform the original numbers into integers which you can use as group labels. The row numbers of the group labels are the indexes of your values.
> > 
> > Example: assume your input vector is dBin
> > 
> > nGroups <- 5  # number of groups
> > groups <- (dBin - min(dBin)) / (max(dBin) - min(dBin)) # rescale to the range [0,1]
> > groups <- pmin(floor(groups * nGroups) + 1, nGroups)  # discretize to integers 1..nGroups; pmin() keeps max(dBin) in the last group
> > 
> > Now you can, e.g., get the indices for group 2:
> > 
> > which(groups == 2)
> > 
> > Depending on the nature of your input data, it may be better to keep these groups in a column adjacent to your values, rather than in a separate vector, or even better to just calculate the groups on the fly in your downstream analysis with the approach given above in a function, rather than storing them at all. These are simple operations that should not add perceptibly to execution time.
> > 
> > Cheers,
> > Boris
> > 
> > 
> > 
> > 
> > 
> > 
> > On Nov 4, 2015, at 6:40 AM, Alaios via R-help <r-help at r-project.org> wrote:
> > 
> > > Thanks for the answer. Split does not give me the indexes, though, only the group each value falls in. I also need the index of the group: is it the first, the second, ... group?
> > > Alex
> > > 
> > > 
> > > 
> > >    On Tuesday, November 3, 2015 5:05 PM, Ista Zahn <istazahn at gmail.com> wrote:
> > > 
> > > 
> > > Probably
> > > 
> > > split(binDistance, test).
> > > 
> > > Best,
> > > Ista
> > > 
> > > On Tue, Nov 3, 2015 at 10:47 AM, Alaios via R-help <r-help at r-project.org> wrote:
> > >> Dear all, I am not exactly sure what the proper name is for what I am trying to do.
> > >> I have a vector that looks like
> > >>  binDistance
> > >>            [,1]
> > >>  [1,] 238.95162
> > >>  [2,] 143.08590
> > >>  [3,]  88.50923
> > >>  [4,] 177.67884
> > >>  [5,] 277.54116
> > >>  [6,] 342.94689
> > >>  [7,] 241.60905
> > >>  [8,] 177.81969
> > >>  [9,] 211.25559
> > >> [10,] 279.72702
> > >> [11,] 381.95738
> > >> [12,] 483.76363
> > >> [13,] 480.98841
> > >> [14,] 369.75241
> > >> [15,] 267.73650
> > >> [16,] 138.55959
> > >> [17,] 137.93181
> > >> [18,] 184.75200
> > >> [19,] 254.64359
> > >> [20,] 328.87785
> > >> [21,] 273.15577
> > >> [22,] 252.52830
> > >> [23,] 252.52830
> > >> [24,] 252.52830
> > >> [25,] 262.20084
> > >> [26,] 314.93064
> > >> [27,] 366.02996
> > >> [28,] 442.77467
> > >> [29,] 521.20323
> > >> [30,] 465.33071
> > >> [31,] 366.60582
> > >> [32,]  13.69540
> > >> So the numbers start from 13 and go up to a maximum of 522 (I also have many other similar sets). I want to put these numbers into 5 categories, and thus I have tried cut():
> > >> 
> > >> 
> > >> Browse[2]> test<-cut(binDistance,seq(min(binDistance)-0.00001,max(binDistance),length.out=scaleLength+1))
> > >> Browse[2]> test
> > >>  [1] (217,318]  (115,217]  (13.7,115] (115,217]  (217,318]  (318,420]
> > >>  [7] (217,318]  (115,217]  (115,217]  (217,318]  (318,420]  (420,521]
> > >> [13] (420,521]  (318,420]  (217,318]  (115,217]  (115,217]  (115,217]
> > >> [19] (217,318]  (318,420]  (217,318]  (217,318]  (217,318]  (217,318]
> > >> [25] (217,318]  (217,318]  (318,420]  (420,521]  (420,521]  (420,521]
> > >> [31] (318,420]  (13.7,115]
> > >> Levels: (13.7,115] (115,217] (217,318] (318,420] (420,521]
> > >> 
> > >> 
> > >> I then want the numbers of my initial vector that fall within the same "category", let's say (318,420], to be collected in a vector. To rephrase: the indexes of my initial vector that have a value between 318 and 420 should be put in the same vector, which I can then process as I want.
> > >> How can I do that effectively in R?
> > >> Thank you for your reply.
> > >> Regards
> > >> Alex
> > >> 
> > >>        [[alternative HTML version deleted]]
> > >> 
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > > 
> > > 
> > 
> > 
> 
> 


