[R] Binning question (binning rows of a data.frame according to a variable)

Gabor Grothendieck ggrothendieck at gmail.com
Sun Mar 19 20:02:25 CET 2006


On 3/19/06, Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote:
> Gabor Grothendieck wrote:
> > On 3/18/06, Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote:
> >
> >>Gabor Grothendieck wrote:
> >>
> >>>If you are just looking for something simple that may be good enough,
> >>>then assign the smallest one to group 1, the second smallest to group 2,
> >>>..., the 8th smallest to group 8 and then start over again with group 1
> >>>and so on.
> >>>
> >>># test data
> >>>set.seed(1)
> >>>x <- sample(100, 100, replace = TRUE)
> >>>
> >>>xs <- sort(x)
> >>>g <- gl(8, 1, length(xs)) # 8 groups
> >>>
> >>># so that g contains the groups that correspond to xs.
> >>>
> >>>tapply(xs, g, sum)   # 659 671 687 701 612 622 629 646
> >>>
> >>
> >>
> >>That is a fairly neat way of getting groups with a good 'approximate
> >>same size', however, in general I would like to be able to order my data
> >>in any way, and still cut it into equal 'size' groups (like quantiles
> >>for rows, but for row variable totals instead).
> >
> >
> > Do you mean you want g to be in the original order of x?
>
> No. What I mean is that I want to order x by any particular variable in
> my data.frame, then group over x such that each group has roughly the
> same sum.
>
> I get the feeling I have missed a very simple trick.


I suggest providing a short, self-contained, reproducible example that
includes the input and the desired output, along with a detailed explanation.
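
In the meantime, here is a minimal sketch of one possible reading of the
request: keep the rows in whatever order was chosen, then cut them into k
consecutive groups at equal fractions of the running total. The helper name
bin_by_cumsum and the columns key and size are hypothetical, made up for
illustration, and the sketch assumes the values being summed are
non-negative with a positive total. It may well not be what was intended.

```r
# Hypothetical helper (not from this thread): split x, in its current
# order, into k consecutive groups whose sums are roughly equal.
# Assumes x is non-negative and sum(x) > 0.
bin_by_cumsum <- function(x, k) {
  cs <- cumsum(x)
  total <- cs[length(cs)]
  # fraction of the grand total reached at each row, scaled to 1..k
  g <- as.integer(ceiling(k * (cs / total)))
  pmax(1L, g)  # leading zeros in x would otherwise land in group 0
}

# made-up test data: order by one variable, bin by the sum of another
set.seed(1)
df <- data.frame(key = runif(100),
                 size = sample(100, 100, replace = TRUE))
df <- df[order(df$key), ]
df$group <- bin_by_cumsum(df$size, 8)

tapply(df$size, df$group, sum)  # group sums, roughly equal
table(df$group)                 # group sizes, which may differ
```

Because a row cannot be split, each group sum can miss its target by up to
the size of the row that straddles the boundary, so the sums are only
approximately equal.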



