[R] Binning question (binning rows of a data.frame according to a variable)

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Fri Mar 17 17:12:00 CET 2006


Dan Bolser wrote:
> Hi,
> 
> I have tuples of data in rows of a data.frame, each column is a variable 
> for the 'items' (one per row).
> 
> One of the variables is the 'size' of the item (row).
> 
> I would like to cut my data.frame into groups such that each group has 
> the same *total size*. So, assuming that we order by size, some groups 
> should have several small items while other groups have a few large 
> items. All the groups should have approximately the same total size.
> 
> I have tried various combinations of cut, quantile, and ecdf, and I just 
> can't work out how to do this!
> 
> Any help is greatly appreciated!
> 
> All the best,
> Dan.
> 

Perhaps there is a clever way, but I just wrote this in desperation...


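my.groups <- 8   ## number of groups wanted

## The 'size' variable in my data.frame
my.total <- sum(my.res.1$TOT)

## Target total size for each group
my.approx.size <- my.total / my.groups

my.j <- 1        ## current group number
my.roll <- 0     ## running total of the sizes seen so far
my.factor <- numeric()

for (i in sort(my.res.1$TOT)) {

   my.roll <- my.roll + i

   ## The item that pushes the running total past the j-th multiple
   ## of the target size starts the next group
   if (my.roll > my.approx.size * my.j)
     my.j <- my.j + 1

   my.factor <- append(my.factor, my.j)
}

## Note: my.factor follows the *sorted* sizes, not the original row order
my.factor <- as.factor(my.factor)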



Then...

 > tapply(my.factor,my.factor,length)
   1   2   3   4   5   6   7   8
152  62  45  34  25  21  14   8


And...

 > tapply(sort(my.res.1$TOT),my.factor,sum)
    1    2    3    4    5    6    7    8
2880 2848 2912 2893 2832 2906 2776 3029
 >



Which isn't bad.
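
For what it's worth, the same idea can probably be written without the loop by using cumsum(). What follows is only a sketch under some assumptions: the helper name bin.by.total and the toy data frame d are made up for illustration (d stands in for my.res.1), and the group boundaries can differ slightly from the loop above, because here the item that crosses a boundary stays in the lower group. Unlike the loop, it returns the groups in the original row order, so they can be attached to the data.frame directly.

```r
## Hypothetical helper: assign each row to one of n.groups bins of roughly
## equal total size, working through the rows in increasing order of size.
bin.by.total <- function(size, n.groups) {
  ord    <- order(size)                    # row indices, smallest size first
  cs     <- cumsum(size[ord])              # running total over the sorted sizes
  target <- sum(size) / n.groups           # ideal total size per group
  grp    <- ceiling(cs / target)           # which multiple of target we are within
  grp    <- pmin(pmax(grp, 1), n.groups)   # guard against 0 and rounding overshoot
  out    <- integer(length(size))
  out[ord] <- as.integer(grp)              # put groups back in original row order
  factor(out, levels = seq_len(n.groups))
}

## Toy data standing in for my.res.1 (made up for illustration)
set.seed(1)
d <- data.frame(id = 1:200, TOT = sample(1:300, 200, replace = TRUE))

d$group <- bin.by.total(d$TOT, 8)

table(d$group)                  # number of items per group
tapply(d$TOT, d$group, sum)     # total size per group
```

If separate data.frames are wanted, split(d, d$group) should give one per group.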











