[R] Binning question (binning rows of a data.frame according to a variable)
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Fri Mar 17 17:12:00 CET 2006
Dan Bolser wrote:
> Hi,
>
> I have tuples of data in rows of a data.frame, each column is a variable
> for the 'items' (one per row).
>
> One of the variables is the 'size' of the item (row).
>
> I would like to cut my data.frame into groups such that each group has
> the same *total size*. So, assuming that we order by size, some groups
> should have several small items while other groups have a few large
> items. All the groups should have approximately the same total size.
>
> I have tried various combinations of cut, quantile, and ecdf, and I just
> can't work out how to do this!
>
> Any help is greatly appreciated!
>
> All the best,
> Dan.
>
Perhaps there is a clever way, but I just wrote this in desperation...
my.groups <- 8
my.total <- sum(my.res.1$TOT)  ## The 'size' variable in my data.frame
my.approx.size <- my.total / my.groups

my.j <- 1              ## current group number
my.roll <- 0           ## running total of the sizes seen so far
my.factor <- numeric()

for (i in sort(my.res.1$TOT)) {
  my.roll <- my.roll + i
  ## move on to the next group once the running total passes this group's share
  if (my.roll > my.approx.size * my.j)
    my.j <- my.j + 1
  my.factor <- append(my.factor, my.j)
}
my.factor <- as.factor(my.factor)
Then...
> tapply(my.factor,my.factor,length)
  1   2   3   4   5   6   7   8
152  62  45  34  25  21  14   8
And...
> tapply(sort(my.res.1$TOT),my.factor,sum)
   1    2    3    4    5    6    7    8
2880 2848 2912 2893 2832 2906 2776 3029
Which isn't bad.
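
Note that my.factor above follows the *sorted* sizes, not the original row order of the data.frame. A loop-free variant that also maps the groups back onto the rows might look something like the sketch below. It is only a sketch under assumptions: the data are made up (my.df, size and n.groups are hypothetical names standing in for my.res.1 and friends), all sizes are assumed positive, and it uses ceiling() on the running total rather than the exact loop logic, so group boundaries can differ slightly from the loop when a single item is larger than a whole group's share.

```r
## Made-up example data standing in for my.res.1 (hypothetical)
set.seed(1)
size     <- sample(1:200, 361, replace = TRUE)
my.df    <- data.frame(id = seq_along(size), TOT = size)
n.groups <- 8

ord    <- order(my.df$TOT)          ## row indices, smallest size first
cum    <- cumsum(my.df$TOT[ord])    ## running total over the sorted sizes
target <- sum(my.df$TOT) / n.groups ## approximate total size per group

## Group k holds the items whose running total lies in ((k-1)*target, k*target].
## pmin() guards against rounding pushing the last item into group n.groups + 1.
grp.sorted <- pmin(ceiling(cum / target), n.groups)

## Put the group labels back into the original row order
grp      <- integer(nrow(my.df))
grp[ord] <- grp.sorted
my.df$group <- factor(grp)

group.counts <- table(my.df$group)                 ## items per group
group.totals <- tapply(my.df$TOT, my.df$group, sum) ## total size per group
pieces       <- split(my.df, my.df$group)           ## one data.frame per group
```

As with the loop, the small-item groups should come out with many rows and the large-item groups with few, while group.totals stays roughly constant.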
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html