# [R] Binning question (binning rows of a data.frame according to a variable)

Sun Mar 19 06:37:22 CET 2006

Do you by any chance want to sample equally from each group, to get an
equal-representation matrix? Here is an example of the input:

```r
mydf <- data.frame( value=1:100, value2=rnorm(100),
                    grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) )
```

which has 35 observations from A, 15 from B, 30 from C and 20 from D.

And here is a function that I wrote:

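```r
sample.by.group <- function(df, grp, k, replace=FALSE){

  # a single k means the same sample size for every group
  if(length(k)==1){ k <- rep(k, length(unique(grp))) }

  # without replacement, no group can supply more rows than it has
  if(!replace && any(k > table(grp)))
    stop("Cannot take a sample larger than the population when ",
         "'replace = FALSE'.\n",
         "Please specify a value no greater than ", min(table(grp)),
         " or use 'replace = TRUE'.")

  # one 0/1 indicator column per group, in sorted order of the group
  # labels (the same order as table(grp), which is how k is matched)
  ind   <- model.matrix( ~ -1 + grp )
  w.mat <- vector("list", ncol(ind))

  for(i in seq_len(ncol(ind))){
    rows <- which( ind[,i]==1 )
    # sample positions within 'rows': sample(rows, ...) misbehaves for a
    # group with a single row, because sample(n) means sample(1:n)
    w.mat[[i]] <- rows[ sample.int(length(rows), k[i], replace=replace) ]
  }

  out <- df[ unlist(w.mat), ]
  return(out)
}
```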
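The same stratified sampling can also be sketched with `split()`, which returns the row numbers of each group directly and so avoids building the indicator matrix. This is an alternative written for illustration, not the function above. The name `sample_by_group2` is hypothetical, and like the original it matches a vector `k` to the groups in sorted label order.

```r
# Hypothetical alternative to sample.by.group(), built on split()
sample_by_group2 <- function(df, grp, k, replace = FALSE) {
  idx <- split(seq_len(nrow(df)), grp)          # row numbers per group, sorted by label
  if (length(k) == 1) k <- rep(k, length(idx))  # same size for every group
  if (!replace && any(k > lengths(idx)))
    stop("Cannot take a sample larger than a group when 'replace = FALSE'.")
  # draw k[i] row numbers from group i; indexing via sample.int() keeps
  # one-row groups correct
  picked <- mapply(function(rows, n) rows[sample.int(length(rows), n, replace = replace)],
                   idx, k, SIMPLIFY = FALSE)
  df[unlist(picked, use.names = FALSE), ]
}

set.seed(1)
mydf <- data.frame(value = 1:100, value2 = rnorm(100),
                   grp = rep(LETTERS[1:4], c(35, 15, 30, 20)))
out <- sample_by_group2(mydf, mydf$grp, k = 10)
table(out$grp)
```

With `split()`, the grouping and the per-group index vectors come from a single call. The size check compares `k` with `lengths(idx)`, so it uses exactly the groups being sampled.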

And here are some examples of how to use it:

```r
mydf <- mydf[ sample(1:nrow(mydf)), ]   # scramble it for fun

out1 <- sample.by.group(mydf, mydf$grp, k=10 )
table( out1$grp )

out2 <- sample.by.group(mydf, mydf$grp, k=50, replace=TRUE)   # i.e. a stratified bootstrap
table( out2$grp )
```

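You can also give each group its own sample size (matched to the groups in sorted order of their labels, A to D here), which lets you bootstrap or sample with unequal weights: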

```r
out3 <- sample.by.group(mydf, mydf$grp, k=c(20, 20, 30, 30), replace=TRUE)
table( out3$grp )
```
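When `k` is a vector, its entries are matched by position to the groups in sorted label order (the order of `table(grp)`). They are not matched in the order the groups first appear in the data. The sketch below assumes you would rather give the sizes by name. It reorders a named vector to the level order before the sizes are passed on. The names `k.named` and `k.ordered` are only for illustration.

```r
# Group labels in the data, deliberately not in sorted order
grp <- rep(c("D", "B", "A", "C"), c(20, 15, 35, 30))

# Hypothetical named sample sizes, written in any order
k.named <- c(C = 30, A = 20, D = 30, B = 20)

# Reorder to the sorted group levels, which is the order of table(grp)
k.ordered <- k.named[levels(factor(grp))]
stopifnot(!anyNA(k.ordered))   # fails if some group has no size given
k.ordered
```

`k.ordered` can then be passed as `k` to the sampling function. Any group missing from the named vector shows up as `NA` and is caught by the check.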

On Fri, 2006-03-17 at 16:01 +0000, Dan Bolser wrote:
> Hi,
>
> I have tuples of data in rows of a data.frame, each column is a variable
> for the 'items' (one per row).
>
> One of the variables is the 'size' of the item (row).
>
> I would like to cut my data.frame into groups such that each group has
> the same *total size*. So, assuming that we order by size, some groups
> should have several small items while other groups have a few large
> items. All the groups should have approximately the same total size.
>
> I have tried various combinations of cut, quantile, and ecdf, and I just
> can't work out how to do this!
>
> Any help is greatly appreciated!
>
> All the best,
> Dan.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
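The quoted message asks for groups with equal *total size* rather than equal counts. One possible approach, offered here as a sketch and not something proposed in the thread, has three steps:

1. Order the rows by size.
2. Take the running total with `cumsum()`.
3. `cut()` the range of that running total into `n` equal slices, assigning each row by the centre of its share.

The helper name `bin_by_total` and the simulated `items` data are hypothetical, and sizes are assumed to be positive.

```r
# Hypothetical helper: assign each item to one of n bins so that every bin
# has roughly the same total size. Assumes all sizes are positive.
bin_by_total <- function(size, n) {
  stopifnot(all(size > 0), n >= 1)
  ord   <- order(size)             # work from the smallest item to the largest
  s     <- size[ord]
  cum   <- cumsum(s)               # running total of size
  total <- cum[length(cum)]
  mid   <- cum - s / 2             # centre of each item's share of the total
  # cut the range 0..total into n equal slices; each item goes to the
  # slice that contains its centre
  b <- cut(mid, breaks = seq(0, total, length.out = n + 1), labels = FALSE)
  bin <- integer(length(size))
  bin[ord] <- b                    # back to the original row order
  bin
}

set.seed(42)
items <- data.frame(id = 1:50, size = rexp(50, rate = 1/10))
items$bin <- bin_by_total(items$size, n = 5)
tapply(items$size, items$bin, sum)   # total size per bin
table(items$bin)                     # number of items per bin
```

The rows of each bin can then be extracted with `split(items, items$bin)`. Because each item is assigned by its centre, a large item goes to whichever bin it mostly overlaps. Each bin total therefore differs from `sum(size) / n` by less than the largest item. An item bigger than `sum(size) / n` can also leave a neighbouring bin empty, so the bins are only approximately balanced.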