[R] Binning question (binning rows of a data.frame according to a variable)

Adaikalavan Ramasamy ramasamy at cancer.org.uk
Sun Mar 19 06:37:22 CET 2006


Do you by any chance want to sample equally from each group, so that every
group is equally represented in the result? Here is an example input:

 mydf <- data.frame( value=1:100, value2=rnorm(100),
                     grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) )

which has 35 observations from A, 15 from B, 30 from C and 20 from D.


And here is a function that I wrote:

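 sample.by.group <- function(df, grp, k, replace=FALSE){

   grp <- factor(grp)                  # drop unused levels; levels give the group order
   n   <- table(grp)                   # group sizes, in level order

   # a single k means "the same sample size for every group"
   if(length(k)==1){ k <- rep(k, nlevels(grp)) }

   if(!replace && any(k > n))
     stop("Cannot take a sample larger than the population when ",
          "'replace = FALSE'.\n", "Please specify a value no greater than ",
          min(n), " or use 'replace = TRUE'.")

   # row numbers of df belonging to each group, in level order
   idx   <- split(seq_len(nrow(df)), grp)
   w.mat <- list(NULL)

   for(i in seq_along(idx)){
     # index via sample.int(): sample(x, k) misbehaves when length(x) == 1
     w.mat[[i]] <- idx[[i]][ sample.int(length(idx[[i]]), k[i], replace=replace) ]
   }

   out <- df[ unlist(w.mat), ]
   return(out)
 }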
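A note on sampling row indices with sample(): when its first argument is a
single number n, sample(n, k) draws from 1:n instead of returning n, so a
group that contains exactly one row can silently pick the wrong rows. A
minimal sketch of a safer helper, much like the resample() wrapper shown in
the examples of ?sample (the name resample.idx is hypothetical, not part of
base R), indexes through sample.int() instead:

```r
# Hypothetical helper: draw k elements from the vector x, even when
# length(x) == 1 (plain sample(x, k) would then sample from 1:x).
resample.idx <- function(x, k, replace = FALSE) {
  x[ sample.int(length(x), k, replace = replace) ]
}

set.seed(1)
rows <- 7                              # a group consisting of the single row 7
sample(rows, 3, replace = TRUE)        # pitfall: values drawn from 1:7
resample.idx(rows, 3, replace = TRUE)  # always 7 7 7
```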


And here are some examples of how to use it:

 mydf <- mydf[ sample(1:nrow(mydf)), ]   # scramble it for fun

 out1 <- sample.by.group(mydf, mydf$grp, k=10)
 table( out1$grp )

 out2 <- sample.by.group(mydf, mydf$grp, k=50, replace=TRUE) # i.e. bootstrap
 table( out2$grp )

and you can even bootstrap or sample with unequal group weights by giving
each group its own sample size (in the order of the group levels, A to D here):

 out3 <- sample.by.group(mydf, mydf$grp, k=c(20, 20, 30, 30), replace=TRUE)
 table( out3$grp )
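For comparison, the same equal-size draw can be written without a helper
function by splitting the data frame by group and sampling within each piece.
A minimal base-R sketch (it rebuilds the example mydf so that it runs on its
own; the per-group size k is an assumption to adjust):

```r
set.seed(42)
mydf <- data.frame( value=1:100, value2=rnorm(100),
                    grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) )

k <- 10   # rows to draw from each group (without replacement)

# split() gives one data frame per group; sample row positions within each
pieces <- lapply( split(mydf, mydf$grp),
                  function(d) d[ sample(nrow(d), k), ] )
out <- do.call(rbind, pieces)

table( out$grp )   # 10 rows from each of A, B, C, D
```

Here sample(nrow(d), k) deliberately uses the "single number means 1:n"
behaviour of sample(), since nrow(d) is a count rather than a row index.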


Regards, Adai



On Fri, 2006-03-17 at 16:01 +0000, Dan Bolser wrote:
> Hi,
> 
> I have tuples of data in the rows of a data.frame; each column is a variable
> for the 'items' (one per row).
> 
> One of the variables is the 'size' of the item (row).
> 
> I would like to cut my data.frame into groups such that each group has 
> the same *total size*. So, assuming that we order by size, some groups 
> should have several small items while other groups have a few large 
> items. All the groups should have approximately the same total size.
> 
> I have tried various combinations of cut, quantile, and ecdf, and I just 
> can't work out how to do this!
> 
> Any help is greatly appreciated!
> 
> All the best,
> Dan.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
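The question as posed (bins of roughly equal total size rather than equal
counts) can be sketched with cumsum(): order the rows by size, take the
running total, and cut it into equal-width intervals. A minimal sketch,
assuming a numeric column called size and a chosen number of bins nbins (both
hypothetical names, with made-up example data); because items are indivisible,
the bin totals are only approximately equal:

```r
set.seed(1)
# hypothetical example data: 'size' is the variable to balance on
dat <- data.frame( id=1:50, size=rexp(50) * 10 )

nbins <- 4
dat   <- dat[ order(dat$size), ]          # small items first

# running total of size; each item is placed by the midpoint of its share,
# so an item straddling a bin boundary goes to the bin holding most of it
cs  <- cumsum(dat$size)
mid <- cs - dat$size / 2
dat$bin <- cut( mid, breaks = seq(0, sum(dat$size), length.out = nbins + 1),
                labels = FALSE, include.lowest = TRUE )

tapply( dat$size, dat$bin, sum )   # roughly equal totals per bin
table( dat$bin )                   # typically many small items early, few large late
```

Each bin total differs from sum(size)/nbins by less than the largest single
item, which is the best an assignment of whole rows can guarantee in general.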
