[R] Binning question (binning rows of a data.frame according to a variable)
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Sun Mar 19 06:37:22 CET 2006
Do you by any chance want to sample equally from each group, so that every
group is equally represented? Here is an example of the input:
mydf <- data.frame( value=1:100, value2=rnorm(100),
grp=rep( LETTERS[1:4], c(35, 15, 30, 20) ) )
which has 35 observations from A, 15 from B, 30 from C and 20 from D.
And here is a function that I wrote:
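sample.by.group <- function(df, grp, k, replace=FALSE){
  grp <- factor(grp)                       # treat group labels as a factor
  n.grp <- table(grp)                      # group sizes, in level order
  if(length(k)==1){ k <- rep(k, nlevels(grp)) }
  if(!replace && any(k > n.grp))
    stop("Cannot take a sample larger than the population when ",
         "'replace = FALSE'.\n", "Please specify a value no greater than ",
         min(n.grp), " or use 'replace = TRUE'.")
  # row indices of df for each group, one list element per level
  idx <- split(seq_len(nrow(df)), grp)
  w.mat <- vector("list", length(idx))
  for(i in seq_along(idx)){
    # index through sample.int() so that a group with a single row is not
    # mistaken for sample(n, ...), which would draw from 1:n instead
    w.mat[[i]] <- idx[[i]][ sample.int(length(idx[[i]]), k[i], replace=replace) ]
  }
  out <- df[ unlist(w.mat), ]
  return(out)
}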
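Any helper that passes a vector of row indices to sample() should guard against one R pitfall: when that vector has length one, sample(x, size) treats x as n and samples from 1:n, so a group containing a single row would return wrong row numbers. A minimal sketch of the safer pattern, following the resample() example on R's ?sample help page (the name and the k argument here are illustrative), is:

```r
# Hypothetical helper modelled on the resample() example in ?sample:
# draw k elements of x, behaving correctly even when x has length one.
resample <- function(x, k, replace = FALSE) {
  x[sample.int(length(x), k, replace = replace)]
}

rows <- 7L                # e.g. a group whose only member is row 7
resample(rows, 1)         # always 7
sample(rows, 1)           # draws from 1:7, so usually not 7
```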
And here are some examples of how to use it:
mydf <- mydf[ sample(1:nrow(mydf)), ] # scramble it for fun
out1 <- sample.by.group(mydf, mydf$grp, k=10 )
table( out1$grp )
out2 <- sample.by.group(mydf, mydf$grp, k=50, replace=TRUE) # i.e. bootstrap
table( out2$grp )
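and you can even bootstrap with a different sample size for each group (a
simple way to weight the groups); the sizes in k follow the order of the
group levels, here A, B, C, D:
out3 <- sample.by.group(mydf, mydf$grp, k=c(20, 20, 30, 30), replace=TRUE)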
table( out3$grp )
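A more compact alternative, sketched here under the assumption that grp has one entry per row of df, splits the data frame itself and samples rows within each piece. The name sample.rows.by.group is hypothetical, and this is a variant rather than a replacement for the function above:

```r
# Alternative sketch (hypothetical name): split the data frame by group,
# sample rows inside each piece, then stack the pieces again.
sample.rows.by.group <- function(df, grp, k, replace = FALSE) {
  pieces <- split(df, grp)                 # one data frame per group level
  k <- rep_len(k, length(pieces))          # recycle a single k to every group
  out <- mapply(function(d, n) {
                  d[sample.int(nrow(d), n, replace = replace), , drop = FALSE]
                },
                pieces, k, SIMPLIFY = FALSE)
  do.call(rbind, out)                      # stack groups in level order
}

mydf <- data.frame(value = 1:100, value2 = rnorm(100),
                   grp = rep(LETTERS[1:4], c(35, 15, 30, 20)))
out4 <- sample.rows.by.group(mydf, mydf$grp, k = c(20, 20, 30, 30),
                             replace = TRUE)
table(out4$grp)                            # A 20, B 20, C 30, D 30
```

This version has no custom size check: with replace = FALSE and k larger than a group, sample.int() itself stops with its standard error.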
Regards, Adai
On Fri, 2006-03-17 at 16:01 +0000, Dan Bolser wrote:
> Hi,
>
> I have tuples of data in the rows of a data.frame; each column is a variable
> for the 'items' (one per row).
>
> One of the variables is the 'size' of the item (row).
>
> I would like to cut my data.frame into groups such that each group has
> the same *total size*. So, assuming that we order by size, some groups
> should have several small items while other groups have a few large
> items. All the groups should have approximately the same total size.
>
> I have tried various combinations of cut, quantile, and ecdf, and I just
> can't work out how to do this!
>
> Any help is greatly appreciated!
>
> All the best,
> Dan.
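The question quoted above asks for bins of equal *total* size rather than equal counts. One possible approach, sketched here under stated assumptions (a numeric column whose name is passed in, non-negative sizes with a positive total, and a hypothetical function name bin.by.total), is to sort by size, take the running total with cumsum(), and cut() that running total into equal slices of the grand total:

```r
# Sketch only: assign each row of df to one of n.bins bins so that the bins
# have roughly equal total 'size'. Assumes non-negative sizes, positive total.
bin.by.total <- function(df, size.col = "size", n.bins = 4) {
  s <- df[[size.col]]
  o <- order(s)                          # rows sorted by size, smallest first
  cs <- cumsum(s[o])                     # running total over the sorted rows
  total <- cs[length(cs)]
  # cut the running total into n.bins equal-width slices of the grand total
  b <- cut(cs, breaks = seq(0, total, length.out = n.bins + 1),
           labels = FALSE, include.lowest = TRUE)
  bins <- integer(length(s))
  bins[o] <- b                           # map bin numbers back to original rows
  bins
}

df <- data.frame(item = letters[1:8], size = c(4, 1, 2, 1, 4, 1, 2, 1))
df$bin <- bin.by.total(df, "size", n.bins = 4)
tapply(df$size, df$bin, sum)             # each of the 4 bins totals 4 here
split(df, df$bin)                        # the rows of each bin
```

Because each row goes to the bin in which its running total ends, the bins hold many small items early and few large items late, as the question describes; the totals are only approximately equal in general, and a single very large item can push its bin past its share.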