[R] drop rare factors
Sam Steingold
sds at gnu.org
Thu Jan 19 20:44:40 CET 2012
> * Sarah Goslee <fnenu.tbfyrr at tznvy.pbz> [2012-01-18 17:36:16 -0500]:
>
> Here's one way, worked out in lots of steps so you can see
> how each works:
Thanks, it all makes perfect sense, and I wrote this function based on
your instructions:
drop.levels <- function (frame, column, threshold) {
  size <- nrow(frame)
  if (threshold < 1) threshold <- threshold * size
  tab <- table(frame[column])
  keep <- names(tab)[tab > threshold]
  drop <- names(tab)[tab <= threshold]
  cat("Keep(",column,")",length(keep)); print(tab[keep])
  cat("Drop(",column,")",length(drop)); print(tab[drop])
  frame1 <- frame[frame[column] %in% keep, ]
  size1 <- nrow(frame1)
  cat("Rows:",size,"-->",size1,"(dropped",100*(size-size1)/size,"%)\n")
  frame1[column] <- factor(frame1[column], levels=keep)
  frame1
}
Alas, I get an error:
Rows: 87392 --> 0 (dropped 100 %)
Error in `[<-.data.frame`(`*tmp*`, column, value = NA_integer_) :
replacement has 1 rows, data has 0
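The "0 rows" line suggests the subsetting step is at fault. A likely cause is that `frame[column]` (single brackets) returns a one-column data frame, not a vector. `%in%` then treats that data frame as a list with one element and returns a single FALSE, which is recycled over every row. Presumably the interactive steps used `$` or `[[`, which return the column itself. The small sketch below, on a hypothetical toy data frame `df`, shows the difference:

```r
# Hypothetical toy data standing in for the real 87392-row frame
df <- data.frame(g = c("a", "a", "b"), x = 1:3, stringsAsFactors = FALSE)
column <- "g"
keep <- "a"

class(df[column])    # "data.frame": single brackets keep the data frame
class(df[[column]])  # "character": double brackets extract the vector

df[column] %in% keep    # FALSE: one value for the whole column, recycled
df[[column]] %in% keep  # TRUE TRUE FALSE: one value per row

nrow(df[df[column] %in% keep, ])    # 0, as in the failing run
nrow(df[df[[column]] %in% keep, ])  # 2
```

The later `factor(frame1[column], levels=keep)` call has the same problem. On the empty data frame it apparently produces a length-1 NA factor, and that matches the complaint that the replacement has 1 row while the data has 0.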
When I do everything step by step interactively, it works...
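A minimal corrected sketch of the same function follows. It assumes the only problem is the single-bracket `frame[column]` indexing, which yields a one-column data frame rather than a vector. The sketch switches to `frame[[column]]` wherever the column's values are needed and adds `drop = FALSE` so that a single-column frame stays a data frame. Otherwise it keeps the original logic, with newlines added to the `cat()` calls:

```r
drop.levels <- function(frame, column, threshold) {
  size <- nrow(frame)
  # A threshold below 1 is read as a fraction of the number of rows
  if (threshold < 1) threshold <- threshold * size
  tab <- table(frame[[column]])          # row count per level
  keep <- names(tab)[tab > threshold]
  drop <- names(tab)[tab <= threshold]
  cat("Keep(", column, ")", length(keep), "\n"); print(tab[keep])
  cat("Drop(", column, ")", length(drop), "\n"); print(tab[drop])
  # [[ ]] returns the column as a vector, so %in% gives one logical per row
  frame1 <- frame[frame[[column]] %in% keep, , drop = FALSE]
  size1 <- nrow(frame1)
  cat("Rows:", size, "-->", size1,
      "(dropped", 100 * (size - size1) / size, "%)\n")
  # Rebuild the factor so that only the kept levels remain
  frame1[[column]] <- factor(frame1[[column]], levels = keep)
  frame1
}

# Example with hypothetical data:
# df <- data.frame(g = c("a", "a", "a", "b", "c"), x = 1:5)
# drop.levels(df, "g", 1)   # keeps the 3 rows where g == "a"
```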
Thanks!
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000