[R] drop rare factors
Sarah Goslee
sarah.goslee at gmail.com
Thu Jan 19 20:46:33 CET 2012
Hi Sam,
To be of any use whatsoever, we need a reproducible example.
What's frame?
What's column?
What's threshold?
Remind the list what you're trying to do. The list gets lots of traffic;
if you delete all the context, nobody will remember what you need.
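
For instance, a small self-contained example along the following lines would let the list run your code directly. This is only a sketch: the data frame `d`, the column name "f", and the threshold of 2 are made-up values for illustration, not your data. It also checks one thing that may matter here, namely that single-bracket indexing of a data frame returns a one-column data frame, while double brackets return the column itself.

```r
# Hypothetical toy data: two common factor levels and two rare ones.
d <- data.frame(f = factor(c("a", "a", "a", "b", "b", "b", "c", "d")),
                x = 1:8)
column <- "f"      # assumed: a single column name
threshold <- 2     # assumed: an absolute count, not a fraction

tab <- table(d[[column]])             # counts per level
keep <- names(tab)[tab > threshold]   # levels seen more than `threshold` times

# Single brackets keep the data frame wrapper around the column.
class(d[column])
# Double brackets extract the factor itself.
class(d[[column]])

# %in% on the extracted factor tests each row separately.
rows <- d[[column]] %in% keep
length(rows) == nrow(d)
sum(rows)                             # number of rows that would be kept
```

Posting something this small, together with the exact call that fails, makes it much easier to see where the interactive steps and the function differ.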
Sarah
On Thu, Jan 19, 2012 at 2:44 PM, Sam Steingold <sds at gnu.org> wrote:
>> * Sarah Goslee <sarah.goslee at gmail.com> [2012-01-18 17:36:16 -0500]:
>>
>> Here's one way, worked out in lots of steps so you can see
>> how each works:
>
> thanks, it all makes perfect sense, and I wrote this function based on
> your instructions:
>
> drop.levels <- function (frame, column, threshold) {
>   size <- nrow(frame)
>   if (threshold < 1) threshold <- threshold * size
>   tab <- table(frame[column])
>   keep <- names(tab)[tab > threshold]
>   drop <- names(tab)[tab <= threshold]
>   cat("Keep(",column,")",length(keep)); print(tab[keep])
>   cat("Drop(",column,")",length(drop)); print(tab[drop])
>   frame1 <- frame[frame[column] %in% keep, ]
>   size1 <- nrow(frame1)
>   cat("Rows:",size,"-->",size1,"(dropped",100*(size-size1)/size,"%)\n")
>   frame1[column] <- factor(frame1[column], levels=keep)
>   frame1
> }
>
> alas, I get an error:
>
> Rows: 87392 --> 0 (dropped 100 %)
> Error in `[<-.data.frame`(`*tmp*`, column, value = NA_integer_) :
> replacement has 1 rows, data has 0
>
> when I do everything step-by-step interactively it works...
>
> Thanks!
>
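
Without the data this is only a guess, but the zero-row result is consistent with `frame[column]` being a one-column data frame rather than a vector, so that `%in%` and `factor()` do not see the individual values. A possible variant of the quoted function that extracts the column with `[[` is sketched below; it assumes `column` is a single column name, and the test data frame `d` is invented for illustration.

```r
# Sketch of drop.levels() using [[ ]] to extract the column as a vector.
drop.levels <- function(frame, column, threshold) {
  size <- nrow(frame)
  # A threshold below 1 is read as a fraction of the rows.
  if (threshold < 1) threshold <- threshold * size
  tab <- table(frame[[column]])
  keep <- names(tab)[tab > threshold]
  # Row-wise test against the levels to keep.
  frame1 <- frame[frame[[column]] %in% keep, , drop = FALSE]
  size1 <- nrow(frame1)
  cat("Rows:", size, "-->", size1,
      "(dropped", 100 * (size - size1) / size, "%)\n")
  # Rebuild the factor so the rare levels are gone from levels().
  frame1[[column]] <- factor(frame1[[column]], levels = keep)
  frame1
}

# Made-up data for illustration only.
d <- data.frame(f = factor(c("a", "a", "a", "b", "b", "b", "c", "d")),
                x = 1:8)
res <- drop.levels(d, "f", 2)
levels(res$f)
```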
--
Sarah Goslee
http://www.functionaldiversity.org