[R] drop rare factors

Sarah Goslee sarah.goslee at gmail.com
Thu Jan 19 20:46:33 CET 2012


Hi Sam,

For us to be of any use whatsoever, we need a reproducible example.

What's frame?
What's column?
What's threshold?
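
Something along these lines would be enough. This is only a sketch with
made-up toy data: the column names grp and x, the values, and the
threshold are all hypothetical stand-ins for whatever you actually have.

```r
# Hypothetical toy data; names and values are invented for illustration.
set.seed(1)
frame <- data.frame(
  grp = factor(c(rep("a", 5), rep("b", 3), "c", "d")),
  x   = rnorm(10)
)
column    <- "grp"   # assumed: a single column name as a string
threshold <- 2       # assumed: an absolute count, not a fraction

# Show the list exactly what the objects look like.
str(frame)
table(frame[[column]])

# dput() prints a copy-and-paste-able version of (part of) the data.
dput(head(frame, 3))
```

Pasting the dput() output into the message lets anyone recreate the same
object instead of guessing at its structure.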

Remind the list what you're trying to do. The list gets lots of traffic;
if you delete all the context, nobody will remember what you need.

Sarah

On Thu, Jan 19, 2012 at 2:44 PM, Sam Steingold <sds at gnu.org> wrote:
>> * Sarah Goslee <fnenu.tbfyrr at tznvy.pbz> [2012-01-18 17:36:16 -0500]:
>>
>> Here's one way, worked out in lots of steps so you can see
>> how each works:
>
> thanks, it all makes perfect sense, and I wrote this function based on
> your instructions:
>
> drop.levels <- function (frame, column, threshold) {
>  size <- nrow(frame)
>  if (threshold < 1) threshold <- threshold * size
>  tab <- table(frame[column])
>  keep <- names(tab)[tab >  threshold]
>  drop <- names(tab)[tab <= threshold]
>  cat("Keep(",column,")",length(keep)); print(tab[keep])
>  cat("Drop(",column,")",length(drop)); print(tab[drop])
>  frame1 <- frame[frame[column] %in% keep, ]
>  size1 <- nrow(frame1)
>  cat("Rows:",size,"-->",size1,"(dropped",100*(size-size1)/size,"%)\n")
>  frame1[column] <- factor(frame1[column], levels=keep)
>  frame1
> }
>
> alas, I get an error:
>
> Rows: 87392 --> 0 (dropped 100 %)
> Error in `[<-.data.frame`(`*tmp*`, column, value = NA_integer_) :
>  replacement has 1 rows, data has 0
>
> when I do everything step-by-step interactively it works...
>
> Thanks!
>
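
Without the data this is only a guess, but if column is a single column
name held as a character string, then frame[column] is a one-column data
frame rather than a vector, and %in% will not compare it element by
element. Extracting the vector with [[ ]] may be all that is needed. A
minimal sketch under that assumption (the toy data frame and its column
names are hypothetical, and the cat/print reporting is left out):

```r
# Sketch only: assumes `column` is one column name given as a string.
drop.levels <- function(frame, column, threshold) {
  size <- nrow(frame)
  # A threshold below 1 is read as a fraction of the rows.
  if (threshold < 1) threshold <- threshold * size
  tab  <- table(frame[[column]])          # [[ ]] extracts the vector
  keep <- names(tab)[tab > threshold]     # levels frequent enough to keep
  frame1 <- frame[frame[[column]] %in% keep, , drop = FALSE]
  # Rebuild the factor so the dropped levels disappear.
  frame1[[column]] <- factor(frame1[[column]], levels = keep)
  frame1
}

# Hypothetical toy data, not the real 87392-row frame.
toy <- data.frame(grp = c("a", "a", "a", "b", "b", "c"), x = 1:6)

class(toy["grp"])     # a data frame
class(toy[["grp"]])   # the column itself

res <- drop.levels(toy, "grp", 1)
```

If that is not the problem, a reproducible example will show what is.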

-- 
Sarah Goslee
http://www.functionaldiversity.org


