[R] removing factor level represented by less than x rows

Sundar Dorai-Raj sundar.dorai-raj at pdf.com
Fri Jul 8 16:39:31 CEST 2005



Mikkel Grum wrote:
> In a number of different situations I'm trying to
> remove factor levels that are represented by less than
> a certain number of rows, e.g. if I had the dataset aa
> below and wanted to remove the species that are
> represented in less than 2 rows:
> 
> data(iris)
> aa <- iris[1:101,]
> 
> In this case, since I can see that the species
> virginica only has one row, I can write:
> 
> table(aa$Species)
>     setosa versicolor  virginica 
>         50         50          1 
> 
> aa[aa$Species != "virginica", ]
> 
> but:
> 
> aa[aa$Species == names(table(aa$Species)> 2),]
> 
> does not work.
> 

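If you look at the result of "table(aa$Species) > 2" you'll see your 
first mistake: the comparison returns a logical vector that still 
carries all three names, so names() gives you every species, not just 
the common ones. Your second mistake is using "==" to match against a 
set of names. "==" compares element by element, recycling the shorter 
vector; it does not test membership. What you want is "%in%" instead.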
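To see both points on the data from the question, here is a small 
sketch. The variable names (counts, big, by_equal, by_in) are only for 
illustration, and the cutoff is written as ">= 2" on the assumption 
that "fewer than 2 rows" is what should be removed.

```r
aa <- iris[1:101, ]
counts <- table(aa$Species)

# The comparison changes the values to TRUE/FALSE but keeps every name.
big <- counts >= 2
names(big)                 # still all three species, "virginica" included

# Subsetting the names by the logical values gives only the levels to keep.
keep <- names(counts)[big]

# "==" recycles the two kept names along the 101 rows (R also warns about
# the length mismatch), so it only matches where the positions line up.
by_equal <- suppressWarnings(aa$Species == keep)
# "%in%" asks, for each row, whether its species is among the kept names.
by_in <- aa$Species %in% keep

sum(by_equal)              # 50
sum(by_in)                 # 100
```

With "==" only half of the wanted rows are selected, which is why the 
subset looked wrong.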

I think you want the following:

# keep the levels that occur in at least 2 rows
keep <- levels(aa$Species)[table(aa$Species) >= 2]
aa <- aa[aa$Species %in% keep, ]

However, "virginica" is still listed as a level of the Species factor, 
now with zero rows. If you would like to drop the unused level 
completely, then try

aa$Species <- aa$Species[drop = TRUE]
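Since you want this in a number of different situations, the steps 
could be wrapped in a small function. This is only a sketch: the name 
drop_rare_levels and its arguments are made up for illustration, it 
assumes the column is a factor, and it uses droplevels(), which is 
available in R 2.12.0 and later and does the same job as the 
"[drop = TRUE]" idiom.

```r
# Hypothetical helper: keep only the rows whose factor level occurs in at
# least min_rows rows, then drop the levels that are no longer used.
drop_rare_levels <- function(data, column, min_rows) {
  counts <- table(data[[column]])
  keep <- names(counts)[counts >= min_rows]
  out <- data[data[[column]] %in% keep, , drop = FALSE]
  out[[column]] <- droplevels(out[[column]])
  out
}

aa <- iris[1:101, ]
bb <- drop_rare_levels(aa, "Species", 2)

nrow(bb)              # 100
levels(bb$Species)    # "setosa" "versicolor"
```

Rows where the column is NA are dropped as well, because "%in%" returns 
FALSE for them.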

HTH,

--sundar



