[R] removing fields of the same group from a data frame

Marc Schwartz marc_schwartz at me.com
Wed Mar 24 15:01:13 CET 2010


On Mar 24, 2010, at 8:38 AM, Oscar Franzén wrote:

> Dear all,
> 
> I'm trying to find a way to remove certain fields belonging to the same
> group from a data frame structure.
> 
> I have a data frame like this:
> 
> foo v1 v2 v3
>       1  1  a
>       6  2  a
>       3  8  a
>       4  4  b
>       4  4  b
>       2  1  c
>       1  6  d
> 
> Each row can then be grouped according to the third column: a, b, c, d. Then
> I would like to remove all fields that belong to a group with less than X
> members, for example less than 3 members, then
> the resulting data frame structure would look like:
> 
> 
> foo v1 v2 v3
>       1  1   a
>       6  2   a
>       3  8   a
> 
> Is there some simple way to do this in R?
> 
> Thanks in advance.
> /Oscar

> DF
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a
4  4  4  b
5  4  4  b
6  2  1  c
7  1  6  d

> subset(DF, !v3 %in% names(which(table(v3) < 3)))
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a


The use of table() gets us:

> table(DF$v3) < 3

    a     b     c     d 
FALSE  TRUE  TRUE  TRUE 

followed by:

> names(which(table(DF$v3) < 3))
[1] "b" "c" "d"

which gives us the values of v3 that occur fewer than 3 times.

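When using subset(), the expressions in the second argument are evaluated within the data frame first, hence we can drop the 'DF$' prefix in the function call. The "%in%" operator returns TRUE for each value of v3 that appears in the vector on its right-hand side, and the leading "!" negates that result, so only rows whose v3 value is not among the small groups are retained.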
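
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds DF from the values shown above and wraps the same table()/%in% logic in a small function. The function name drop_small_groups and its arguments (data, group, min_n) are my own invention for illustration, not something from the original question. It uses "[" indexing rather than subset(), since ?subset cautions that subset() is intended mainly for interactive use.

```r
# Rebuild the example data frame from the values shown in the post.
DF <- data.frame(v1 = c(1, 6, 3, 4, 4, 2, 1),
                 v2 = c(1, 2, 8, 4, 4, 1, 6),
                 v3 = c("a", "a", "a", "b", "b", "c", "d"))

# Hypothetical helper: keep only rows whose group (column 'group')
# has at least 'min_n' members.
drop_small_groups <- function(data, group, min_n) {
  counts <- table(data[[group]])            # rows per group
  keep <- names(counts)[counts >= min_n]    # groups that are large enough
  data[data[[group]] %in% keep, , drop = FALSE]
}

res <- drop_small_groups(DF, "v3", 3)
print(res)
```

With the data above this should leave only the three rows of group "a", matching the subset() result.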

We could also reverse the logic, yielding the same result:

> subset(DF, v3 %in% names(which(table(v3) >= 3)))
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a
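
As a possible alternative to table(), and only as a sketch rather than the approach used above, ave() can attach the group size to every row, which can then be compared against the threshold directly. The variable names n and kept are just placeholders, and DF is rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data frame from the values shown in the post.
DF <- data.frame(v1 = c(1, 6, 3, 4, 4, 2, 1),
                 v2 = c(1, 2, 8, 4, 4, 1, 6),
                 v3 = c("a", "a", "a", "b", "b", "c", "d"))

# For each row, compute the number of rows sharing its v3 value.
n <- ave(seq_along(DF$v3), DF$v3, FUN = length)

# Keep rows whose group has at least 3 members.
kept <- DF[n >= 3, ]
print(kept)
```

This avoids the names()/which() step, at the cost of being a little less obvious to read than the table() version.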


See ?table, ?subset and ?"%in%" for more information.

HTH,

Marc Schwartz


