[R] removing fields of the same group from a data frame
Marc Schwartz
marc_schwartz at me.com
Wed Mar 24 15:01:13 CET 2010
On Mar 24, 2010, at 8:38 AM, Oscar Franzén wrote:
> Dear all,
>
> I'm trying to find a way to remove certain fields belonging to the same
> group from a data frame structure.
>
> I have a data frame like this:
>
> foo v1 v2 v3
> 1 1 a
> 6 2 a
> 3 8 a
> 4 4 b
> 4 4 b
> 2 1 c
> 1 6 d
>
> Each row can then be grouped according to the third column: a, b, c, d. Then
> I would like to remove all fields that belong to a group with fewer than X
> members, for example fewer than 3 members, so that
> the resulting data frame structure would look like:
>
>
> foo v1 v2 v3
> 1 1 a
> 6 2 a
> 3 8 a
>
> Is there some simple way to do this in R?
>
> Thanks in advance.
> /Oscar
Assuming your data are in a data frame called DF:
> DF
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a
4  4  4  b
5  4  4  b
6  2  1  c
7  1  6  d
> subset(DF, !v3 %in% names(which(table(v3) < 3)))
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a
table() counts the rows in each group, and comparing those counts against 3 gets us:
> table(DF$v3) < 3
    a     b     c     d
FALSE  TRUE  TRUE  TRUE
followed by:
> names(which(table(DF$v3) < 3))
[1] "b" "c" "d"
which gives us the values of v3 that don't have at least 3 entries.
When using subset(), the variables in the condition are looked up within the data frame first, hence we can drop the 'DF$' in the function call. The "%in%" operator tests each value of v3 for membership in that set of names, and negating it with "!" excludes the matching rows instead of keeping them.
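To make the point about dropping 'DF$' concrete, here is a minimal sketch comparing subset() with ordinary bracket indexing. The data.frame() call is my reconstruction of the example data, and the names 'small', 'by_subset' and 'by_index' are only illustrative:

```r
# Reconstruction of the example data (assumed; the post only shows the printed DF).
DF <- data.frame(v1 = c(1, 6, 3, 4, 4, 2, 1),
                 v2 = c(1, 2, 8, 4, 4, 1, 6),
                 v3 = c("a", "a", "a", "b", "b", "c", "d"))

# Groups with fewer than 3 rows.
small <- names(which(table(DF$v3) < 3))

# subset() looks up v3 inside DF, so no 'DF$' is needed in the condition.
by_subset <- subset(DF, !v3 %in% small)

# With bracket indexing, the column has to be spelled out as DF$v3.
by_index <- DF[!DF$v3 %in% small, ]

print(by_subset)
print(by_index)
```

Both forms should keep only the three 'a' rows. Note that ?subset describes it as a convenience function for interactive use, so the bracket form may be the safer choice inside functions.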
We could also reverse the logic, yielding the same result:
> subset(DF, v3 %in% names(which(table(v3) >= 3)))
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a
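Since the question asked about an arbitrary cutoff of X members, the same logic could be wrapped in a small helper. This is only a sketch: keep_large_groups() and its arguments are hypothetical names of my own, not an existing R function, and the data.frame() call is again a reconstruction of the example data:

```r
# Reconstruction of the example data (assumed; the post only shows the printed DF).
DF <- data.frame(v1 = c(1, 6, 3, 4, 4, 2, 1),
                 v2 = c(1, 2, 8, 4, 4, 1, 6),
                 v3 = c("a", "a", "a", "b", "b", "c", "d"))

# Hypothetical helper: keep the rows of 'df' whose value in column 'group'
# occurs at least 'min_n' times.
keep_large_groups <- function(df, group, min_n) {
  counts <- table(df[[group]])               # rows per group
  keep <- names(which(counts >= min_n))      # groups that are large enough
  df[df[[group]] %in% keep, , drop = FALSE]  # bracket indexing, no subset()
}

res <- keep_large_groups(DF, "v3", 3)
print(res)
```

With min_n = 3 this should give the same three 'a' rows as the subset() calls above; changing min_n changes the cutoff without touching the rest of the code.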
See ?table, ?subset and ?"%in%" for more information.
HTH,
Marc Schwartz