[R] Remove single entries
Barry Rowlingson
b.rowlingson at lancaster.ac.uk
Mon Sep 28 18:32:19 CEST 2009
On Mon, Sep 28, 2009 at 5:03 PM, Raymond Danner <rdanner at vt.edu> wrote:
> Dear Community,
>
> I have a data set with two columns, bird number and mass. Individual birds
> were captured 1-13 times and weighed each time. I would like to remove
> those individuals that were captured only once, so that I can assess mass
> variability per bird. I've tried many approaches with no success. Can
> anyone recommend a way to remove individuals that were captured only once?
Approach this one step at a time. My sample data is:
> wts
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
4    2  1.2
5    3  5.4
6    3  4.5
7    3  4.4
8    4  3.2
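If you want to follow along, something like this should recreate that sample data. The values are copied from the printout above; storing the bird numbers as plain numerics is my assumption.

```r
# Recreate the sample data frame printed above.
# Assumption: bird numbers are stored as plain numeric values.
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

# Quick look at the structure: 8 rows, columns 'bird' and 'mass'.
str(wts)
```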
How many times was each bird measured? Use table():
> table(wts$bird)
1 2 3 4
3 1 3 1
The names of the resulting table are the distinct bird numbers (as
character strings). row.names() (or names()) extracts them, so we want
the names where the count is greater than one:
> row.names(table(wts$bird))[table(wts$bird)>1]
[1] "1" "3"
[This calls 'table' twice, so you might want to save the table to a new object]
Now we want all the rows of our original dataframe where the bird
number is in that set, so we select rows using %in%:
> wts[wts$bird %in% row.names(table(wts$bird))[table(wts$bird)>1],]
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
5    3  5.4
6    3  4.5
7    3  4.4
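Saving the table to an object first, so that table() is only called once, might look roughly like this. The names 'counts', 'repeat_birds' and 'multi' are just ones I made up for illustration.

```r
# Sample data, as in the printout above.
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

# Count the captures per bird once and keep the result.
counts <- table(wts$bird)

# Names of the table entries whose count exceeds one.
# These are character strings, e.g. "1" rather than 1.
repeat_birds <- names(counts)[counts > 1]

# Keep only the rows belonging to those birds; %in% matches the
# numeric bird numbers against the character names.
multi <- wts[wts$bird %in% repeat_birds, ]
multi
```

Note that this relies on %in% matching the numeric bird numbers against their character form, which works for whole-number IDs like these.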
Looks a bit messy, I'm not pleased with myself... Must be a better way...
Aha! A table-free way of finding the birds that appear more than once is:
> unique(wts$bird[duplicated(wts$bird)])
[1] 1 3
So you could do:
> wts[wts$bird %in% unique(wts$bird[duplicated(wts$bird)]),]
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
5    3  5.4
6    3  4.5
7    3  4.4
which looks a bit neater! You might want to unravel
unique(wts$bird[duplicated(wts$bird)]) to see what the various bits
do. And read the help pages.
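Unravelled, the pieces of that expression do roughly the following. The intermediate names 'dup', 'seen_again' and 'repeat_birds' are made up here for illustration.

```r
# Sample data, as in the printout above.
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

# duplicated() is TRUE for every occurrence of a value after its first,
# so a bird seen only once never gets a TRUE.
dup <- duplicated(wts$bird)

# Bird numbers at those positions: each repeat bird appears here
# once per extra capture.
seen_again <- wts$bird[dup]

# unique() collapses that to one entry per repeat bird.
repeat_birds <- unique(seen_again)

# Final selection: all rows (including first captures) of those birds.
wts[wts$bird %in% repeat_birds, ]
```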
TMTOWTDI, as they say.
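For instance, yet another possible route is ave(), which attaches the per-bird count to every row. The sketch below also takes a guess at the follow-up step: using sd() as the measure of "mass variability" is purely my assumption, as are the names 'n_per_row', 'multi' and 'mass_sd'.

```r
# Sample data, as in the printout above.
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

# For each row, the number of captures of that row's bird.
n_per_row <- ave(wts$mass, wts$bird, FUN = length)

# Keep rows of birds captured more than once.
multi <- wts[n_per_row > 1, ]

# Assumed measure of variability: standard deviation of mass per bird.
mass_sd <- tapply(multi$mass, multi$bird, sd)
mass_sd
```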
Barry