[R] Remove single entries

Mon Sep 28 18:32:19 CEST 2009

On Mon, Sep 28, 2009 at 5:03 PM, Raymond Danner <rdanner at vt.edu> wrote:
> Dear Community,
>
> I have a data set with two columns, bird number and mass.  Individual birds
> were captured 1-13 times and weighed each time.  I would like to remove
> those individuals that were captured only once, so that I can assess mass
> variability per bird.  I've tried many approaches with no success.  Can
> anyone recommend a way to remove individuals that were captured only once?

 Approach this one step at a time. My sample data is:

 > wts
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
4    2  1.2
5    3  5.4
6    3  4.5
7    3  4.4
8    4  3.2
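
 To follow along, the sample data can be rebuilt like this. This is a minimal sketch: the values are copied from the printout above, and storing both columns as plain numeric vectors is an assumption about the original data.

```r
# Rebuild the sample data frame shown above.
# Column types (numeric) are an assumption; the values come from the printout.
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)
wts
```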

 How many times was each bird measured? Use table():

 > table(wts$bird)

1 2 3 4
3 1 3 1

  The names of the table are the distinct bird numbers (as character
strings), and row.names() (or names()) extracts them. We want the names
where the count is greater than one:

 > row.names(table(wts$bird))[table(wts$bird)>1]
[1] "1" "3"

 [This calls 'table' twice, so you might want to save the table to a new object]
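
 A sketch of saving the table first, so it is only computed once. The object names `counts` and `keep` are made up for illustration, and the data frame is rebuilt here only to keep the snippet self-contained.

```r
# Sample data (values copied from the printout earlier in this post)
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

counts <- table(wts$bird)           # one count per distinct bird number
keep <- names(counts)[counts > 1]   # bird numbers seen more than once, as strings
keep
```

 Note that `keep` holds character strings, not numbers; that should be harmless with %in%, which coerces before matching.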

Now we want all the rows of our original dataframe where the bird
number is in that set, so we select rows using %in%:

 > wts[wts$bird %in% row.names(table(wts$bird))[table(wts$bird)>1],]
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
5    3  5.4
6    3  4.5
7    3  4.4

 Looks a bit messy, I'm not pleased with myself... Must be a better way...

 Aha! A table-free way of finding the birds that appear more than once is:

 > unique(wts$bird[duplicated(wts$bird)])
[1] 1 3

 So you could do:

 > wts[wts$bird %in% unique(wts$bird[duplicated(wts$bird)]),]
  bird mass
1    1  2.3
2    1  3.2
3    1  2.1
5    3  5.4
6    3  4.5
7    3  4.4

 which looks a bit neater! You might want to unravel
unique(wts$bird[duplicated(wts$bird)]) to see what the various bits
do. And read the help pages.
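
 Unravelled, that expression might look something like the sketch below. The intermediate names (`dup`, `repeated_rows`, `repeated_birds`) are invented here for illustration.

```r
# Sample data (values copied from the printout earlier in this post)
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

# duplicated() is TRUE for the second and later occurrences of a value,
# so a bird seen only once never gets a TRUE.
dup <- duplicated(wts$bird)

# Bird numbers on those rows; a bird seen n times shows up n - 1 times here.
repeated_rows <- wts$bird[dup]

# Collapse to one entry per bird.
repeated_birds <- unique(repeated_rows)

# Keep every row (including the first occurrence) for those birds.
wts[wts$bird %in% repeated_birds, ]
```

 The %in% step is needed because duplicated() alone would drop the first record of each repeated bird.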

TMTOWTDI, as they say.
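
 For instance, one more variant that is not part of the steps above: base R's ave() can attach a per-bird count to every row, which can then be used directly as a filter. This is only a sketch under the same sample-data assumptions, and `n_per_row` is a made-up name.

```r
# Sample data (values copied from the printout earlier in this post)
wts <- data.frame(
  bird = c(1, 1, 1, 2, 3, 3, 3, 4),
  mass = c(2.3, 3.2, 2.1, 1.2, 5.4, 4.5, 4.4, 3.2)
)

# For each row, the number of rows that share its bird number.
n_per_row <- ave(wts$mass, wts$bird, FUN = length)

# Keep rows belonging to birds measured more than once.
repeats <- wts[n_per_row > 1, ]
repeats
```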

Barry