[R] which rows are duplicates?

Mon Mar 30 12:51:29 CEST 2009

At 05:07 30/03/2009, Aaron M. Swoboda wrote:
>I would like to know which rows are duplicates of each other, not
>simply that a row is duplicate of another row. In the following
>example rows 1 and 3 are duplicates.
>
> > x <- c(1,3,1)
> > y <- c(2,4,2)
> > z <- c(3,4,3)
> > data <- data.frame(x,y,z)
>  x y z
>1 1 2 3
>2 3 4 4
>3 1 2 3

Does this do what you want?
 > x <- c(1,3,1)
 > y <- c(2,4,2)
 > z <- c(3,4,3)
 > data <- data.frame(x,y,z)
 > data.u <- unique(data)
 > data.u
  x y z
1 1 2 3
2 3 4 4
 > data.u <- cbind(data.u, set = 1:nrow(data.u))
 > merge(data, data.u)
  x y z set
1 1 2 3   1
2 1 2 3   1
3 3 4 4   2

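By default merge() sorts its result on the matching columns (here x, y
and z), so the rows come back grouped by set rather than in their
original order. You need to do a bit more work to get them back into
the original row order if that is essential.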
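
One possible way to do that extra work, as a sketch only: record the original row number in a helper column before merging, then sort on it afterwards. The column name "row" and the object names data.id and res are just illustrative choices, not anything built in.

```r
x <- c(1, 3, 1)
y <- c(2, 4, 2)
z <- c(3, 4, 3)
data <- data.frame(x, y, z)

# One row per distinct combination, each given a set number
data.u <- unique(data)
data.u <- cbind(data.u, set = 1:nrow(data.u))

# "row" is a hypothetical helper column holding the original position
data.id <- cbind(data, row = 1:nrow(data))

# merge() joins on the columns the two data frames share (x, y, z)
res <- merge(data.id, data.u)

# Sort on the helper column to restore the original row order
res <- res[order(res$row), ]
res
```

Rows that carry the same value of set are duplicates of each other, and the row column says where each one sat in the original data frame.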

>I can't figure out how to get R to tell me that observation 1 and 3
>are the same.  It seems like the "duplicated" and "unique" functions
>should be able to help me out, but I am stumped.
>
>For instance, if I use "duplicated" ...
>
> > duplicated(data)
>[1] FALSE FALSE  TRUE
>
>it tells me that row 3 is a duplicate, but not which row it matches.
>How do I figure out WHICH row it matches?
>
>And if I use "unique"...
>
> > unique(data)
>  x y z
>1 1 2 3
>2 3 4 4
>
>I see that rows 1 and 2 are unique, leaving me to infer that row 3 was
>a duplicate, but again it doesn't tell me which row it was a duplicate
>of (as far as I can tell). Am I missing something?
>
>How can I determine that row 3 is a duplicate OF ROW 1?
>
>Thanks,
>
>Aaron
>
>
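
If what you want is literally "row 3 is a duplicate of row 1", a more direct route might be to build one text key per row and use match(), which returns the position of the first row with the same key. This is only a sketch: the names key, first and dup.of are made up for illustration, and pasting numbers into strings assumes the values survive conversion to text without losing precision.

```r
x <- c(1, 3, 1)
y <- c(2, 4, 2)
z <- c(3, 4, 3)
data <- data.frame(x, y, z)

# One string per row; "\r" is an arbitrary separator that is
# unlikely to occur inside the data themselves
key <- do.call(paste, c(data, sep = "\r"))

# For every row, the index of the first row with identical contents
first <- match(key, key)

# Keep that index only for rows flagged by duplicated(), NA otherwise
dup.of <- ifelse(duplicated(data), first, NA)

cbind(data, first, dup.of)
```

Here dup.of is NA for rows 1 and 2 and 1 for row 3, i.e. row 3 repeats row 1. Using split(seq_len(nrow(data)), key) instead would list the row numbers belonging to each distinct combination.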

Michael Dewey
http://www.aghmed.fsnet.co.uk