[R] which rows are duplicates?
Bill.Venables at csiro.au
Mon Mar 30 06:27:16 CEST 2009
If you sort the data then the duplicated entries will occur in consecutive blocks:
> m
x y z
1 1 2 3
2 3 4 4
3 1 2 3
> m1 <- m[do.call(order, m), ]
> m1
x y z
1 1 2 3
3 1 2 3
2 3 4 4
> duplicated(m1)
[1] FALSE TRUE FALSE
When you identify the blocks, the row names will tell you where they occur in the original data frame.
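One way to identify the blocks is sketched below. It is only an illustration of the sorting idea, not necessarily how it has to be done: the names block, groups and dup_groups are made up for this example, and converting the row names with as.integer() assumes the data frame still has its default row names (1, 2, 3, ...).

```r
# Rebuild the example data frame from the original question.
m <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# Sort so that identical rows sit next to each other; row names are kept.
m1 <- m[do.call(order, m), ]

# A new block starts at every row that is not a duplicate of an earlier row,
# so the running count of non-duplicates labels each block.
block <- cumsum(!duplicated(m1))

# Original row numbers grouped by block.
# Assumption: row names are the default integers, so as.integer() is safe.
groups <- split(as.integer(rownames(m1)), block)

# Keep only blocks with more than one member, i.e. the sets of identical rows.
dup_groups <- groups[lengths(groups) > 1]
print(dup_groups)
```

For the example data this should report a single group containing rows 1 and 3. If the data frame has non-default row names, dropping the as.integer() call and splitting rownames(m1) directly would be the safer choice.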
Bill Venables
http://www.cmis.csiro.au/bill.venables/
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Aaron M. Swoboda
Sent: Monday, 30 March 2009 2:07 PM
To: r-help at r-project.org
Subject: [R] which rows are duplicates?
I would like to know which rows are duplicates of each other, not
simply that a row is duplicate of another row. In the following
example rows 1 and 3 are duplicates.
> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
> data
x y z
1 1 2 3
2 3 4 4
3 1 2 3
I can't figure out how to get R to tell me that observation 1 and 3
are the same. It seems like the "duplicated" and "unique" functions
should be able to help me out, but I am stumped.
For instance, if I use "duplicated" ...
> duplicated(data)
[1] FALSE FALSE TRUE
it tells me that row 3 is a duplicate, but not which row it matches.
How do I figure out WHICH row it matches?
And if I use "unique"...
> unique(data)
x y z
1 1 2 3
2 3 4 4
I see that rows 1 and 2 are unique, leaving me to infer that row 3 was
a duplicate, but again it doesn't tell me which row it was a duplicate
of (as far as I can tell). Am I missing something?
How can I determine that row 3 is a duplicate OF ROW 1?
Thanks,
Aaron