[R] finding row duplicates, regardless of element order

jim holtman jholtman at gmail.com
Mon Oct 31 01:47:09 CET 2011


Try this:

>  M <- matrix(c("1","2","3","2","4","5","5","3","2","1","3","2","4","4"), ncol=2)
>  M
     [,1] [,2]
[1,] "1"  "3"
[2,] "2"  "2"
[3,] "3"  "1"
[4,] "2"  "3"
[5,] "4"  "2"
[6,] "5"  "4"
[7,] "5"  "4"
>  # not the most efficient: apply() calls the function once per row
>  M.sorted <- apply(M, 1, function(x) paste(sort(x), collapse = ','))
>  M.sorted
[1] "1,3" "2,2" "1,3" "2,3" "2,4" "4,5" "4,5"
> # keep only the first occurrence of each unordered pair
> M[!duplicated(M.sorted), ]
     [,1] [,2]
[1,] "1"  "3"
[2,] "2"  "2"
[3,] "2"  "3"
[4,] "4"  "2"
[5,] "5"  "4"
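For the two-column case the per-row apply() can be avoided. The sketch below is an assumed alternative, not part of the reply above: it swaps apply()/sort() for the vectorized pmin()/pmax() to put every pair into a canonical order, so it only works if M has exactly two columns. It also flags *every* row that belongs to a duplicated group (rows 1, 3, 6 and 7, which is what the question asks to identify) and gives each row a group id. The names key, first_only, in_dup_group and group are illustrative.

```r
# Sketch: vectorized version, ASSUMES M has exactly two columns
M <- matrix(c("1","2","3","2","4","5","5","3","2","1","3","2","4","4"), ncol = 2)

# Canonical form of each row: smaller element first.
# pmin()/pmax() compare character vectors as strings; that is fine here
# because we only need a consistent order, not numeric order.
key <- cbind(pmin(M[, 1], M[, 2]), pmax(M[, 1], M[, 2]))

# duplicated() on a matrix works row by row (MARGIN = 1)
first_only <- !duplicated(key)
M[first_only, ]

# Flag every member of a duplicated group, not just the later copies
in_dup_group <- duplicated(key) | duplicated(key, fromLast = TRUE)
which(in_dup_group)

# Group id per row: rows holding the same unordered pair share an id
key_str <- paste(key[, 1], key[, 2], sep = ",")
group <- match(key_str, unique(key_str))
group
```

Because the comparison is on strings, "10" sorts before "9"; that does not affect duplicate detection, since only a consistent order is needed. The paste() keys, like M.sorted above, assume the values themselves never contain a comma.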


On Sun, Oct 30, 2011 at 7:49 PM, Wet Bell Diver <wetbelldiver at gmail.com> wrote:
>
> Dear list,
>
> Suppose I have the following matrix:
>> M <- matrix(c("1","2","3","2","4","5","5","3","2","1","3","2","4","4"), ncol=2)
>> M
>     [,1] [,2]
> [1,] "1"  "3"
> [2,] "2"  "2"
> [3,] "3"  "1"
> [4,] "2"  "3"
> [5,] "4"  "2"
> [6,] "5"  "4"
> [7,] "5"  "4"
>
> In this matrix, row 1 contains elements "1" and "3" and row 3 does the same.
> Similarly, rows 6 and 7 contain the same elements. I am looking for a way to
> efficiently identify these rows. I cannot use duplicated(M), because the order
> of the names within a row does not matter; all that matters is that *all*
> names in a row also *all* appear in another row.
> How can I identify such "duplicated" rows, without going through a process
> of looping and shifting elements around?
>
> thanks, Peter
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


