[R] which element is duplicated?

Duncan Murdoch murdoch.duncan using gmail.com
Tue Nov 13 21:25:06 CET 2018


On 13/11/2018 12:58 PM, William Dunlap wrote:
> You also asked about doing this for the rows of a matrix.  unique() gives
> the unique rows, but match() operates on a per-element, not per-row,
> basis.  You can use split(), which operates on rows of a matrix, to help.
> 
>      > m <- cbind( A=c(i=5,ii=5,iii=5,iv=4,v=4,vi=4), B=c(2,3,2,2,2,2) )
>      > unique(m)
>         A B
>     i  5 2
>     ii 5 3
>     iv 4 2
>      > match(m, unique(m)) # bad
>       [1] 1 1 1 3 3 3 4 5 4 4 4 4
>      > asRows <- function(x) split(x, seq_len(NROW(x))) # convert to list of rows
>      > match(asRows(m), unique(asRows(m)))
>     [1] 1 2 1 3 3 3
> 
> 
> For data.frames unique works on rows but match works on columns, and
> converting to a list of rows does not quite work, because unique looks
> at the row names.  A modification of asRows works around that:
> 
>      > d <- data.frame(m)
>      > unique(d)
>         A B
>     i  5 2
>     ii 5 3
>     iv 4 2
>      > match(d, unique(d))
>     [1] NA NA
>      > asRows <- function(x) lapply(split(x, seq_len(NROW(x))), as.list)
>      > match(asRows(d), unique(asRows(d)))
>     [1] 1 2 1 3 3 3
> 

Thanks!  That's very nice.
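
The quoted examples return positions within unique(m). For the original question (which earlier row does each row duplicate?), the same idea can be pointed back at the input itself. The following is only a sketch: firstMatchingRow is a hypothetical name, not something from base R or from Bill's message. As documented in ?match, lists are converted to character vectors before comparison, so this compares a text representation of each row rather than the raw values.

```r
# Hypothetical helper (not from the thread): for each row of a matrix or
# data frame, return the index of the first row holding the same values.
firstMatchingRow <- function(x) {
  # One list element per row; as.list() drops data-frame row names and
  # unname() drops column names, so only the values are compared.
  rows <- lapply(split(x, seq_len(NROW(x))), function(r) unname(as.list(r)))
  match(rows, rows)
}

m <- cbind(A = c(i = 5, ii = 5, iii = 5, iv = 4, v = 4, vi = 4),
           B = c(2, 3, 2, 2, 2, 2))
firstMatchingRow(m)              # 1 2 1 4 4 4
firstMatchingRow(data.frame(m))  # 1 2 1 4 4 4
```

Rows 4 to 6 map to 4 here rather than 3, because the result indexes rows of m, not rows of unique(m).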

> 
> Is this the sort of issue that Hadley's vectors package is addressing?
I don't know; hopefully someone else will respond...

Duncan Murdoch

> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com <http://tibco.com>
> 
> On Tue, Nov 13, 2018 at 2:15 AM, Duncan Murdoch 
> <murdoch.duncan using gmail.com> wrote:
> 
>     On 13/11/2018 12:35 AM, Pages, Herve wrote:
> 
>         Hi,
> 
>         On 11/12/18 17:08, Duncan Murdoch wrote:
> 
>             The duplicated() function gives TRUE if an item in a vector
>             (or row in a matrix, etc.) is a duplicate of an earlier item.
>             But what I would like to know is which item does it duplicate?
> 
>             For example,
> 
>             v <- c("a", "b", "b", "a")
>             duplicated(v)
> 
>             returns
> 
>             [1] FALSE FALSE  TRUE  TRUE
> 
>             What I want is a fast way to calculate
> 
>                [1] NA NA 2 1
> 
>             or (equally useful to me)
> 
>                [1] 1 2 2 1
> 
>             The result should have the property that if result[i] == j,
>             then v[i] == v[j], at least for i != j.
> 
>             Does this already exist somewhere, or is it easy to write?
> 
> 
>         I generally use match() for that:
> 
>            > v <- c("a", "b", "b", "a")
> 
>            > match(v, v)
> 
>         [1] 1 2 2 1
> 
> 
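
match(v, v) gives the second of the two requested forms. If the NA form is preferred, one possible way to derive it (a sketch, with first and earlier as illustrative variable names) is to blank out the positions that duplicated() reports as first occurrences:

```r
v <- c("a", "b", "b", "a")

# Index of the first occurrence of each element.
first <- match(v, v)
first                      # 1 2 2 1

# Set the entries that are themselves first occurrences to NA, leaving
# only the index of the earlier item each duplicate repeats.
earlier <- first
earlier[!duplicated(v)] <- NA
earlier                    # NA NA 2 1

# The property asked for: v[i] == v[result[i]] wherever result[i] is set.
all(v == v[first])         # TRUE
```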
>     Yes, this is perfect.  Thanks to you (and the private answer I
>     received that suggested the same).
> 
>     Duncan Murdoch
> 


