[Rd] duplicates() function
Hadley Wickham
hadley at rice.edu
Fri Apr 8 17:13:26 CEST 2011
On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> I need a function which is similar to duplicated(), but instead of returning
> TRUE/FALSE, returns indices of which element was duplicated. That is,
>
>> x <- c(9,7,9,3,7)
>> duplicated(x)
> [1] FALSE FALSE TRUE FALSE TRUE
>
>> duplicates(x)
> [1] NA NA 1 NA 2
>
> (so that I know that element 3 is a duplicate of element 1, and element 5 is
> a duplicate of element 2, whereas the others were not duplicated according
> to our definition.)
>
> Is there a simple way to write this function? I have an ugly
> implementation in R that loops over all the values; it would make more sense
> to redo it in C, if there isn't a simple implementation I missed.
I'd think of it as building a lookup table. The basic idea is
split(seq_along(x), x)
which groups the positions of x by value, although there are probably
much faster ways of doing it, depending on what you need. For
efficiency, you will probably need a hashtable somewhere.
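As a rough sketch of how that could look (not a tested or tuned implementation), one option is to lean on match(), which as far as I know uses hashing internally. The function name duplicates() is taken from the question; the body is only an assumption about what is wanted, namely the index of the first occurrence for each repeated element and NA otherwise.

```r
# Sketch only: for each element of x, return the index of its first
# occurrence if the element is a repeat, and NA otherwise.
duplicates <- function(x) {
  # match(x, x) gives, for every element, the position of its first occurrence
  first <- match(x, x)
  # an element whose first occurrence is itself is not a duplicate
  first[first == seq_along(x)] <- NA_integer_
  first
}

x <- c(9, 7, 9, 3, 7)
duplicates(x)
# [1] NA NA  1 NA  2

# The lookup-table view: positions grouped by value
split(seq_along(x), x)
```

The split() call returns a list with one entry per distinct value, each holding all positions of that value, so the first element of each entry is the index that duplicates() reports for the later ones.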
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/