[Rd] duplicates() function
Duncan Murdoch
murdoch.duncan at gmail.com
Mon Apr 11 20:05:11 CEST 2011
On 08/04/2011 11:39 AM, Joshua Ulrich wrote:
> On Fri, Apr 8, 2011 at 10:15 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
> > On 08/04/2011 11:08 AM, Joshua Ulrich wrote:
> >>
> >> How about:
> >>
> >> y<- rep(NA,length(x))
> >> y[duplicated(x)]<- match(x[duplicated(x)] ,x)
> >
> > That's a nice solution for vectors. Unfortunately for me, I have a matrix
> > (which duplicated() handles by checking whole rows). So a better example
> > that I should have posted would be
> >
> > x<- cbind(1, c(9,7,9,3,7) )
> >
> > and I'd still like the same output
> >
> For a matrix, could you apply the same strategy used in duplicated()?
>
> y<- rep(NA,NROW(x))
> temp<- apply(x, 1, function(x) paste(x, collapse="\r"))
> y[duplicated(temp)]<- match(temp[duplicated(temp)], temp)

Since this thread hasn't ended, I will say that I think this solution is
the best I've seen for my specific problem. I was actually surprised
that duplicated() did the string concatenation trick, but since it does,
it makes a lot of sense to do the same in duplicates().

I think a good general-purpose solution that works wherever
duplicated() works would likely be harder to write, because we don't
really have the right primitives for it.
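
For reference, the two suggestions quoted above could be wrapped into one
function roughly as follows. This is only a sketch, not a tested
general-purpose implementation: duplicates() is the name asked for earlier
in the thread, the is.matrix() dispatch is an assumption added here, and
the row keys are only unambiguous if no cell contains "\r".

```r
# Sketch: for each element (or matrix row) that duplicated() would flag,
# return the index of its first occurrence; NA for everything else.
duplicates <- function(x) {
  # For a matrix, collapse each row into a single string key, mirroring
  # the paste(collapse = "\r") approach quoted above.
  # Assumption: no cell contains "\r", otherwise distinct rows may collide.
  key <- if (is.matrix(x)) apply(x, 1, paste, collapse = "\r") else x

  dup <- duplicated(key)
  y <- rep(NA_integer_, length(key))
  # match() returns the position of the first occurrence of each key.
  y[dup] <- match(key[dup], key)
  y
}

duplicates(c(9, 7, 9, 3, 7))            # NA NA 1 NA 2
duplicates(cbind(1, c(9, 7, 9, 3, 7)))  # NA NA 1 NA 2
```

Computing duplicated(key) once and reusing it avoids scanning the keys
twice, but the string keys still make this a workaround rather than a
solution for everything duplicated() accepts.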
Duncan Murdoch
> >> duplicated(x)
> >
> > [1] FALSE FALSE TRUE FALSE TRUE
> >
> >> duplicates(x)
> >
> > [1] NA NA 1 NA 2
> >
> >
> > Duncan Murdoch
> >
> >> --
> >> Joshua Ulrich | FOSS Trading: www.fosstrading.com
> >>
> >>
> >>
> >> On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch<murdoch.duncan at gmail.com>
> >> wrote:
> >> > I need a function which is similar to duplicated(), but instead of
> >> > returning
> >> > TRUE/FALSE, returns indices of which element was duplicated. That is,
> >> >
> >> >> x<- c(9,7,9,3,7)
> >> >> duplicated(x)
> >> > [1] FALSE FALSE TRUE FALSE TRUE
> >> >
> >> >> duplicates(x)
> >> > [1] NA NA 1 NA 2
> >> >
> >> > (so that I know that element 3 is a duplicate of element 1, and element
> >> > 5 is
> >> > a duplicate of element 2, whereas the others were not duplicated
> >> > according
> >> > to our definition.)
> >> >
> >> > Is there a simple way to write this function? I have an ugly
> >> > implementation in R that loops over all the values; it would make more
> >> > sense
> >> > to redo it in C, if there isn't a simple implementation I missed.
> >> >
> >> > Duncan Murdoch
> >> >
> >> > ______________________________________________
> >> > R-devel at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >
> >
> >