[R] which element is duplicated?

Martin Maechler maechler at stat.math.ethz.ch
Tue Nov 13 10:08:12 CET 2018


>>>>> PIKAL Petr 
>>>>>     on Tue, 13 Nov 2018 08:42:22 +0000 writes:

    > Hi
    > similar result (with different numerical values) could
    > be achieved by making v a factor.

> > v <- letters[c(2,2,1,2,1,1)]
> > vf<-factor(v)
> > as.numeric(vf)
> [1] 2 2 1 2 1 1
> 
> Cheers
> Petr

Yes, as was already remarked by Michael Sumner.

But really the power is in match(): it is called *twice* by factor().
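
As a minimal sketch of that point (the variable names `first`, `dup_of`, and `v2` are illustrative and not from the thread), match() of a vector against itself returns, for each element, the position of its first occurrence. That appears to give both result forms asked for in the original question without tapply(). The last lines check, for this small example only, that the integer codes of factor() agree with a match() against the sorted unique values.

```r
v <- c("a", "b", "b", "a")

# For each element, the index of its first occurrence in v
first <- match(v, v)
first
# [1] 1 2 2 1

# Variant with NA wherever the element is not a duplicate of an earlier one
dup_of <- first
dup_of[!duplicated(v)] <- NA
dup_of
# [1] NA NA  2  1

# The factor codes from the quoted example, reproduced with match()
v2 <- letters[c(2, 2, 1, 2, 1, 1)]
identical(as.integer(factor(v2)), match(v2, sort(unique(v2))))
# [1] TRUE
```

Both results satisfy the stated property: whenever result[i] == j, v[i] == v[j].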

Martin

> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
> > Sent: Tuesday, November 13, 2018 6:44 AM
> > To: Duncan Murdoch <murdoch.duncan using gmail.com>
> > Cc: R-help <R-help using r-project.org>
> > Subject: Re: [R] which element is duplicated?
> >
> > It is not clear to me what you want for the general case. Perhaps:
> >
> > > v <- letters[c(2,2,1,2,1,1)]
> > > wh <- tapply(seq_along(v), factor(v), '[', 1)
> > > w <- wh[match(v, v[wh])]
> > > w
> > b b a b a a
> > 1 1 3 1 3 3
> > > ## and if you want NA's for the first occurrences of unique values,
> > > ## of course:
> > > w[wh] <- NA
> > > w
> >  b  b  a  b  a  a
> > NA  1 NA  1  3  3
> >
> > I'd like to see a cleverer solution that vectorizes and avoids the tapply(),
> > though.
> >
> > Cheers,
> > Bert
> >
> >
> >
> >
> > On Mon, Nov 12, 2018 at 8:33 PM Bert Gunter <bgunter.4567 using gmail.com>
> > wrote:
> >
> > > > match(v, unique(v))
> > > [1] 1 2 2 1
> > >
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming along
> > > and sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >
> > >
> > > On Mon, Nov 12, 2018 at 5:08 PM Duncan Murdoch
> > > <murdoch.duncan using gmail.com>
> > > wrote:
> > >
> > >> The duplicated() function gives TRUE if an item in a vector (or row
> > >> in a matrix, etc.) is a duplicate of an earlier item.  But what I
> > >> would like to know is which item does it duplicate?
> > >>
> > >> For example,
> > >>
> > >> v <- c("a", "b", "b", "a")
> > >> duplicated(v)
> > >>
> > >> returns
> > >>
> > >> [1] FALSE FALSE  TRUE  TRUE
> > >>
> > >> What I want is a fast way to calculate
> > >>
> > >>   [1] NA NA 2 1
> > >>
> > >> or (equally useful to me)
> > >>
> > >>   [1] 1 2 2 1
> > >>
> > >> The result should have the property that if result[i] == j, then v[i]
> > >> == v[j], at least for i != j.
> > >>
> > >> Does this already exist somewhere, or is it easy to write?
> > >>
> > >> Duncan Murdoch
> > >>
> > >> ______________________________________________
> > >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.


