[R] To detect the location of duplicate values
Charles Berry
cberry at tajo.ucsd.edu
Mon Jul 5 23:10:39 CEST 2010
Charles C. Berry <cberry <at> tajo.ucsd.edu> writes:
>
> On Mon, 5 Jul 2010, Moohwan Kim wrote:
>
> > Dear R family,
> >
> > I have a question about how to detect some duplicate numeric observations.
> > Suppose that I have a dataset with two variables.
> >
> > order value
> > 1 0.52
> > 2 0.23
> > 3 0.43
> > 4 0.21
> > 5 0.32
> > 6 0.32
> > 7 0.32
> > 8 0.32
> > 9 0.32
> > 10 0.12
> > 11 0.46
> > 12 0.09
> > 13 0.32
> > 14 0.25
> > ;
> > Could you help me indicate where the duplicate observations in a row
> > (e.g., 0.32) are?
>
> I see you already have replies about duplicated() and unique(), which are
> very handy for the 'detect' part of your query.
>
> But to list the locations of the duplicated elements, you might also
> benefit from using split() and Filter() like this:
>
> > Filter( function(x) length(x)>1, split(order, value) )
> $`0.32`
> [1] 5 6 7 8 9 13
>
Mark Leeds kindly pointed out (in private correspondence) that this needs a bit
more explanation. If the above 'dataset' is in fact a data.frame called 'dat'
then either
    attach(dat)
    Filter( function(x) length(x)>1, split(order, value) )
or
    Filter( function(x) length(x)>1, split(dat$order, dat$value) )
or
    with( dat, Filter( function(x) length(x)>1, split(order, value) ) )
should do it.
Thanks Mark!
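
For anyone who wants to paste and run this, here is a minimal self-contained sketch. It assumes the posted values are typed into a data.frame named 'dat' as above; the names 'dups' and 'flagged' are only illustrative. The last step is an extra comparison using duplicated(), not part of the original suggestion.

```r
# Rebuild the example data as a data.frame called 'dat'.
dat <- data.frame(
  order = 1:14,
  value = c(0.52, 0.23, 0.43, 0.21, 0.32, 0.32, 0.32,
            0.32, 0.32, 0.12, 0.46, 0.09, 0.32, 0.25)
)

# Group the 'order' positions by 'value', then keep only the groups
# that contain more than one position.
dups <- with(dat, Filter(function(x) length(x) > 1, split(order, value)))
dups

# For comparison: duplicated() alone skips the first occurrence of each
# value, so scan from both ends to flag every member of a duplicated set.
flagged <- which(duplicated(dat$value) | duplicated(dat$value, fromLast = TRUE))
flagged
```

Note that split() groups by value regardless of position, so row 13 is reported together with rows 5 to 9 even though it is not adjacent to them.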
[snip]