[R] To detect the location of duplicate values

Charles Berry cberry at tajo.ucsd.edu
Mon Jul 5 23:10:39 CEST 2010


Charles C. Berry <cberry <at> tajo.ucsd.edu> writes:

> 
> On Mon, 5 Jul 2010, Moohwan Kim wrote:
> 
> > Dear R family,
> >
> > I have a question about how to detect some duplicate numeric observations.
> > Suppose that I have two variables dataset.
> >
> > order value
> > 1  0.52
> > 2  0.23
> > 3  0.43
> > 4  0.21
> > 5  0.32
> > 6  0.32
> > 7  0.32
> > 8  0.32
> > 9  0.32
> > 10 0.12
> > 11 0.46
> > 12 0.09
> > 13 0.32
> > 14 0.25
> > ;
> > Could you help me indicate where the duplicate observations in a row
> > (e.g., 0.32) are?
> 
> I see you already have replies about duplicated() and unique(), which are
> very handy for the 'detect' part of your query.
> 
> But to list the locations of the duplicated elements, you might also
> benefit from using split() and Filter() like this:
> 
> > Filter( function(x) length(x)>1, split(order, value) )
> $`0.32`
> [1]  5  6  7  8  9 13
> 

Mark Leeds kindly pointed out (in private correspondence) that this needs a bit
more explanation. If the above 'dataset' is in fact a data.frame called 'dat',
then any one of

attach(dat) 
Filter( function(x) length(x)>1, split(order, value) )

or

Filter( function(x) length(x)>1, split(dat$order, dat$value) )

or 

with( dat, Filter( function(x) length(x)>1, split(order, value) ) )

should do it.
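
For anyone who wants to try this end to end, here is a minimal sketch. It
assumes the data frame is called 'dat' (as above) and is typed in by hand from
the values in the original post; the names 'dups' and 'dup_pos' are only
illustrative. It also shows, for comparison, one way to get the same positions
from duplicated().

```r
# Rebuild the example data from the original post (assumed name: dat)
dat <- data.frame(
  order = 1:14,
  value = c(0.52, 0.23, 0.43, 0.21, 0.32, 0.32, 0.32,
            0.32, 0.32, 0.12, 0.46, 0.09, 0.32, 0.25)
)

# Group the 'order' entries by 'value', then keep only the groups
# that have more than one member, i.e. the duplicated values
dups <- with(dat, Filter(function(x) length(x) > 1, split(order, value)))
print(dups)

# Comparison: duplicated() alone flags only the second and later
# occurrences, so combine it with fromLast = TRUE to flag every copy
dup_pos <- which(duplicated(dat$value) | duplicated(dat$value, fromLast = TRUE))
print(dup_pos)
```

The split() version has the advantage of reporting the positions separately
for each duplicated value, which matters once more than one value repeats.
Note that split() groups on the printed form of the numbers, so values that
differ only by floating-point noise may need rounding first.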

Thanks Mark!



[snip]


