[Rd] duplicated() variation that goes both ways to capture all duplicates
Duncan Murdoch
murdoch.duncan at gmail.com
Mon Jul 23 15:08:22 CEST 2012
On 23/07/2012 8:49 AM, Liviu Andronic wrote:
> Dear all
> The trouble with the current duplicated() function in R is that it can
> report duplicates while searching fromFirst _or_ fromLast, but not
> both ways. Often users will want to identify and extract all the
> copies of the item that has duplicates, not only the duplicates
> themselves.
>
> To take the example from the man page:
> > data(iris)
> > iris[duplicated(iris), ] ##duplicates while searching "fromFirst"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 143          5.8         2.7          5.1         1.9 virginica
> > iris[duplicated(iris, fromLast=T), ] ##duplicates while searching "fromLast"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
>
>
> To extract all the copies of the concerned items ("original" and
> duplicates) one would need to do something like this:
> > iris[(duplicated(iris) | duplicated(iris, fromLast=T)), ] ##duplicates while searching "bothWays"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
> 143          5.8         2.7          5.1         1.9 virginica
>
>
> Unfortunately this is unnecessarily long and convoluted. Short of a
> 'bothWays' argument in duplicated(), I came up with a small wrapper
> that simplifies the above:
> duplicated2 <-
>     function(x, bothWays=TRUE, ...)
> {
>     if(!bothWays) {
>         return(duplicated(x, ...))
>     } else {
>         return(duplicated(x, ...) | duplicated(x, fromLast=TRUE, ...))
>     }
> }
>
>
> Now the above can be achieved simply via:
> > iris[duplicated2(iris), ] ##duplicates while searching "bothWays"
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 102          5.8         2.7          5.1         1.9 virginica
> 143          5.8         2.7          5.1         1.9 virginica
>
>
> So here's my inquiry: Would the R Core consider adding such
> functionality in 'base' R? Either the---suitably cleaned
> up---duplicated2() function above, or a "bothWays" argument in
> duplicated() itself? Either of the two would improve user convenience
> and reduce confusion. (In my case it took some time before I
> understood the correct approach to this problem.)
I can't speak for all of R core, but I don't see the need for this in
base R -- your solution looks fine to me.
Duncan Murdoch
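
For readers following the thread, here is a minimal sketch of what each scan direction flags, using a short character vector instead of iris. The helper name all_duplicated and the sample vector x are hypothetical and only for illustration; the helper is not part of base R. It combines the two scans with | in the same way as the wrapper quoted above.

```r
# Hypothetical helper (not in base R): TRUE for every element that has at
# least one copy elsewhere, including the first and the last occurrence.
all_duplicated <- function(x, ...) {
  duplicated(x, ...) | duplicated(x, fromLast = TRUE, ...)
}

# Illustrative data: "a" occurs three times, "b" twice, "c" once.
x <- c("a", "b", "a", "c", "b", "a")

# Default scan: the first occurrence of each value is not flagged.
from_first <- duplicated(x)

# Reverse scan: the last occurrence of each value is not flagged.
from_last <- duplicated(x, fromLast = TRUE)

# Union of both scans: only the unique value "c" is left unflagged.
both_ways <- all_duplicated(x)

print(rbind(x, from_first, from_last, both_ways))

# The same helper works on data frames, since duplicated() has a
# data.frame method that accepts fromLast.
print(iris[all_duplicated(iris), ])
```

Because the helper passes ... to both calls, supplying fromLast through ... would clash with the explicit fromLast = TRUE in the second call; the sketch assumes callers do not do that.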