[Rd] setequal: better readability, reduced memory footprint, and minor speedup
William Dunlap
wdunlap at tibco.com
Thu Jan 8 23:19:47 CET 2015
> why is there no setcontains()?
Several packages define is.subset(), which I am assuming is what you are
proposing, but with its arguments reversed. E.g., package:algstat has
is.subset <- function(x, y) all(x %in% y)
containsQ <- function(y, x) all(x %in% y)
and package:rje has essentially the same is.subset.
package:arulesSequences and package:arules have an S4 generic called
is.subset, which is entirely different (it is not a predicate, but returns
a matrix).
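
For illustration only, a base-R setcontains() along the lines of the
predicates above might look like the sketch below. The name comes from the
question; the argument names and the as.vector() calls (copied from
setequal) are my assumptions, not an existing or proposed API.

```r
# Hypothetical helper: not part of base R or of any package named above.
# Returns TRUE when every element of `subset` also occurs in `set`.
setcontains <- function(set, subset) {
    set <- as.vector(set)
    subset <- as.vector(subset)
    all(subset %in% set)
}

setcontains(1:5, c(2L, 4L))    # TRUE
setcontains(c(2L, 4L), 1:5)    # FALSE
setcontains(1:5, integer(0))   # TRUE: all(logical(0)) is TRUE
```

The container comes first here, as in containsQ, which is the reverse of
the is.subset argument order.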
Bill Dunlap
On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard <pdalgd at gmail.com> wrote:
> If you look at the definition of %in%, you'll find that it is implemented
> using match, so if we did as you suggest, I give it about three days before
> someone suggests to inline the function call... Readability of source code
> is not usually our prime concern.
>
> The && idea does have some merit, though.
>
> Apropos, why is there no setcontains()?
>
> -pd
>
> > On 06 Jan 2015, at 22:02 , Hervé Pagès <hpages at fredhutch.org> wrote:
> >
> > Hi,
> >
> > Current implementation:
> >
> > setequal <- function (x, y)
> > {
> >     x <- as.vector(x)
> >     y <- as.vector(y)
> >     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
> > }
> >
> > First what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
> > with 'x %in% y' and 'y %in% x', respectively. They're strictly
> > equivalent but the latter form is a lot more readable than the former
> > (isn't this the "raison d'être" of %in%?):
> >
> > setequal <- function (x, y)
> > {
> >     x <- as.vector(x)
> >     y <- as.vector(y)
> >     all(c(x %in% y, y %in% x))
> > }
> >
> > Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
> > 'all(x %in% y) && all(y %in% x)' improves readability even more and,
> > more importantly, reduces memory footprint significantly on big vectors
> > (e.g. by 15% on integer vectors with 15M elements):
> >
> > setequal <- function (x, y)
> > {
> >     x <- as.vector(x)
> >     y <- as.vector(y)
> >     all(x %in% y) && all(y %in% x)
> > }
> >
> > It also seems to speed up things a little bit (not in a significant
> > way though).
> >
> > Cheers,
> > H.
> >
> >
>
>
>
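
A small check of the two claims in the quoted proposal, as a sketch: that
'x %in% y' gives the same result as 'match(x, y, 0L) > 0L', and that the
&& form agrees with the current definition. The function names
setequal_match and setequal_and and the test vectors are made up for this
example. It does not attempt to reproduce the memory or timing figures.

```r
# Hypothetical names; both bodies are taken from the quoted message.
setequal_match <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
}

setequal_and <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    # && stops at the first FALSE, so 'y %in% x' is skipped in that case
    # and the two logical vectors are never concatenated.
    all(x %in% y) && all(y %in% x)
}

x <- c(3L, 1L, 2L, 2L)   # duplicates do not matter for set equality
y <- 1:3
z <- 1:4

identical(match(x, z, 0L) > 0L, x %in% z)   # TRUE
setequal_match(x, y)                        # TRUE
setequal_and(x, y)                          # TRUE
setequal_match(x, z)                        # FALSE
setequal_and(x, z)                          # FALSE
```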
