[Rd] setequal: better readability, reduced memory footprint, and minor speedup

Peter Haverty haverty.peter at gene.com
Thu Jan 8 23:06:46 CET 2015


How about unique them both and compare the lengths?  It's less work,
especially allocation.



Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com

On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard <pdalgd at gmail.com> wrote:

> If you look at the definition of %in%, you'll find that it is implemented
> using match, so if we did as you suggest, I give it about three days before
> someone suggests to inline the function call... Readability of source code
> is not usually our prime concern.
>
> The && idea does have some merit, though.
>
> Apropos, why is there no setcontains()?
>
> -pd
>
> > On 06 Jan 2015, at 22:02 , Hervé Pagès <hpages at fredhutch.org> wrote:
> >
> > Hi,
> >
> > Current implementation:
> >
> > setequal <- function (x, y)
> > {
> >  x <- as.vector(x)
> >  y <- as.vector(y)
> >  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
> > }
> >
> > First what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) >
> 0L'
> > with 'x %in% y' and 'y %in% x', respectively. They're strictly
> > equivalent but the latter form is a lot more readable than the former
> > (isn't this the "raison d'être" of %in%?):
> >
> > setequal <- function (x, y)
> > {
> >  x <- as.vector(x)
> >  y <- as.vector(y)
> >  all(c(x %in% y, y %in% x))
> > }
> >
> > Furthermore, replacing 'all(c(x %in% y, y %in x))' with
> > 'all(x %in% y) && all(y %in% x)' improves readability even more and,
> > more importantly, reduces memory footprint significantly on big vectors
> > (e.g. by 15% on integer vectors with 15M elements):
> >
> > setequal <- function (x, y)
> > {
> >  x <- as.vector(x)
> >  y <- as.vector(y)
> >  all(x %in% y) && all(y %in% x)
> > }
> >
> > It also seems to speed up things a little bit (not in a significant
> > way though).
> >
> > Cheers,
> > H.
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fredhutch.org
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list