[Rd] setequal: better readability, reduced memory footprint, and minor speedup
peter dalgaard
pdalgd at gmail.com
Thu Jan 8 22:30:47 CET 2015
If you look at the definition of %in%, you'll find that it is implemented using match(), so if we did as you suggest, I give it about three days before someone suggests inlining the function call... Readability of source code is not usually our prime concern.
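For reference, a small sketch of that equivalence. The name in_via_match is made up for illustration, and its body is the definition given in ?match as far as I recall it, so check it against your own R version:

```r
# Definition of %in% as given in ?match (reproduced from memory;
# 'in_via_match' is an illustrative name, not a base R function)
in_via_match <- function(x, table) match(x, table, nomatch = 0L) > 0L

x <- c(1L, 5L, 9L, NA)
y <- c(9L, 1L, 2L)

# Both forms should yield the same logical vector, including for NA
same <- identical(x %in% y, in_via_match(x, y))
```

So the rewrite changes how the source reads, at the cost of one extra function call per test.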
The && idea does have some merit, though.
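A rough illustration of why: with &&, the second membership test is skipped as soon as the first all() is FALSE, and the two logical vectors are never concatenated into one. The counter and the counted_in wrapper below are only there to make the short-circuit visible; they are not part of any proposal:

```r
calls <- 0L

# Illustrative wrapper around %in% that counts how often it is evaluated
counted_in <- function(x, table) {
    calls <<- calls + 1L
    x %in% table
}

x <- 1:5
y <- 1:3

# 4 and 5 are not in y, so the left-hand all() is FALSE and the
# right-hand side of && is never evaluated
res <- all(counted_in(x, y)) && all(counted_in(y, x))
```

Here res is FALSE and calls ends up at 1, whereas the all(c(...)) form always evaluates both tests.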
Apropos, why is there no setcontains()?
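Something along these lines is what I have in mind. This is purely a sketch; no such function exists in base R, and both the name and the argument order (does x contain y?) are assumptions:

```r
# Hypothetical helper, not part of base R: TRUE if every element of y
# also occurs in x, i.e. x "contains" y as a set
setcontains <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    all(y %in% x)
}

a <- c(1, 2, 3, 4, 5)
b <- c(4, 2, 2)

r1 <- setcontains(a, b)   # every element of b occurs in a
r2 <- setcontains(b, a)   # a has elements that b lacks

# Set equality is then containment in both directions
r3 <- setcontains(a, b) && setcontains(b, a)
```

With that in place, setequal(x, y) would amount to setcontains(x, y) && setcontains(y, x).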
-pd
> On 06 Jan 2015, at 22:02 , Hervé Pagès <hpages at fredhutch.org> wrote:
>
> Hi,
>
> Current implementation:
>
> setequal <- function (x, y)
> {
>     x <- as.vector(x)
>     y <- as.vector(y)
>     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
> }
>
> First, what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
> with 'x %in% y' and 'y %in% x', respectively? They're strictly
> equivalent, but the latter form is a lot more readable than the former
> (isn't this the "raison d'être" of %in%?):
>
> setequal <- function (x, y)
> {
>     x <- as.vector(x)
>     y <- as.vector(y)
>     all(c(x %in% y, y %in% x))
> }
>
> Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
> 'all(x %in% y) && all(y %in% x)' improves readability even more and,
> more importantly, reduces memory footprint significantly on big vectors
> (e.g. by 15% on integer vectors with 15M elements):
>
> setequal <- function (x, y)
> {
>     x <- as.vector(x)
>     y <- as.vector(y)
>     all(x %in% y) && all(y %in% x)
> }
>
> It also seems to speed up things a little bit (not in a significant
> way though).
>
> Cheers,
> H.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com