[Rd] setequal: better readability, reduced memory footprint, and minor speedup

peter dalgaard pdalgd at gmail.com
Thu Jan 8 22:30:47 CET 2015


If you look at the definition of %in%, you'll find that it is implemented using match(). So if we did as you suggest, I give it about three days before someone suggests inlining the function call again... Readability of source code is not usually our prime concern.
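
For reference, a minimal sketch of that point. The definition in the comment is the one given in ?match; in_via_match is a hypothetical local name used only to compare against the real operator:

```r
# Definition of %in% as given in ?match (base R):
#   "%in%" <- function(x, table) match(x, table, nomatch = 0L) > 0L
# in_via_match is a hypothetical name, used here only for illustration.
in_via_match <- function(x, table) match(x, table, nomatch = 0L) > 0L

x <- c(1L, 2L, 5L)
y <- c(2L, 3L, 1L)

# The inlined match() form and the operator give the same logical vector.
identical(in_via_match(x, y), x %in% y)
```

So 'x %in% y' costs one extra function call on top of the match() that setequal already does.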

The && idea does have some merit, though. 
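
A rough sketch of why, assuming the version proposed in the quoted message below. setequal_and is a hypothetical name, chosen so that base::setequal is not masked:

```r
# setequal_and is a hypothetical name; the body is the && variant
# proposed in the quoted message.
setequal_and <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  # Each logical vector is reduced by all() on its own, so the two are
  # never concatenated with c(). If the first all() is FALSE, && does
  # not evaluate the second match() at all.
  all(x %in% y) && all(y %in% x)
}

setequal_and(c(1, 2, 2, 3), c(3, 2, 1))  # duplicates do not matter
setequal_and(1:3, 1:4)                   # 4 is missing from x
```

The first call returns TRUE and the second FALSE, the same answers setequal() gives for these inputs.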

Apropos, why is there no setcontains()?
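
Something along these lines is what I have in mind. This is only a sketch: no such function exists in base R, and both the argument order and the meaning ("every element of y is also in x") are assumptions:

```r
# Hypothetical: setcontains() is not part of base R. The semantics
# assumed here are "x contains every element of y".
setcontains <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(y %in% x)
}

setcontains(1:5, c(2, 4))  # TRUE: 2 and 4 are both in 1:5
setcontains(1:5, c(2, 6))  # FALSE: 6 is not in 1:5
```

With that in place, set equality would just be setcontains(x, y) && setcontains(y, x).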

-pd

> On 06 Jan 2015, at 22:02 , Hervé Pagès <hpages at fredhutch.org> wrote:
> 
> Hi,
> 
> Current implementation:
> 
> setequal <- function (x, y)
> {
>  x <- as.vector(x)
>  y <- as.vector(y)
>  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
> }
> 
> First, what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
> with 'x %in% y' and 'y %in% x', respectively? They're strictly
> equivalent, but the latter form is a lot more readable than the former
> (isn't this the "raison d'être" of %in%?):
> 
> setequal <- function (x, y)
> {
>  x <- as.vector(x)
>  y <- as.vector(y)
>  all(c(x %in% y, y %in% x))
> }
> 
> Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
> 'all(x %in% y) && all(y %in% x)' improves readability even more and,
> more importantly, reduces memory footprint significantly on big vectors
> (e.g. by 15% on integer vectors with 15M elements):
> 
> setequal <- function (x, y)
> {
>  x <- as.vector(x)
>  y <- as.vector(y)
>  all(x %in% y) && all(y %in% x)
> }
> 
> It also seems to speed up things a little bit (not in a significant
> way though).
> 
> Cheers,
> H.
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


