[Rd] setequal: better readability, reduced memory footprint, and minor speedup

Hervé Pagès hpages at fredhutch.org
Tue Jan 6 22:02:31 CET 2015


Hi,

Current implementation:

   setequal <- function (x, y)
   {
     x <- as.vector(x)
     y <- as.vector(y)
     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
   }

First, what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
with 'x %in% y' and 'y %in% x', respectively? They're strictly
equivalent, but the latter form is a lot more readable than the former
(isn't this the "raison d'être" of %in%?):

   setequal <- function (x, y)
   {
     x <- as.vector(x)
     y <- as.vector(y)
     all(c(x %in% y, y %in% x))
   }
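
A quick sanity check of that equivalence, as a minimal sketch: in base R,
%in% is defined as 'match(x, table, nomatch = 0L) > 0L', so both forms
should return identical logical vectors, including for NA and for
zero-length input. The test vectors below are made up for illustration:

```r
# Illustrative inputs (not from any real data set)
x <- c(3L, 1L, NA, 7L, 1L)
y <- c(1L, 2L, 3L, NA)

via_match <- match(x, y, 0L) > 0L   # form used by the current setequal()
via_in    <- x %in% y               # proposed, more readable form

# Both forms should agree element by element
same_result <- identical(via_match, via_in)

# Zero-length input: both forms yield logical(0)
same_empty <- identical(match(integer(0), y, 0L) > 0L, integer(0) %in% y)

print(same_result)
print(same_empty)
```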

Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
'all(x %in% y) && all(y %in% x)' improves readability even more and,
more importantly, reduces memory footprint significantly on big vectors
(e.g. by 15% on integer vectors with 15M elements):

   setequal <- function (x, y)
   {
     x <- as.vector(x)
     y <- as.vector(y)
     all(x %in% y) && all(y %in% x)
   }
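
To illustrate why this form can use less memory: 'c(x %in% y, y %in% x)'
has to allocate one more logical vector of length(x) + length(y), whereas
the '&&' form never builds it and skips 'y %in% x' entirely when
'all(x %in% y)' is already FALSE. The sketch below only checks that the two
versions return the same answers on a few made-up cases; the names
setequal_old and setequal_new are hypothetical, used here to keep both
definitions side by side:

```r
# Current implementation (as quoted at the top of this message)
setequal_old <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
}

# Proposed implementation: no concatenation, short-circuits on first FALSE
setequal_new <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
}

# Illustrative test pairs, including duplicates, NA and empty vectors
cases <- list(
  list(1:5, 5:1),
  list(c(1, 2, 2), c(2, 1)),
  list(1:3, 1:4),
  list(c("a", NA), c(NA, "a")),
  list(integer(0), integer(0)),
  list(integer(0), 1L)
)

old <- vapply(cases, function(p) setequal_old(p[[1]], p[[2]]), logical(1))
new <- vapply(cases, function(p) setequal_new(p[[1]], p[[2]]), logical(1))

print(identical(old, new))
```

Actual memory and timing differences depend on the input sizes and the
machine, so they are best measured on your own data.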

It also seems to speed things up a little (though not significantly).

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
