[Rd] setequal: better readability, reduced memory footprint, and minor speedup
Hervé Pagès
hpages at fredhutch.org
Fri Jan 9 07:21:12 CET 2015
On 01/08/2015 01:30 PM, peter dalgaard wrote:
> If you look at the definition of %in%, you'll find that it is implemented using match, so if we did as you suggest, I give it about three days before someone suggests to inline the function call...
But you wouldn't bet money on that, right? Because you know you would
lose.
> Readability of source code is not usually our prime concern.
Don't sacrifice readability if you do not have a good reason for it.
What's your reason here? Are you seriously suggesting that inlining
makes a significant difference? As Michael pointed out, the expensive
operation here is the hashing. But sadly some people like inlining and
want to use it everywhere: it's easy and they feel good about it, even
if it hurts readability and maintainability. If you use x %in% y
instead of the inlined version, then the day someone changes the
implementation of x %in% y for something faster, or fixes a bug
in it, your code will automatically benefit. Right now it won't.
More simply put: good readability generally leads to better code.
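To make the equivalence concrete, here is a minimal sketch comparing the inlined form with the %in% form on small example vectors (the vectors and variable names are made up for illustration). It relies on %in% in base R being a thin wrapper around match(x, table, nomatch = 0L) > 0L, so both forms do the same hashing work.

```r
# Illustrative vectors (not from the original thread).
x <- c(3L, 1L, 4L, 1L, 5L)
y <- c(1L, 5L, 9L)

# Inlined form, as used in the current setequal().
inlined <- match(x, y, nomatch = 0L) > 0L

# Readable form: same logical vector, one extra function call.
readable <- x %in% y

# Both forms give exactly the same result.
stopifnot(identical(inlined, readable))
```

The only cost of the readable form is one extra function call, which is small next to the hashing done inside match().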
>
> The && idea does have some merit, though.
>
> Apropos, why is there no setcontains()?
Wait... shouldn't everybody use all(match(x, y, nomatch = 0L) > 0L) ?
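For comparison, a readable wrapper around that one-liner could look like the sketch below. The name setcontains is taken from the question above, but no such function exists in base R; the argument order (TRUE when every element of x occurs in y) is an assumption made here to match the expression just quoted.

```r
# Hypothetical helper: not part of base R.
# Assumed semantics: TRUE if every element of x is also in y.
setcontains <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y)
}

small <- c(2L, 3L)
big   <- c(1L, 2L, 3L, 4L)

in_big   <- setcontains(small, big)   # every element of small is in big
in_small <- setcontains(big, small)   # 1L and 4L are missing from small
```

Note that an empty x yields TRUE under this definition, because all(logical(0)) is TRUE.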
H.
>
> -pd
>
>> On 06 Jan 2015, at 22:02 , Hervé Pagès <hpages at fredhutch.org> wrote:
>>
>> Hi,
>>
>> Current implementation:
>>
>> setequal <- function (x, y)
>> {
>>     x <- as.vector(x)
>>     y <- as.vector(y)
>>     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
>> }
>>
>> First, what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
>> with 'x %in% y' and 'y %in% x', respectively? They're strictly
>> equivalent, but the latter form is a lot more readable than the former
>> (isn't this the "raison d'être" of %in%?):
>>
>> setequal <- function (x, y)
>> {
>>     x <- as.vector(x)
>>     y <- as.vector(y)
>>     all(c(x %in% y, y %in% x))
>> }
>>
>> Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
>> 'all(x %in% y) && all(y %in% x)' improves readability even more and,
>> more importantly, reduces memory footprint significantly on big vectors
>> (e.g. by 15% on integer vectors with 15M elements):
>>
>> setequal <- function (x, y)
>> {
>>     x <- as.vector(x)
>>     y <- as.vector(y)
>>     all(x %in% y) && all(y %in% x)
>> }
>>
>> It also seems to speed up things a little bit (not in a significant
>> way though).
>>
>> Cheers,
>> H.
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fredhutch.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
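The two %in%-based variants quoted above can be checked against each other with a small sketch. The function names setequal_concat and setequal_andand and the test vectors are invented for illustration. The c() version builds a logical vector of length(x) + length(y) before calling all(), while the && version reduces each side separately and skips the second %in% entirely when the first all() is FALSE.

```r
# Variant that concatenates both membership vectors before reducing.
setequal_concat <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(x %in% y, y %in% x))
}

# Variant that reduces each side separately; && short-circuits,
# so 'y %in% x' is never evaluated when all(x %in% y) is FALSE.
setequal_andand <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
}

a <- c(1L, 2L, 3L, 2L)  # same set as b, with a duplicate
b <- c(3L, 2L, 1L)
d <- c(1L, 2L)          # strict subset of a

# Both variants agree with each other and with base::setequal().
stopifnot(
  identical(setequal_concat(a, b), setequal_andand(a, b)),
  identical(setequal_concat(a, d), setequal_andand(a, d)),
  identical(setequal_andand(a, b), setequal(a, b)),
  identical(setequal_andand(d, a), setequal(d, a))
)
```

Because %in% never returns NA, all(x %in% y) is always a single TRUE or FALSE, so using && here is safe.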