[Rd] Suggestion: 20% speed up of which() with two-character mod
Martin Maechler
maechler at stat.math.ethz.ch
Tue Aug 5 14:54:12 CEST 2008
>>>>> "HenrikB" == Henrik Bengtsson <hb at stat.berkeley.edu>
>>>>> on Mon, 4 Aug 2008 21:14:12 -0700 writes:
HenrikB> Hi,
HenrikB> I just want to do a follow up this very simple
HenrikB> fix/correction/speedup/cleanup of the base::which() function. Here is
HenrikB> a diff:
HenrikB> diff src/library/base/R/which.R which.R
HenrikB> 21c21
HenrikB> < wh <- seq_along(x)[ll <- x & !is.na(x)]
HenrikB> ---
>> wh <- seq_along(x)[x & !is.na(x)]
HenrikB> 25c25
HenrikB> < names(wh) <- names(x)[ll]
HenrikB> ---
>> names(wh) <- names(x)[wh]
HenrikB> FYI, the 'll' variable is not used elsewhere. I've been going through
HenrikB> this modifications several times and I cannot see any side effects.
HenrikB> Could someone of R core please commit this?
I had added your proposition to my version of R-devel in order
to commit it, and had wanted to do my own performance tests
under different scenarios, but I had forgotten / postponed it.
{I have more such things , notably the "help.request() from Kate
Mullen -- with quite a few of my own changes, not quite
finished ... that will have to wait for after useR!2008 ..}
In fact, it seems is pretty obvious that the version with [wh]
instead of [ll] should be faster in most cases, and never
slower,
and so I do commit it now.
Thank you Henrik, for the reminder.
Martin
HenrikB> BTW, when one report diff:s, do you prefer to get it with or without
HenrikB> context information, e.g. -C 3?
{My exact preference would depend on the size / style of the
patch itself. It does not really matter, and as a general rule,
I'd personally prefer '-u' (unified diffs which include context)}
HenrikB> /Henrik
HenrikB> On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
>> On Thu, 10 Jul 2008, Henrik Bengtsson wrote:
>>
>>> Hi,
>>>
>>> by replacing 'll' with 'wh' in the source code for base::which() one
>>> gets ~20% speed up for *named logical vectors*.
>>
>>
>> The amount of speedup depends on how sparse the TRUE values are.
>>
>> When the proportion of TRUEs gets small the speedup is more than twofold on
>> my macbook. For high proportions of TRUE, the speedup is more like the 20%
>> you cite.
>>
>> HTH,
>>
>> Chuck
>>
>>>
>>> CURRENT CODE:
>>>
>>> which <- function(x, arr.ind = FALSE)
>>> {
>>> if(!is.logical(x))
>>> stop("argument to 'which' is not logical")
>>> wh <- seq_along(x)[ll <- x & !is.na(x)]
>>> m <- length(wh)
>>> dl <- dim(x)
>>> if (is.null(dl) || !arr.ind) {
>>> names(wh) <- names(x)[ll]
>>> }
>>> ...
>>> wh;
>>> }
>>>
>>> SUGGESTED CODE: (Remove 'll' and use 'wh')
>>>
>>> which2 <- function(x, arr.ind = FALSE)
>>> {
>>> if(!is.logical(x))
>>> stop("argument to 'which' is not logical")
>>> wh <- seq_along(x)[x & !is.na(x)]
>>> m <- length(wh)
>>> dl <- dim(x)
>>> if (is.null(dl) || !arr.ind) {
>>> names(wh) <- names(x)[wh]
>>> }
>>> ...
>>> wh;
>>> }
>>>
>>> That's all.
>>>
>>> BENCHMARKING:
>>>
>>> # To measure both in same environment
>>> which1 <- base::which;
>>> environment(which1) <- globalenv(); # Needed?
>>>
>>> N <- 1e6;
>>> set.seed(0xbeef);
>>> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
>>> names(x) <- seq_along(x);
>>> B <- 10;
>>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>>> stopifnot(identical(idxs1, idxs2));
>>> print(t1/t2);
>>> # Fair benchmarking
>>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>>> print(t1/t2);
>>> ## user system elapsed
>>> ## 1.283186 1.052632 1.250000
>>>
>>> You get similar results if you put for loop outside the system.time()
>>> call (and sum up the timings).
>>>
>>> Cheers
>>>
>>> Henrik
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> Charles C. Berry (858) 534-2098
>> Dept of Family/Preventive
>> Medicine
>> E mailto:cberry at tajo.ucsd.edu UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
>>
>>
>>
HenrikB> ______________________________________________
HenrikB> R-devel at r-project.org mailing list
HenrikB> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list