[Rd] Suggestion: 20% speed up of which() with two-character mod
Henrik Bengtsson
hb at stat.berkeley.edu
Tue Aug 5 06:14:12 CEST 2008
Hi,
I just want to do a follow up this very simple
fix/correction/speedup/cleanup of the base::which() function. Here is
a diff:
diff src/library/base/R/which.R which.R
21c21
< wh <- seq_along(x)[ll <- x & !is.na(x)]
---
> wh <- seq_along(x)[x & !is.na(x)]
25c25
< names(wh) <- names(x)[ll]
---
> names(wh) <- names(x)[wh]
FYI, the 'll' variable is not used elsewhere. I've been going through
this modifications several times and I cannot see any side effects.
Could someone of R core please commit this?
BTW, when one report diff:s, do you prefer to get it with or without
context information, e.g. -C 3?
/Henrik
On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
> On Thu, 10 Jul 2008, Henrik Bengtsson wrote:
>
>> Hi,
>>
>> by replacing 'll' with 'wh' in the source code for base::which() one
>> gets ~20% speed up for *named logical vectors*.
>
>
> The amount of speedup depends on how sparse the TRUE values are.
>
> When the proportion of TRUEs gets small the speedup is more than twofold on
> my macbook. For high proportions of TRUE, the speedup is more like the 20%
> you cite.
>
> HTH,
>
> Chuck
>
>>
>> CURRENT CODE:
>>
>> which <- function(x, arr.ind = FALSE)
>> {
>> if(!is.logical(x))
>> stop("argument to 'which' is not logical")
>> wh <- seq_along(x)[ll <- x & !is.na(x)]
>> m <- length(wh)
>> dl <- dim(x)
>> if (is.null(dl) || !arr.ind) {
>> names(wh) <- names(x)[ll]
>> }
>> ...
>> wh;
>> }
>>
>> SUGGESTED CODE: (Remove 'll' and use 'wh')
>>
>> which2 <- function(x, arr.ind = FALSE)
>> {
>> if(!is.logical(x))
>> stop("argument to 'which' is not logical")
>> wh <- seq_along(x)[x & !is.na(x)]
>> m <- length(wh)
>> dl <- dim(x)
>> if (is.null(dl) || !arr.ind) {
>> names(wh) <- names(x)[wh]
>> }
>> ...
>> wh;
>> }
>>
>> That's all.
>>
>> BENCHMARKING:
>>
>> # To measure both in same environment
>> which1 <- base::which;
>> environment(which1) <- globalenv(); # Needed?
>>
>> N <- 1e6;
>> set.seed(0xbeef);
>> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
>> names(x) <- seq_along(x);
>> B <- 10;
>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>> stopifnot(identical(idxs1, idxs2));
>> print(t1/t2);
>> # Fair benchmarking
>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
>> print(t1/t2);
>> ## user system elapsed
>> ## 1.283186 1.052632 1.250000
>>
>> You get similar results if you put for loop outside the system.time()
>> call (and sum up the timings).
>>
>> Cheers
>>
>> Henrik
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive
> Medicine
> E mailto:cberry at tajo.ucsd.edu UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
>
>
>
More information about the R-devel
mailing list