[Rd] table() and as.character() performance for logical values
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Apr 10 17:53:58 CEST 2025
>>>>> Suharto Anggono via R-devel
>>>>> on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:
> Chain of calls of C functions in coerce.c for as.character(<logical>) in R:
>
> do_asatomic
> ascommon
> coerceVector
> coerceToString
> StringFromLogical (for each element)
>
> The definition of 'StringFromLogical' in coerce.c :
>
> attribute_hidden SEXP StringFromLogical(int x, int *warn)
> {
>     int w;
>     formatLogical(&x, 1, &w);
>     if (x == NA_LOGICAL) return NA_STRING;
>     else return mkChar(EncodeLogical(x, w));
> }
>
> The definition of 'EncodeLogical' in printutils.c :
>
> const char *EncodeLogical(int x, int w)
> {
>     static char buff[NB];
>     if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
>     else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
>     else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
>     buff[NB-1] = '\0';
>     return buff;
> }
>
> > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
> > system.time(as.character(L))
>    user  system elapsed
>    2.69    0.02    2.73
> > system.time(c("FALSE", "TRUE")[L+1])
>    user  system elapsed
>    0.15    0.04    0.20
> > system.time(c("FALSE", "TRUE")[L+1L])
>    user  system elapsed
>    0.08    0.05    0.13
> > L <- rep(NA, 10^7)
> > system.time(as.character(L))
>    user  system elapsed
>    0.11    0.00    0.11
> > system.time(c("FALSE", "TRUE")[L+1])
>    user  system elapsed
>    0.16    0.06    0.22
> > system.time(c("FALSE", "TRUE")[L+1L])
>    user  system elapsed
>    0.09    0.03    0.12
>
> `as.character` of a logical vector that is all NA is fast enough.
> It appears that the call to 'formatLogical' inside the C function
> 'StringFromLogical' does not introduce much slowdown.
> I found that using a string literal inside the C function 'StringFromLogical', by replacing
>     EncodeLogical(x, w)
> with
>     x ? "TRUE" : "FALSE"
> (so that the call to 'formatLogical' is not needed anymore), makes it faster.
indeed! ... and we also notice that the 'w' argument is no longer
needed either, and that makes sense: at this point, when you
know you have an R logical value, there are only three
possibilities and no reason ever to warn about the conversion.
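
A minimal standalone sketch of the string-literal idea, so that it can be compiled without R's headers. The names LIT_NA_LOGICAL and lit_from_logical are hypothetical; INT_MIN is assumed as a stand-in for NA_LOGICAL, and a NULL return stands in for NA_STRING. It is not the code in coerce.c, only an illustration of the three possible outcomes.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: stand-in for R's NA_LOGICAL so the sketch builds without R. */
#define LIT_NA_LOGICAL INT_MIN

/* Hypothetical helper: map a logical value to its text with no buffer
 * and no width computation. NULL stands in for NA_STRING. */
static const char *lit_from_logical(int x)
{
    if (x == LIT_NA_LOGICAL) return NULL;   /* the NA case */
    return x ? "TRUE" : "FALSE";            /* the two remaining cases */
}
```

Because the result is a literal, nothing is formatted per element; in R itself the literal would still be passed to mkChar().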
> Alternatively,
or in addition !
> a "fast path" could be introduced in 'EncodeLogical', potentially also benefiting format() in R.
> For example, without replacing existing code, the following fragment could be inserted.
>
> if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
> else if(x) {if(w == 4) return "TRUE";}
> else {if(w == 5) return "FALSE";}
>
> However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster than as.character(L) .
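
The fast-path idea can be tried outside R with a sketch like the following. Everything prefixed fp_ or FP_ is hypothetical: FP_NB is an arbitrary buffer size rather than R's NB, INT_MIN is assumed for NA_LOGICAL, and the fixed string "NA" of width 2 stands in for R_print.na_string and R_print.na_width, which are configurable in R.

```c
#include <limits.h>
#include <stdio.h>

#define FP_NA_LOGICAL INT_MIN   /* assumption: stand-in for NA_LOGICAL */
#define FP_NB 1000              /* assumption: arbitrary buffer size */

/* Hypothetical encoder: return the literal directly when the requested
 * width equals the natural width, otherwise pad through snprintf. */
static const char *fp_encode_logical(int x, int w)
{
    static char buff[FP_NB];
    const char *s = (x == FP_NA_LOGICAL) ? "NA" : (x ? "TRUE" : "FALSE");
    int natural = (x == FP_NA_LOGICAL) ? 2 : (x ? 4 : 5);

    if (w == natural) return s;             /* fast path: no formatting */

    int width = w < FP_NB - 1 ? w : FP_NB - 1;
    snprintf(buff, FP_NB, "%*s", width, s); /* slow path: right-justify */
    return buff;
}
```

The slow path keeps the padded output for callers such as format() that ask for a wider field, while the common unpadded case skips snprintf entirely.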
>
> Precomputing or caching possible results of the C function 'StringFromLogical' allows as.character(L) to be as fast as c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be changed to
>
> attribute_hidden SEXP StringFromLogical(int x, int *warn)
> {
>     static SEXP TrueCh, FalseCh;
>     if (x == NA_LOGICAL) return NA_STRING;
>     else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
>     else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
> }
Indeed, and something along this line (storing the other two constant strings) was also
my thought when seeing the
    mkChar(x ? "TRUE" : "FALSE")
you implicitly proposed above.
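
The caching pattern can be illustrated in plain C as a sketch. Here make_string() is a hypothetical stand-in for mkChar() that simply copies its argument and counts its calls, INT_MIN is assumed for NA_LOGICAL, and NULL stands in for NA_STRING. The sketch does not model R's garbage collector, so any protection that cached objects might need inside R is left out.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_NA_LOGICAL INT_MIN   /* assumption: stand-in for NA_LOGICAL */

static int make_count = 0;         /* how often the constructor has run */

/* Hypothetical stand-in for mkChar(): copies s and counts the call. */
static const char *make_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL) abort();
    memcpy(copy, s, n);
    make_count++;
    return copy;
}

/* Build each of the two constant strings on first use, then reuse it. */
static const char *string_from_logical_cached(int x)
{
    static const char *true_str = NULL, *false_str = NULL;
    if (x == CACHE_NA_LOGICAL) return NULL;   /* stands in for NA_STRING */
    if (x) return true_str ? true_str : (true_str = make_string("TRUE"));
    return false_str ? false_str : (false_str = make_string("FALSE"));
}
```

However many elements are converted, make_string() runs at most twice, which is the property that would let as.character(L) approach the cost of plain indexing.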
I'm looking into applying both speedups;
thank you very much, Suharto!
Martin
--
Martin Maechler
ETH Zurich and R Core team