[Rd] table() and as.character() performance for logical values

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Apr 10 17:53:58 CEST 2025


>>>>> Suharto Anggono via R-devel 
>>>>>     on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:

    > Chain of calls of C functions in coerce.c for as.character(<logical>) in R:
    > 
    > do_asatomic
    > ascommon
    > coerceVector
    > coerceToString
    > StringFromLogical (for each element)
    > 
    > The definition of 'StringFromLogical' in coerce.c :
    > 
    > attribute_hidden SEXP StringFromLogical(int x, int *warn)
    > {
    >     int w;
    >     formatLogical(&x, 1, &w);
    >     if (x == NA_LOGICAL) return NA_STRING;
    >     else return mkChar(EncodeLogical(x, w));
    > }
    > 
    > The definition of 'EncodeLogical' in printutils.c :
    > 
    > const char *EncodeLogical(int x, int w)
    > {
    >     static char buff[NB];
    >     if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
    >     else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
    >     else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
    >     buff[NB-1] = '\0';
    >     return buff;
    > }
    > 
    > > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
    > > system.time(as.character(L))
    >    user  system elapsed
    >    2.69    0.02    2.73
    > > system.time(c("FALSE", "TRUE")[L+1])
    >    user  system elapsed
    >    0.15    0.04    0.20
    > > system.time(c("FALSE", "TRUE")[L+1L])
    >    user  system elapsed
    >    0.08    0.05    0.13
    > > L <- rep(NA, 10^7)
    > > system.time(as.character(L))
    >    user  system elapsed
    >    0.11    0.00    0.11
    > > system.time(c("FALSE", "TRUE")[L+1])
    >    user  system elapsed
    >    0.16    0.06    0.22
    > > system.time(c("FALSE", "TRUE")[L+1L])
    >    user  system elapsed
    >    0.09    0.03    0.12
    > 
    > `as.character` of a logical vector that is all NA is fast enough. 
    > It appears that the call to 'formatLogical' inside the C function
    > 'StringFromLogical' does not introduce much slowdown.


    > I found that using a string literal inside the C function 'StringFromLogical', by replacing
    > EncodeLogical(x, w)
    > with
    > x ? "TRUE" : "FALSE"
    > (so that the call to 'formatLogical' is not needed anymore), makes it faster.

indeed! ... and we also notice that the 'warn' argument is not
needed anymore either, and that makes sense: at this point, when you
know you have an R logical value, there are only three
possibilities and no reason ever to warn about the conversion.
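
To illustrate why the formatting step is redundant for a single non-NA
value, here is a minimal standalone C sketch (not R's actual source). It
assumes that the field width computed for one element equals the natural
length of its label, 4 for "TRUE" and 5 for "FALSE"; the names
encode_padded, encode_literal, natural_width and the buffer size NB_SKETCH
are hypothetical stand-ins, and NA handling is left out.

```c
#include <stdio.h>
#include <string.h>

#define NB_SKETCH 1000  /* arbitrary buffer size, chosen for this sketch only */

/* Formatting path: right-justify the label in a field of width w. */
static const char *encode_padded(int x, int w)
{
    static char buff[NB_SKETCH];
    int width = w < NB_SKETCH - 1 ? w : NB_SKETCH - 1;
    snprintf(buff, NB_SKETCH, "%*s", width, x ? "TRUE" : "FALSE");
    return buff;
}

/* Proposed shortcut: return the string literal directly. */
static const char *encode_literal(int x)
{
    return x ? "TRUE" : "FALSE";
}

/* Assumed width for a single non-NA element: the label's own length. */
static int natural_width(int x)
{
    return x ? 4 : 5;
}
```

With the width equal to the label length, "%*s" adds no padding, so both
functions yield the same characters and the snprintf call only costs time.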

    > Alternatively, 
or in addition !

    > a "fast path" could be introduced in 'EncodeLogical', potentially also benefiting format() in R.
    > For example, without replacing existing code, the following fragment could be inserted.
    > 
    >     if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
    >     else if(x) {if(w == 4) return "TRUE";}
    >     else {if(w == 5) return "FALSE";}
    > 
    > However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster than as.character(L) .
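
The fast-path idea can be sketched in standalone C roughly as follows. This
is only an illustration under stated assumptions, not the code in
printutils.c: encode_logical_fast and NB_SKETCH are hypothetical names, and
the NA branch is omitted because it depends on R's print settings.

```c
#include <stdio.h>

#define NB_SKETCH 1000  /* arbitrary buffer size, chosen for this sketch only */

/* Return the label for a non-NA logical x in a field of width w.
 * When w equals the label's own length, skip formatting entirely. */
static const char *encode_logical_fast(int x, int w)
{
    static char buff[NB_SKETCH];
    const char *label = x ? "TRUE" : "FALSE";
    int natural = x ? 4 : 5;

    if (w == natural)
        return label;               /* fast path: no snprintf needed */

    /* slow path: right-justify the label in a field of width w */
    int width = w < NB_SKETCH - 1 ? w : NB_SKETCH - 1;
    snprintf(buff, NB_SKETCH, "%*s", width, label);
    return buff;
}
```

The slow path is kept unchanged, so callers that ask for a wider field
(as a column-aligned format() would) still get padded output.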
    > 
    > Precomputing or caching possible results of the C function 'StringFromLogical' allows as.character(L) to be as fast as c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be changed to
    > 
    > attribute_hidden SEXP StringFromLogical(int x, int *warn)
    > {
    >     static SEXP TrueCh, FalseCh;
    >     if (x == NA_LOGICAL) return NA_STRING;
    >     else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
    >     else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
    > }

Indeed, and something along this line (storing the other two constant strings) was also 
my thought when seeing the
   mkChar(x ? "TRUE" : "FALSE")
you implicitly proposed above.
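
The effect of caching can be seen in a small standalone C sketch. It does
not use R's headers: intern_sketch is a hypothetical stand-in for mkChar()
that merely counts how often it is called, NULL stands in for NA_STRING,
and NA_LOGICAL is assumed to be INT_MIN. In real R code a cached CHARSXP
would presumably also need protecting from garbage collection, which this
sketch does not model.

```c
#include <limits.h>
#include <stddef.h>

#define NA_LOGICAL_SKETCH INT_MIN   /* assumed encoding of a logical NA */

static int intern_calls = 0;        /* how often the "expensive" lookup ran */

/* Hypothetical stand-in for mkChar(): counts calls, returns s itself. */
static const char *intern_sketch(const char *s)
{
    intern_calls++;
    return s;
}

/* Cached conversion: each label is interned at most once, on first use. */
static const char *string_from_logical_sketch(int x)
{
    static const char *true_ch = NULL, *false_ch = NULL;

    if (x == NA_LOGICAL_SKETCH)
        return NULL;                /* stands in for NA_STRING */
    if (x)
        return true_ch ? true_ch : (true_ch = intern_sketch("TRUE"));
    return false_ch ? false_ch : (false_ch = intern_sketch("FALSE"));
}
```

However many elements are converted, the lookup runs at most twice, which
is the property that lets the cached variant approach the speed of plain
indexing into a two-element character vector.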

I'm looking into applying both speedups;
thank you very much, Suharto!

Martin


--
Martin Maechler
ETH Zurich  and  R Core team


