[Rd] table() and as.character() performance for logical values

Martin Maechler maechler at stat.math.ethz.ch
Mon Apr 14 11:41:37 CEST 2025


>>>>> Suharto Anggono
>>>>>     on Sat, 12 Apr 2025 08:27:26 +0000 (UTC) writes:

    > For the NA case (x == NA_LOGICAL), if R_print.na_width > NB-1, the "fast path" for 'EncodeLogical' that I proposed previously behaves differently from the general case, which truncates at (NB-1).

Yes; OTOH, NB = 1000, and as you mention below (with a nice
example), other parts of the current R sources assume that a
logical never needs more than width 5.

I think we really should check for  R_print.na_width   anyway
and signal an error, typically from the C code called by R's
    print.default(..., na.print = "<na_string>"),
when it is "too large" .. which we'd need to define.
Personally I cannot imagine a reasonable example that would use
an NA print string longer than, say, 15 (= 2^4 - 1, otherwise
still somewhat arbitrary).
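
As a rough illustration of such a check, here is a minimal stand-alone C
sketch. It is not taken from the R sources: the cap MAX_NA_WIDTH (set to the
15 floated above) and the function name na_print_ok are hypothetical, and
strlen() stands in for whatever width measure R would actually use. Real code
would presumably signal an R error rather than return a flag.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cap; 15 is only the candidate value mentioned in the thread. */
#define MAX_NA_WIDTH 15

/* Returns 1 if the NA print string is acceptable, 0 if it is "too large".
 * Assumption: width is measured in bytes via strlen() for simplicity. */
static int na_print_ok(const char *na_string)
{
    return na_string != NULL && strlen(na_string) <= MAX_NA_WIDTH;
}
```

A caller would run this once when na.print is set, so that later formatting
code can rely on the bound.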

    > To be consistent with the general case,
    > if(w == R_print.na_width)
    > can be replaced with
    > if(w == R_print.na_width && w <= NB-1)
    > or
    > if(min(w, (NB-1)) == R_print.na_width)

    > Or, just remove the "fast path" for NA case. For example, replace

    >    if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}

    > with

    >    if(x == NA_LOGICAL) ;
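
The first variant above (the extra `w <= NB-1` guard) can be sketched as
follows. This is a simplified stand-alone model, not the R sources: NB is
shrunk to 8 so that truncation is easy to see, the NA string is passed as a
parameter with strlen() standing in for R_print.na_width, and the names
encode_na_general and encode_na_fast are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

#define NB 8  /* tiny buffer for illustration only; the thread says NB = 1000 */

/* General case: right-justify in width w, never keeping more than NB-1 chars. */
static const char *encode_na_general(const char *na_string, int w)
{
    static char buff[NB];
    int width = (w < NB - 1) ? w : NB - 1;
    snprintf(buff, NB, "%*s", width, na_string);  /* truncates at NB-1 */
    return buff;
}

/* Fast path with the guard: hand back the string itself only when it fits,
 * so the result always matches the general case. */
static const char *encode_na_fast(const char *na_string, int w)
{
    int na_width = (int) strlen(na_string);  /* assumed stand-in for na_width */
    if (w == na_width && w <= NB - 1)
        return na_string;                    /* no copy, no formatting */
    return encode_na_general(na_string, w);
}
```

Without the `w <= NB - 1` part of the condition, a 10-character NA string with
w = 10 would be returned in full by the fast path, while the general case
would cut it to NB-1 = 7 characters.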


    > By the way, the comment in 'formatLogical' implies that 5 "is the widest it can be, so stop". It is not true if R_print.na_width > 5 .

    > The output of
    > print(c(FALSE, NA), na.print = "******")
    > is not as it should be.

Indeed (and this has been the case "always" in R); I think this
itself is an (almost unrelated) inconsistency to be
fixed by preventing too long NA print strings.
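
To make the width issue concrete, here is a small stand-alone C model of a
field-width scan with the "5 is the widest, so stop" early exit. It is only a
sketch of the behaviour discussed in this thread, not a copy of
'formatLogical': the function name logical_width, the NA_LGL stand-in, and
the stop_at_5 switch are all hypothetical.

```c
#include <limits.h>

#define NA_LGL INT_MIN   /* stand-in for NA_LOGICAL in this sketch */

/* Compute the common field width for n logical values.
 * na_width   : width of the NA print string (assumed given)
 * stop_at_5  : if nonzero, stop scanning once a FALSE forces width 5 */
static int logical_width(const int *x, int n, int na_width, int stop_at_5)
{
    int w = 1;
    for (int i = 0; i < n; i++) {
        if (x[i] == NA_LGL) {
            if (w < na_width) w = na_width;
        } else if (x[i] != 0) {
            if (w < 4) w = 4;          /* "TRUE" */
        } else if (w < 5) {
            w = 5;                     /* "FALSE" */
            if (stop_at_5) break;      /* assumes nothing can be wider than 5 */
        }
    }
    return w;
}
```

For the values (FALSE, NA) and an NA string of width 6, the early exit yields
width 5 because the NA is never inspected; scanning to the end yields 6.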

Martin

    > ----------------
    > On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler <maechler using stat.math.ethz.ch> wrote: 


>>>>> Suharto Anggono Suharto Anggono via R-devel 
    >>>>>>     on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:

    >     > Chain of calls of C functions in coerce.c for as.character(<logical>) in R:

    >     > do_asatomic
    >     > ascommon
    >     > coerceVector
    >     > coerceToString
    >     > StringFromLogical (for each element)

    >     > The definition of 'StringFromLogical' in coerce.c :

[.....]


