[Rd] nchar(x, type = "bytes") seems slower than it could be
Tomas Kalibera
tom@@@k@||ber@ @end|ng |rom gm@||@com
Tue Mar 30 10:20:57 CEST 2021
Thanks for the report. You are probably running into the overhead of the
eager creation of the error message. On my system, with your
micro-benchmark, that overhead is about 10x. I tested this simply by
commenting it out and re-running the benchmark. I'll fix it (this is not
a good task for a contributed patch).
Best,
Tomas
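
To illustrate what "eager creation of the error message" means here, the following is a minimal standalone sketch in plain C. It is not the R source: the names elt_nbytes, total_bytes_eager, total_bytes_lazy and MSG_LEN, and the message text "element N", are all hypothetical and chosen only for this example. The eager variant formats a message for every element before it knows whether one is needed; the lazy variant formats it only on the error path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MSG_LEN 64  /* hypothetical buffer size for the error message */

/* Hypothetical per-element routine: byte count of s, or -1 for a missing
   value, which the caller treats as the (rare) error case. */
int elt_nbytes(const char *s)
{
    return s ? (int) strlen(s) : -1;
}

/* Eager: the message is formatted on every iteration, although it is
   only copied out when an element turns out to be bad. */
long total_bytes_eager(const char **x, size_t n, char *last_err)
{
    long total = 0;
    char msg[MSG_LEN];
    assert(last_err != NULL);
    last_err[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        snprintf(msg, sizeof msg, "element %zu", i + 1);  /* always paid */
        int nb = elt_nbytes(x[i]);
        if (nb < 0) {
            memcpy(last_err, msg, sizeof msg);
            continue;
        }
        total += nb;
    }
    return total;
}

/* Lazy: the same result, but the formatting cost is paid only when an
   element is actually bad. */
long total_bytes_lazy(const char **x, size_t n, char *last_err)
{
    long total = 0;
    assert(last_err != NULL);
    last_err[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int nb = elt_nbytes(x[i]);
        if (nb < 0) {
            snprintf(last_err, MSG_LEN, "element %zu", i + 1);  /* rare */
            continue;
        }
        total += nb;
    }
    return total;
}
```

Both functions return the same total and record the same message for the last bad element; they differ only in how much formatting work is done for elements that never trigger an error. How much that matters in practice depends on the platform and the data, so the sketch makes no claim about timings.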
On 3/30/21 8:02 AM, Hugh Parsonage wrote:
> While profiling some C code, I rolled my own nchar function which
> appears to be much faster than base R's (25 times faster for a 10M
> length vector). Obviously base::nchar provides significantly more
> features than my barebones function (C snippet below); however, for
> argument type = "bytes" it seems that the R_nchar and do_nchar
> functions do not actually do anything more than this function.
> My suspicion is that I have overlooked some subtlety in the base R
> code, or that my benchmarks are not representative. Alternatively,
> the action in `do_nchar` of preparing the potential error message
> before being passed to `R_nchar` may be quite costly indeed. Or the
> function cannot be unswitched from the more complex width and chars
> arguments by the compiler.
>
> If I haven't missed something, would a patch be warranted?
>
> SEXP Cnchar(SEXP x) {
>   R_xlen_t N = xlength(x);
>   SEXP ans = PROTECT(allocVector(INTSXP, N));
>   int * restrict ansp = INTEGER(ans);
>
>   // Ignoring NA to avoid the branch has a very small
>   // impact on performance.
>   for (R_xlen_t i = 0; i < N; ++i) {
>     SEXP sxi = STRING_ELT(x, i);
>     if (sxi == NA_STRING) {
>       ansp[i] = NA_INTEGER;
>       continue;
>     }
>     ansp[i] = length(sxi);
>   }
>   UNPROTECT(1);
>   return ans;
> }
>
> x <- rep_len(c(as.character(c(5L, 1:1e6)), NA_character_, 1e6:15e5), 1e7)
> Cnchar(x)
> 90ms
> nchar(x, type = "bytes")
> 2500 ms
>