[R-pkg-devel] Cannot create C code with acceptable performance with respect to internal R command.
Sokol Serguei
@ergue|@@oko| @end|ng |rom gm@||@com
Thu Dec 5 14:55:51 CET 2024
Luc,
There can be many reasons for a difference in compiled-code performance.
Tuning such code to achieve peak performance is generally a fine art.
Optimization techniques include, but are not limited to:
- SIMD instructions (and memory alignment for their optimal use);
- instruction level parallelism;
- unrolling loops;
- cache (mis-)hits at the various cache levels (see the loop-ordering sketch after this list);
- multi-thread parallelism;
- ...
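To make the cache point concrete, here is a minimal sketch (not code from
lavaanC or from R itself; the name matprod_jki and its plain-pointer
interface are illustrative only) of the same naive product with the loops
reordered to j-k-i, so that the innermost loop walks the left matrix and
the result with stride 1 in R's column-major layout:

#include <string.h>   /* memset */

/* column-major product C = A * B, with A m x k, B k x n, C m x n;
   the j-k-i order makes the innermost loop access A and C contiguously,
   which is friendlier to the cache and to auto-vectorization than the
   j-i-k order of the prod0() quoted below */
static void matprod_jki(const double *A, const double *B, double *C,
                        int m, int k, int n)
{
    memset(C, 0, (size_t) m * (size_t) n * sizeof(double));
    for (int j = 0; j < n; j++) {
        for (int p = 0; p < k; p++) {
            const double b_pj = B[p + (size_t) k * j];  /* B[p, j] */
            const double *a_col = A + (size_t) m * p;   /* column p of A */
            double *c_col = C + (size_t) m * j;         /* column j of C */
            for (int i = 0; i < m; i++)
                c_col[i] += a_col[i] * b_pj;
        }
    }
}

You would still wrap such a kernel in a .Call entry point, as in the
prod0() quoted below. On many machines this reordering alone recovers a
good part of the gap, but it will not by itself match an optimized BLAS.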
The appropriate approach also depends on the kind of application:
CPU-bound, memory-bound or IO-bound.
Many of these techniques can be applied (or not) by the compiler itself,
depending on the chosen options. Are you sure you are using the same
compiler and options that were used to build R?
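For a package built with R CMD INSTALL / R CMD SHLIB this is normally the
case, since the flags are taken from R's own Makeconf, but it is easy to
check, e.g.:

R CMD config CC
R CMD config CFLAGS
R CMD config CPPFLAGS

Users can also change these flags in ~/.R/Makevars (e.g. -O3 or
-march=native) for local experiments, but portability and CRAN policy
limit what a distributed package should rely on.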
And finally, the code being compared may simply not be the same. For %*%,
R can call a BLAS routine, e.g. from OpenBLAS, to multiply the two
matrices. Such a BLAS is heavily optimized for this operation and can
easily be 10x faster than a plain "naive" reference implementation.
The C code you copied may just be the fallback used when no optimized
BLAS was available when R was built.
Look at what sessionInfo() says about the BLAS in use.
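If an optimized BLAS is indeed in use, the most robust way to match %*%
from a package is not to re-implement the loops but to call the same BLAS
through R's headers. A minimal sketch, assuming real (double) matrices as
in your benchmark; the name prod_blas is illustrative and error handling
is kept to the bare minimum:

#define USE_FC_LEN_T           /* pass Fortran character lengths (R >= 3.6.2) */
#include <R.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>
#ifndef FCONE
# define FCONE                 /* no-op on older versions of R */
#endif

/* retval = mat1 %*% mat2, computed by dgemm from the BLAS R is linked against */
SEXP prod_blas(SEXP mat1, SEXP mat2)
{
    SEXP d1 = Rf_getAttrib(mat1, R_DimSymbol);
    SEXP d2 = Rf_getAttrib(mat2, R_DimSymbol);
    int m = INTEGER(d1)[0], k = INTEGER(d1)[1];
    int k2 = INTEGER(d2)[0], n = INTEGER(d2)[1];
    if (k != k2) Rf_error("matrices not conforming");
    SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m, n));
    double one = 1.0, zero = 0.0;
    F77_CALL(dgemm)("N", "N", &m, &n, &k, &one,
                    REAL(mat1), &m, REAL(mat2), &k,
                    &zero, REAL(retval), &m FCONE FCONE);
    UNPROTECT(1);
    return retval;
}

With something like this, the .Call() version and m1 %*% m2 should end up
running essentially the same BLAS kernel.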
Best,
Serguei.
On 05/12/2024 at 14:21, Luc De Wilde wrote:
> Dear package developers,
>
> in creating a package lavaanC for use in lavaan, I need to perform some matrix computations involving matrix products and crossproducts. As far as I can see, I cannot directly call the C code in the R core, so I copied that code into the package. However, the same C/C++ code in a package is 2.5 to 3 times slower than when executed directly in R:
>
> C code in the package:
>
> #include <R.h>
> #include <Rinternals.h>
>
> /* naive column-major matrix product: retval = mat1 %*% mat2 */
> SEXP prod0(SEXP mat1, SEXP mat2) {
>     SEXP u1 = Rf_getAttrib(mat1, R_DimSymbol);
>     int m1 = INTEGER(u1)[0];
>     int n1 = INTEGER(u1)[1];
>     SEXP u2 = Rf_getAttrib(mat2, R_DimSymbol);
>     int m2 = INTEGER(u2)[0];
>     int n2 = INTEGER(u2)[1];
>     if (n1 != m2) Rf_error("matrices not conforming");
>     SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m1, n2));
>     double* left = REAL(mat1);
>     double* right = REAL(mat2);
>     double* ret = REAL(retval);
>     double werk = 0.0;
>     for (int j = 0; j < n2; j++) {
>         for (int i = 0; i < m1; i++) {
>             werk = 0.0;
>             /* dot product of row i of mat1 with column j of mat2 */
>             for (int k = 0; k < n1; k++) werk += left[i + m1 * k] * right[k + m2 * j];
>             ret[j * m1 + i] = werk;
>         }
>     }
>     UNPROTECT(1);
>     return retval;
> }
>
> Test script:
> m1 <- matrix(rnorm(300000), nrow = 60)
> m2 <- matrix(rnorm(300000), ncol = 60)
> print(microbenchmark::microbenchmark(
> m1 %*% m2, .Call("prod0", m1, m2), times = 100
> ))
>
> Result on my pc:
> Unit: milliseconds
>                    expr     min      lq     mean  median       uq     max neval
>               m1 %*% m2 10.5650 10.8967 11.13434 10.9449 11.02965 15.8397   100
>  .Call("prod0", m1, m2) 29.3336 30.7868 32.05114 31.0408 33.85935 45.5321   100
>
>
> Can anyone explain why the compiled code in the package is so much slower than in the R core?
>
> and
>
> Is there a way to improve the performance in an R package?
>
>
> Best regards,
>
> Luc De Wilde
>
>
>