[R-pkg-devel] Cannot create C code with acceptable performance with respect to internal R command.
Sokol Serguei
@ergue|@@oko| @end|ng |rom gm@||@com
Thu Dec 5 14:55:51 CET 2024
Luc,
There can be many reasons for a difference in compiled-code performance.
Tuning such code to achieve peak performance is generally a fine art.
Optimization techniques include, but are not limited to:
- SIMD instructions (and memory alignment for their optimal use);
- instruction level parallelism;
- unrolling loops;
- cache (mis-)hits at the various cache levels (see the loop-ordering sketch after this list);
- multi-thread parallelism;
- ...
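To make the cache point concrete, here is a minimal sketch (not code from
lavaanC or from R itself; the name matprod_jki and its plain-pointer
interface are illustrative only) of the same naive product with the loops
reordered to j-k-i, so that the innermost loop walks the left matrix and
the result with stride 1 in R's column-major layout:

#include <string.h>   /* memset */

/* column-major product C = A * B, with A m x k, B k x n, C m x n;
   the j-k-i order makes the innermost loop access A and C contiguously,
   which is friendlier to the cache and to auto-vectorization than the
   j-i-k order of the prod0() quoted below */
static void matprod_jki(const double *A, const double *B, double *C,
                        int m, int k, int n)
{
    memset(C, 0, (size_t) m * (size_t) n * sizeof(double));
    for (int j = 0; j < n; j++) {
        for (int p = 0; p < k; p++) {
            const double b_pj = B[p + (size_t) k * j];  /* B[p, j] */
            const double *a_col = A + (size_t) m * p;   /* column p of A */
            double *c_col = C + (size_t) m * j;         /* column j of C */
            for (int i = 0; i < m; i++)
                c_col[i] += a_col[i] * b_pj;
        }
    }
}

You would still wrap such a kernel in a .Call entry point, as in the
prod0() quoted below. On many machines this reordering alone recovers a
good part of the gap, but it will not by itself match an optimized BLAS.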
The appropriate approach also depends on the kind of application:
CPU-bound, memory-bound or IO-bound.
Many of these techniques can be applied (or not) by the compiler itself,
depending on the chosen options. Are you sure you are using the same
compiler and options that were used to build R?
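For a package built with R CMD INSTALL / R CMD SHLIB this is normally the
case, since the flags are taken from R's own Makeconf, but it is easy to
check, e.g.:

R CMD config CC
R CMD config CFLAGS
R CMD config CPPFLAGS

Users can also change these flags in ~/.R/Makevars (e.g. -O3 or
-march=native) for local experiments, but portability and CRAN policy
limit what a distributed package should rely on.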
And finally, the code being compared may simply not be the same. For %*%,
R can call a BLAS routine, e.g. from OpenBLAS, to multiply the two
matrices. Such a BLAS is heavily optimized for this operation and can
easily be 10x faster than a plain "naive" reference implementation.
The C code you copied may just be the fallback used when no optimized
BLAS was available when R was built.
Look at what sessionInfo() says about the BLAS in use.
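If an optimized BLAS is indeed in use, the most robust way to match %*%
from a package is not to re-implement the loops but to call the same BLAS
through R's headers. A minimal sketch, assuming real (double) matrices as
in your benchmark; the name prod_blas is illustrative and error handling
is kept to the bare minimum:

#define USE_FC_LEN_T           /* pass Fortran character lengths (R >= 3.6.2) */
#include <R.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>
#ifndef FCONE
# define FCONE                 /* no-op on older versions of R */
#endif

/* retval = mat1 %*% mat2, computed by dgemm from the BLAS R is linked against */
SEXP prod_blas(SEXP mat1, SEXP mat2)
{
    SEXP d1 = Rf_getAttrib(mat1, R_DimSymbol);
    SEXP d2 = Rf_getAttrib(mat2, R_DimSymbol);
    int m = INTEGER(d1)[0], k = INTEGER(d1)[1];
    int k2 = INTEGER(d2)[0], n = INTEGER(d2)[1];
    if (k != k2) Rf_error("matrices not conforming");
    SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m, n));
    double one = 1.0, zero = 0.0;
    F77_CALL(dgemm)("N", "N", &m, &n, &k, &one,
                    REAL(mat1), &m, REAL(mat2), &k,
                    &zero, REAL(retval), &m FCONE FCONE);
    UNPROTECT(1);
    return retval;
}

With something like this, the .Call() version and m1 %*% m2 should end up
running essentially the same BLAS kernel.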
Best,
Serguei.
On 05/12/2024 at 14:21, Luc De Wilde wrote:
> Dear package developers,
>
> in creating a package lavaanC for use in lavaan, I need to perform some matrix computations involving matrix products and crossproducts. As far as I can see, I cannot directly call the C code in the R core, so I copied that code into the package. However, the same C/C++ code in a package is 2.5 to 3 times slower than when executed directly in R:
>
> C code in the package:
>
> #include <R.h>
> #include <Rinternals.h>
>
> /* naive column-major matrix product: retval = mat1 %*% mat2 */
> SEXP prod0(SEXP mat1, SEXP mat2) {
>     SEXP u1 = Rf_getAttrib(mat1, R_DimSymbol);
>     int m1 = INTEGER(u1)[0];
>     int n1 = INTEGER(u1)[1];
>     SEXP u2 = Rf_getAttrib(mat2, R_DimSymbol);
>     int m2 = INTEGER(u2)[0];
>     int n2 = INTEGER(u2)[1];
>     if (n1 != m2) Rf_error("matrices not conforming");
>     SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m1, n2));
>     double* left = REAL(mat1);
>     double* right = REAL(mat2);
>     double* ret = REAL(retval);
>     double werk = 0.0;
>     for (int j = 0; j < n2; j++) {
>         for (int i = 0; i < m1; i++) {
>             werk = 0.0;
>             /* dot product of row i of mat1 with column j of mat2 */
>             for (int k = 0; k < n1; k++) werk += left[i + m1 * k] * right[k + m2 * j];
>             ret[j * m1 + i] = werk;
>         }
>     }
>     UNPROTECT(1);
>     return retval;
> }
>
> Test script:
> m1 <- matrix(rnorm(300000), nrow = 60)
> m2 <- matrix(rnorm(300000), ncol = 60)
> print(microbenchmark::microbenchmark(
> m1 %*% m2, .Call("prod0", m1, m2), times = 100
> ))
>
> Result on my pc:
> Unit: milliseconds
>                    expr     min      lq     mean  median       uq     max neval
>               m1 %*% m2 10.5650 10.8967 11.13434 10.9449 11.02965 15.8397   100
>  .Call("prod0", m1, m2) 29.3336 30.7868 32.05114 31.0408 33.85935 45.5321   100
>
>
> Can anyone explain why the compiled code in the package is so much slower than in the R core?
>
> and
>
> Is there a way to improve the performance in an R package?
>
>
> Best regards,
>
> Luc De Wilde
>
>
>