[R-pkg-devel] Cannot create C code with acceptable performance with respect to internal R command.
Avraham Adler
@vr@h@m@@d|er @end|ng |rom gm@||@com
Fri Dec 6 08:58:10 CET 2024
> On Dec 5, 2024, at 4:11 PM, Sokol Serguei <serguei.sokol using gmail.com> wrote:
>
> Luc,
>
> There can be many reasons for a difference in compiled-code performance. Tuning such code to achieve peak performance is generally a fine art.
> Optimization techniques can include but are not limited to:
> - SIMD instructions (and memory alignment for their optimal use);
> - instruction level parallelism;
> - unrolling loops;
> - cache level (mis-)hits;
> - multi-thread parallelism;
> - ...
> Optimization approaches differ depending on the kind of application: CPU-bound, memory-bound or IO-bound.
> Many of these techniques may or may not be applied directly by the compiler, depending on the chosen options. Are you sure you are using the same compiler and options that were used when R itself was compiled?
> And finally, the code being compared may simply not be the same. R can use a BLAS call, e.g. to OpenBLAS, to multiply two matrices. The latter is heavily optimized for such operations and can achieve a 10x speedup compared to a plain "naive" BLAS.
> The R code you cite may be just the fallback code used when no BLAS was found during R compilation.
> Look at what your sessionInfo() says about the BLAS in use.
That doesn’t always work. I build R on Windows (10) linking to a pre-compiled static OpenBLAS (0.3.28) and my sessionInfo has an empty string for BLAS. I reckon that is because I’m using Rblas.dll, it’s just that my Rblas isn’t vanilla.
Avi
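
As an illustration of the cache point in the list quoted above, here is a minimal sketch in plain C (no R headers) of the same column-major product written two ways. The function names matprod_naive, matprod_reordered and max_abs_diff are made up for this example, and neither function is R's actual implementation. The first keeps the loop order of the code quoted in this thread; the second reorders the loops to (j, k, i) so that the inner loop reads and writes contiguous memory. Whether that closes the gap to %*% on a given machine is an assumption to check with a benchmark, not a measured result.

```c
#include <stddef.h>

/* left is m x n, right is n x p, ret is m x p, all column-major (as in R). */

/* Loop order of the quoted code: the inner k loop reads `left` with
   stride m, so consecutive iterations touch elements m doubles apart. */
void matprod_naive(const double *left, const double *right, double *ret,
                   int m, int n, int p)
{
    for (int j = 0; j < p; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += left[i + (size_t)m * k] * right[k + (size_t)n * j];
            ret[i + (size_t)m * j] = sum;
        }
    }
}

/* Hypothetical reordering (j, k, i): the inner loop walks down one column
   of `left` and one column of `ret`, both contiguous in memory. */
void matprod_reordered(const double *left, const double *right, double *ret,
                       int m, int n, int p)
{
    for (int j = 0; j < p; j++) {
        double *rcol = ret + (size_t)m * j;
        for (int i = 0; i < m; i++)
            rcol[i] = 0.0;
        for (int k = 0; k < n; k++) {
            const double b = right[k + (size_t)n * j];
            const double *lcol = left + (size_t)m * k;
            for (int i = 0; i < m; i++)
                rcol[i] += lcol[i] * b;
        }
    }
}

/* Largest absolute elementwise difference between two arrays. */
double max_abs_diff(const double *a, const double *b, size_t len)
{
    double worst = 0.0;
    for (size_t t = 0; t < len; t++) {
        double d = a[t] - b[t];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Both functions add the terms for each output element in the same k order, so their results should agree up to compiler-level floating-point differences. In a package, the usual way to get BLAS-level speed is not to hand-tune such loops but to call dgemm through R's BLAS interface (header R_ext/BLAS.h), which then uses whatever BLAS R itself is linked against.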
>
> Best,
> Serguei.
>
>> On 05/12/2024 at 14:21, Luc De Wilde wrote:
>> Dear package developers,
>>
>> In creating a package, lavaanC, for use in lavaan, I need to perform some matrix computations involving matrix products and crossproducts. As far as I can see, I cannot directly call the C code in the R core. So I copied the code from the R core, but the same C/C++ code in a package is 2.5 to 3 times slower than when executed directly in R:
>>
>> C code in package:
>> SEXP prod0(SEXP mat1, SEXP mat2) {
>>   SEXP u1 = Rf_getAttrib(mat1, R_DimSymbol);
>>   int m1 = INTEGER(u1)[0];
>>   int n1 = INTEGER(u1)[1];
>>   SEXP u2 = Rf_getAttrib(mat2, R_DimSymbol);
>>   int m2 = INTEGER(u2)[0];
>>   int n2 = INTEGER(u2)[1];
>>   if (n1 != m2) Rf_error("matrices not conforming");
>>   SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m1, n2));
>>   double* left = REAL(mat1);
>>   double* right = REAL(mat2);
>>   double* ret = REAL(retval);
>>   double werk = 0.0;
>>   for (int j = 0; j < n2; j++) {
>>     for (int i = 0; i < m1; i++) {
>>       werk = 0.0;
>>       for (int k = 0; k < n1; k++) werk += (left[i + m1 * k] * right[k + m2 * j]);
>>       ret[j * m1 + i] = werk;
>>     }
>>   }
>>   UNPROTECT(1);
>>   return retval;
>> }
>>
>> Test script:
>> m1 <- matrix(rnorm(300000), nrow = 60)
>> m2 <- matrix(rnorm(300000), ncol = 60)
>> print(microbenchmark::microbenchmark(
>>   m1 %*% m2, .Call("prod0", m1, m2), times = 100
>> ))
>>
>> Result on my PC:
>> Unit: milliseconds
>>                    expr     min      lq     mean  median       uq     max neval
>>               m1 %*% m2 10.5650 10.8967 11.13434 10.9449 11.02965 15.8397   100
>>  .Call("prod0", m1, m2) 29.3336 30.7868 32.05114 31.0408 33.85935 45.5321   100
>>
>>
>> Can anyone explain why the compiled code in the package is so much slower than in R core?
>>
>> and
>>
>> Is there a way to improve the performance in an R package?
>>
>>
>> Best regards,
>>
>> Luc De Wilde
>>
>>
>>
>> ______________________________________________
>> R-package-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel