[Rd] R_check_class_etc(x, valid) is "slow" when 'valid' contains class(x)

Mikael Jagan j@g@nmn2 @end|ng |rom gm@||@com
Thu Sep 8 18:44:30 CEST 2022



On 2022-09-08 9:23 am, Tomas Kalibera wrote:
> 
> On 02/09/2022 14:51, Mikael Jagan wrote:
>>
>>
>> On 2022-09-02 5:30 am, Tomas Kalibera wrote:
>>> Hi Mikael,
>>>
>>> On 8/28/22 01:13, Mikael Jagan wrote:
>>>> R_check_class_etc(x, valid) spends a nontrivial amount of time finding
>>>> an environment 'rho' containing the definition of class(x), evaluating
>>>> (in R, not C) methods::.classEnv(class(x)).
>>>>
>>>> It then returns the result of R_check_class_and_super(x, valid, rho).
>>>> But R_check_class_and_super() does not use 'rho' at all in the trivial
>>>> case where class(x) is found in 'valid'.
>>> right, that could be improved. Do you have an example which exhibits the 
>>> problem, have you found this by profiling something?
>> I benchmarked .Call("R_etc", x) for x <- new("dgTMatrix"), after compiling
>> and loading the following:
>>
>> ```
>> #include <Rinternals.h>
>>
>> SEXP R_etc(SEXP x) {
>>     static const char *valid[] = {"dgCMatrix", "dgRMatrix", "dgTMatrix", ""};
>>     return ScalarInteger(R_check_class_etc(x, valid));
>> }
>> ```
>>
>> R-devel used ~2 microseconds while my patched version of R used ~0.2
>> microseconds.  So, not a bottleneck by any means.  On the other hand,
>> 'Matrix' and probably other packages do call R_check_class_etc() quite
>> often, hence my report.
> I see, ideally one would have a real scenario with a real performance issue 
> leading to this. With a micro-benchmark, one could come up with pretty much 
> arbitrary speedup, so it is harder to judge whether an optimization is actually 
> worth the effort, the risk of introducing bugs and sometimes the added complexity.

Yes.  I _was_ at first thinking about a less artificial example
(and in retrospect I probably should have led with that):

## install.packages("Matrix", repos = "http://R-Forge.R-project.org")
library(Matrix)
n <- 10L
x <- new("dgeMatrix", Dim = c(n, n), x = as.double(seq_len(n^2)))
microbenchmark::microbenchmark(t(x))

where the 't' method calls the C-level unpackedMatrix_transpose(),
which in turn calls R_check_class_etc().

>>>>
>>>> My feeling is that this can be improved.  I am happy to contribute a patch,
>>>> if it would be considered by R-core.
>>>
>>> Both R_check_class_etc and R_check_class_and_super are unfortunately 
>>> exported, the former is used a lot in packages (even though they are not 
>>> mentioned in Writing R Extensions, so actually shouldn't be used in 
>>> packages). Anyway, it would be easier if we could preserve their interface 
>>> and behavior.
>>>
>>> Maybe we could support rho==NULL in R_check_class_and_super, the environment 
>>> would be looked up in that case when needed. R_check_class_etc would simply 
>>> only call R_check_class_and_super with that argument. I see that 
>>> R_check_class_and_super uses asChar() on the class attribute, while 
>>> R_check_class_etc does not currently for looking for the environment, but I 
>>> assume doing that in both cases should not matter (and it would have to be 
>>> tested).
>>>
>>> So this would be a trivial change, but if you wanted to create a minimal 
>>> patch, I will be happy to have a look.
>>
>> I've implemented essentially what you've described; see attached.
> 
> Thanks, I've restored the original behavior with pkg == R_NilValue (note rho was 
> R_GlobalEnv), I've used C NULL to mark the special mode when "rho" should be 
> found, and I made some small tweaks. I will add after more testing, if the 
> results are ok.

Makes sense - thanks for catching that.

Mikael

> 
> Best
> Tomas
> 
>>
>> Mikael
>>
>>>
>>> Best
>>> Tomas
>>>
>>>
>>>> Mikael
>>>>
>>>> ______________________________________________
>>>> R-devel using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list