[Rd] R_CheckUserInterrupt() can be a performance bottleneck within GUIs
Sokol Serguei
@oko| @end|ng |rom |n@@-tou|ou@e@|r
Wed Dec 18 15:55:53 CET 2024
On 18/12/2024 at 13:16, Tomas Kalibera wrote:
>
> On 12/18/24 01:19, Simon Urbanek wrote:
>> It seems benign, but has implications since checking time is actually
>> not a cheap operation: adding a time check alone incurs a penalty
>> of ca. 700% compared with the time it takes to call
>> R_CheckUserInterrupt(). Generally, it makes no sense to check
>> interrupts at every iteration - you'll find code like
>> if (++i % 10000 == 0) R_CheckUserInterrupt();
>> in loops to make sure it's not called unnecessarily.
>
> Yes, and worse yet, even the modulo operation has too much overhead in
> some loops (unless the divisor is a power of two). It is faster to
> decrement a counter and compare it against zero.
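The three counter idioms discussed in this thread (the modulo test, the increment-and-reset variant benchmarked below, and the decrement-to-zero variant) can be sketched in standalone C++ as follows. `check_interrupt()` is a hypothetical stand-in for R_CheckUserInterrupt() that only counts calls, so the sketch builds without R. No timings are claimed here; the sketch only shows that all three loops fire the check equally often, so choosing between them is purely a matter of per-iteration cost.

```cpp
#include <cassert>

// Hypothetical stand-in for R_CheckUserInterrupt(): it only counts calls,
// so this sketch builds and runs without R.
static long g_checks = 0;
static void check_interrupt() { ++g_checks; }

// Modulo test on every iteration; in general an integer division when
// 'every' is only known at run time.
long loop_modulo(long n, long every) {
    assert(every > 0);
    g_checks = 0;
    long ichk = 0;
    for (long i = 0; i < n; ++i) {
        if (++ichk % every == 0) check_interrupt();
        /* ... the real work of the iteration would go here ... */
    }
    return g_checks;
}

// Increment, compare against 'every', and reset the counter.
long loop_reset(long n, long every) {
    assert(every > 0);
    g_checks = 0;
    long ichk = 0;
    for (long i = 0; i < n; ++i) {
        if (++ichk == every) { ichk = 0; check_interrupt(); }
    }
    return g_checks;
}

// Decrement and compare against zero.
long loop_countdown(long n, long every) {
    assert(every > 0);
    g_checks = 0;
    long left = every;
    for (long i = 0; i < n; ++i) {
        if (--left == 0) { left = every; check_interrupt(); }
    }
    return g_checks;
}
```

For instance, with n = 1000000 and every = 1000, each of the three functions returns 1000.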
To give an idea of how many milliseconds can be saved by switching from the "%" test to a "++" counter that is reset, here is a comparison:
---
library(Rcpp)
# Baseline: no interrupt checks at all
cppFunction('
double nonsense0(const int n, const int m) {
  int i, j;
  double result = 1.;
  for (i = 0; i < n; i++) {
    result = 1.;
    for (j = 1; j <= m; j++) if (j % 2) result *= j; else result /= j;
  }
  return result;
}')
# Check every `check` iterations using a modulo test
cppFunction('
double nonsense_div(const int n, const int m, const int check) {
  int ichk = 0;
  int i, j;
  double result = 1.;
  for (i = 0; i < n; i++) {
    if (++ichk % check == 0) R_CheckUserInterrupt();
    result = 1.;
    for (j = 1; j <= m; j++) if (j % 2) result *= j; else result /= j;
  }
  return result;
}')
# Check every `check` iterations by incrementing and resetting a counter
cppFunction('
double nonsense_add(const int n, const int m, const int check) {
  int ichk = 0;
  int i, j;
  double result = 1.;
  for (i = 0; i < n; i++) {
    if (++ichk == check) { ichk = 0; R_CheckUserInterrupt(); }
    result = 1.;
    for (j = 1; j <= m; j++) if (j % 2) result *= j; else result /= j;
  }
  return result;
}')
print(microbenchmark::microbenchmark(nonsense0(1e8, 10), nonsense_div(1e8, 10, 1000),
                                     nonsense_add(1e8, 10, 1000), times = 10))
---
In an R terminal, it gives:
Unit: milliseconds
                         expr      min       lq     mean   median       uq      max neval
         nonsense0(1e+08, 10) 637.9675 642.0459 642.9286 644.1687 644.7978 645.0136    10
nonsense_div(1e+08, 10, 1000) 759.6458 769.1841 769.1112 769.9288 773.3095 773.7830    10
nonsense_add(1e+08, 10, 1000) 695.0050 702.3962 701.7383 702.4955 702.7779 702.9693    10
while in RStudio it gives:
Unit: milliseconds
                         expr      min       lq     mean   median       uq      max neval
         nonsense0(1e+08, 10) 642.6170 644.7436 644.7964 645.0867 645.4031 645.7150    10
nonsense_div(1e+08, 10, 1000) 786.3055 786.9672 790.0751 787.5447 788.3046 811.8566    10
nonsense_add(1e+08, 10, 1000) 722.7741 723.4125 728.6252 723.9101 732.9117 749.1472    10
i.e. the saving is around 10% or roughly 65 ms.
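The size of the saving can be read off the medians in the two tables with a little arithmetic. The small C++ helper below (all names hypothetical) does this; using medians rather than means is an assumption about which statistic is most representative here.

```cpp
#include <cassert>
#include <cstdio>

// Median timings (ms) copied from the two microbenchmark tables.
struct Medians {
    const char* where;
    double base;  // nonsense0: no interrupt checks
    double div;   // nonsense_div: modulo test
    double add;   // nonsense_add: increment-and-reset counter
};

const Medians kTerminal{"R terminal", 644.1687, 769.9288, 702.4955};
const Medians kRStudio{"RStudio", 645.0867, 787.5447, 723.9101};

// Time saved by nonsense_add relative to nonsense_div, in ms.
double saving_ms(const Medians& m) { return m.div - m.add; }

// The same saving as a percentage of the nonsense_div run time.
double saving_pct(const Medians& m) { return 100.0 * saving_ms(m) / m.div; }

// Percentage of the check overhead (nonsense_div minus nonsense0) removed.
double overhead_removed_pct(const Medians& m) {
    assert(m.div > m.base);  // the checks must add some overhead
    return 100.0 * saving_ms(m) / (m.div - m.base);
}

// Prints a one-line summary for one environment.
void report(const Medians& m) {
    std::printf("%s: %.1f ms saved (%.1f%% of run time, %.0f%% of check overhead)\n",
                m.where, saving_ms(m), saving_pct(m), overhead_removed_pct(m));
}
```

On these medians, `report()` gives about 67.4 ms (8.8%) saved in the terminal and 63.6 ms (8.1%) in RStudio. In both cases the reset counter removes roughly half of the overhead added by the modulo test (about 54% and 45% respectively).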
Best,
Serguei.
>
> It is the responsibility of the GUI or application processing events
> not to do expensive operations on every call - that depends on
> what that application is doing and how expensive the processing is.
>
> The frequency of calls to R_CheckUserInterrupt() should be tuned using
> base R without a GUI - if it were too high there, the GUIs/applications
> indeed couldn't do anything on their end. If anyone finds a loop in base
> R running standalone where the overhead is too high, the frequency can
> be adjusted - a bug report with a reproducible example would help in
> that case.
>
> Best
> Tomas
>
>> Cheers,
>> Simon
>>
>>
>>> On Dec 18, 2024, at 4:04 AM, Ben Bolker <bbolker using gmail.com> wrote:
>>>
>>> This seems like a great idea. Would it help to escalate this to a
>>> post on R-bugzilla, so it is less likely to fall through the cracks?
>>>
>>> On 12/17/24 09:51, Jeroen Ooms wrote:
>>>> A more generic solution would be for R to throttle calls to
>>>> R_CheckUserInterrupt(), because it makes no sense to check 1000 times
>>>> per second whether a user has interrupted, but it is difficult for the
>>>> caller to know when R_CheckUserInterrupt() was last called, or to call
>>>> it regularly without overdoing it.
>>>> Here is a simple patch: https://github.com/r-devel/r-svn/pull/125
>>>> See also: https://stat.ethz.ch/pipermail/r-devel/2023-May/082597.html
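The patch itself is at the link above and is not reproduced here. As an illustration of the general throttling idea only, the hypothetical C++ class below runs an expensive callback (standing in for R_CheckUserInterrupt()) at most once per time interval. Because reading the clock is itself not free, it consults the clock only every `stride` calls, using a decrement-to-zero counter. `ThrottledCheck`, `tick()`, `min_interval` and `stride` are names invented for this sketch.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical throttle for an expensive check such as R_CheckUserInterrupt().
// The callback runs at most once per 'min_interval'; the clock is read only
// once every 'stride' calls to tick(), since reading it has a cost too.
class ThrottledCheck {
public:
    using Clock = std::chrono::steady_clock;

    ThrottledCheck(void (*callback)(), Clock::duration min_interval, int stride)
        : callback_(callback), min_interval_(min_interval), stride_(stride),
          countdown_(stride), last_(Clock::now()) {
        assert(callback != nullptr && stride > 0);
    }

    // Call on every loop iteration; returns true if the callback ran.
    bool tick() {
        if (--countdown_ != 0) return false;  // cheap path: decrement and compare
        countdown_ = stride_;
        const Clock::time_point now = Clock::now();
        if (now - last_ < min_interval_) return false;  // too soon, skip
        last_ = now;
        callback_();
        return true;
    }

private:
    void (*callback_)();
    Clock::duration min_interval_;
    int stride_;
    int countdown_;
    Clock::time_point last_;
};
```

For example, with a `void callback()` of your own, `ThrottledCheck t(callback, std::chrono::milliseconds(100), 10000);` followed by `t.tick()` in every loop iteration reads the clock once per 10000 iterations and runs the callback at most about ten times per second. Combining a counter with the clock is a design choice meant to address the cost of time checks.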
>>>> On Tue, Dec 17, 2024 at 10:47 AM Martin Becker
>>>> <martin.becker using mx.uni-saarland.de> wrote:
>>>>> tl;dr: R_CheckUserInterrupt() can be a performance bottleneck
>>>>> within GUIs. This also affects functions in the 'stats'
>>>>> package, which could be improved by changing the position
>>>>> of calls to R_CheckUserInterrupt().
>>>>>
>>>>>
>>>>> Dear all,
>>>>>
>>>>> Recently I was puzzled because some code in a package under development,
>>>>> which consisted almost entirely of a .Call() to a function written in C,
>>>>> was running much slower within RStudio compared to R in a terminal. It
>>>>> took me some time to identify the cause, so I thought I would share my
>>>>> findings; perhaps they will be helpful to others.
>>>>>
>>>>> The performance drop was caused by R_CheckUserInterrupt(), which I call
>>>>> (perhaps too often) in my C code. While calling R_CheckUserInterrupt()
>>>>> seems to be quite cheap when running R or Rscript in a terminal, it is
>>>>> more expensive when running R within a GUI, especially within RStudio,
>>>>> as I noticed (but also, e.g., within R.app on MacOS). In fact, using a
>>>>> GUI (especially RStudio) can change the cost of (frequent) calls to
>>>>> R_CheckUserInterrupt() from negligible to critical (in real-world
>>>>> applications). Significant performance drops are also visible for
>>>>> functions in the 'stats' package, e.g., pwilcox().
>>>>>
>>>>> The following MWE (using Rcpp) illustrates the problem. Consider the
>>>>> following code:
>>>>>
>>>>> ---
>>>>>
>>>>> library(Rcpp)
>>>>> cppFunction('double nonsense(const int n, const int m, const int check) {
>>>>> int i, j;
>>>>> double result;
>>>>> for (i=0;i<n;i++) {
>>>>> if (check) R_CheckUserInterrupt();
>>>>> result = 1.;
>>>>> for (j=1;j<=m;j++) if (j%2) result *= j; else result /=j;
>>>>> }
>>>>> return(result);
>>>>> }')
>>>>>
>>>>> tmp1 <- system.time(nonsense(1e8,10,0))[1]
>>>>> tmp2 <- system.time(nonsense(1e8,10,1))[1]
>>>>> cat("w/o check:",tmp1,"sec., with check:",tmp2,"sec., diff.:",tmp2-tmp1,"sec.\n")
>>>>>
>>>>> tmp3 <- system.time(pwilcox(rwilcox(1e5,40,60),40,60))[1]
>>>>> cat("wilcox example:",tmp3,"sec.\n")
>>>>>
>>>>> ---
>>>>>
>>>>> Running this code when R (4.4.2) is started in a terminal window
>>>>> produces the following measurements/output (Apple M1, MacOS 15.1.1):
>>>>>
>>>>> w/o check: 0.525 sec., with check: 0.752 sec., diff.: 0.227 sec.
>>>>> wilcox example: 1.028 sec.
>>>>>
>>>>> Running the same code when R is used within R.app (1.81 (8462)
>>>>> aarch64-apple-darwin20) on the same machine results in:
>>>>>
>>>>> w/o check: 0.525 sec., with check: 1.683 sec., diff.: 1.158 sec.
>>>>> wilcox example: 2.13 sec.
>>>>>
>>>>> Running the same code when R is used within RStudio Desktop (2024.12.0
>>>>> Build 467) on the same machine results in:
>>>>>
>>>>> w/o check: 0.507 sec., with check: 22.905 sec., diff.: 22.398 sec.
>>>>> wilcox example: 29.686 sec.
>>>>>
>>>>> So, the performance drop is already remarkable for R.app, but really
>>>>> huge for RStudio.
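A back-of-the-envelope reading of these numbers: `nonsense()` calls R_CheckUserInterrupt() once per outer iteration, i.e. 1e8 times, so dividing the "diff." value by 1e8 gives an approximate cost per call. This assumes the whole difference is due to those calls. The helper names below are hypothetical.

```cpp
#include <cassert>

// Approximate cost of one R_CheckUserInterrupt() call, in nanoseconds,
// assuming the whole 'diff.' time is spent in 'calls' interrupt checks.
double ns_per_call(double diff_sec, double calls) {
    assert(calls > 0);
    return diff_sec / calls * 1e9;
}

// Estimated total check overhead, in milliseconds, if the check ran only
// once every 'every' iterations instead of on each of 'iterations'.
double throttled_overhead_ms(double cost_ns, double iterations, double every) {
    assert(every > 0);
    return cost_ns * (iterations / every) / 1e6;
}

// 'diff.' values (seconds) reported above for nonsense(1e8, 10, 1).
const double kDiffTerminal = 0.227;
const double kDiffRApp = 1.158;
const double kDiffRStudio = 22.398;
const double kCalls = 1e8;  // one check per outer iteration, n = 1e8
```

This gives roughly 2.3 ns per call in the terminal, 11.6 ns in R.app and 224 ns in RStudio. Checking only every 10000 iterations, as in the `if (++i % 10000 == 0)` idiom mentioned elsewhere in the thread, would cut the estimated RStudio overhead to about 2.2 ms.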
>>>>>
>>>>> Presumably, checking for user interrupts within a GUI is more involved
>>>>> than within a terminal window, so there may not be much room for
>>>>> improvement in R.app or RStudio (and I know that this list is not the
>>>>> right place to suggest improvements for RStudio or to report unwanted
>>>>> behaviour). However, it might be worth considering
>>>>>
>>>>> 1. an addition to the documentation in WRE (explaining that too many
>>>>> calls to R_CheckUserInterrupt() can cause a performance bottleneck,
>>>>> especially when the code is running within a GUI),
>>>>> 2. checking (and possibly changing) the position of
>>>>> R_CheckUserInterrupt() in some base R functions. For example, moving
>>>>> R_CheckUserInterrupt() from cwilcox() to pwilcox() and qwilcox() in
>>>>> src/nmath/wilcox.c may lead to a significant improvement (while still
>>>>> being feasible in terms of response time).
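A general illustration of point 2, not R's actual src/nmath/wilcox.c code: when the check sits in a recursive or frequently called helper, it runs once per helper call, whereas moving it to the outer loop makes it run once per outer step. All names here are hypothetical, and the recursion is a textbook subset-sum count, not cwilcox(). The trade-off mentioned above still applies: each outer step must remain short enough for interrupts to stay responsive.

```cpp
#include <cassert>

// Hypothetical stand-in for R_CheckUserInterrupt() that only counts calls.
static long g_checks = 0;
static void check_interrupt() { ++g_checks; }

// Hypothetical inner kernel (NOT R's cwilcox()): number of ways to choose
// k distinct values from {1, ..., n} whose sum is s, by plain recursion.
long count_ways(int k, int n, int s, bool check_here) {
    if (check_here) check_interrupt();  // runs once per recursive call
    if (k == 0) return s == 0 ? 1 : 0;
    if (n <= 0 || s < 0) return 0;
    return count_ways(k, n - 1, s, check_here)           // n not chosen
         + count_ways(k - 1, n - 1, s - n, check_here);  // n chosen
}

// Hypothetical outer driver: cumulative count over s = 0, ..., q.
// With check_in_inner == false the check runs once per outer step only.
long cumulative(int k, int n, int q, bool check_in_inner) {
    assert(k >= 0 && n >= 0);
    long total = 0;
    for (int s = 0; s <= q; ++s) {
        if (!check_in_inner) check_interrupt();
        total += count_ways(k, n, s, check_in_inner);
    }
    return total;
}
```

For k = 2, n = 4 and q = 7 both placements return 6, but the outer placement performs 8 checks while the inner placement performs one per recursive call, which is many more.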
>>>>>
>>>>> Best,
>>>>> Martin
>>>>>
>>>>>
>>>>> --
>>>>> apl. Prof. Dr. Martin Becker, Akad. Oberrat
>>>>> Lehrstab Statistik
>>>>> Quantitative Methoden
>>>>> Fakultät für Empirische Humanwissenschaften und
>>>>> Wirtschaftswissenschaft
>>>>> Universität des Saarlandes
>>>>> Campus C3 1, Raum 2.17
>>>>> 66123 Saarbrücken
>>>>> Deutschland
>>>>>
>>>>> ______________________________________________
>>>>> R-devel using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> --
>>> Dr. Benjamin Bolker
>>> Professor, Mathematics & Statistics and Biology, McMaster University
>>> Director, School of Computational Science and Engineering
>>> * E-mail is sent at my convenience; I don't expect replies outside of
>>> working hours.
>>>
>