[Bioc-devel] R_CheckUserInterrupt guidelines (was Re: R_CheckUserInterrupt in IRanges)

Douglas Bates bates at stat.wisc.edu
Wed Jan 13 20:23:52 CET 2010


On Wed, Jan 13, 2010 at 12:50 PM, Robert Castelo <robert.castelo at upf.edu> wrote:
> thanks for the hint, i thought they were macros indeed. given that the
> two nested loops may take a long time and there are these R_*() calls in
> between, do you think i should PROTECT or PROTECT_WITH_INDEX before
> assigning pointers outside these loops?

If the SEXPs from which you are extracting pointers are arguments in a
.Call from R then they are already protected.  Also, when you are
using a scalar value passed from R then you can assign the value to an
int or a doubler in your C code and you don't need to worry about
garbage collection.  It may be that a pointer from an SEXP can be
reassigned as a result of garbage collection but Robert or Seth (or
Luke) would be better able to answer that question than I.  Most of
the code that I write doesn't call R routines after the initial setup
so I count on the pointers as being fixed.

Another approach, which is used within the Rcpp package, is to copy
the contents of the pointer into dynamically allocated storage on
entry so you are insulated from garbage collection.  Here is how I
would rewrite your code fragment (including replacing the R-style
comments with C comments :-)

double *nrr = REAL(nrrMatrix),
    *ss = REAL(S),
    alph = *REAL(alpha);

int *pair = INTEGER(pairup_ij_int),
    NN = *INTEGER(N),
    ntest = *INTEGER(nTests),
    verb = *INTEGER(verbose),
			
ppct=-1;
k=0;
			
for (i = 0; i < l_int-1; i++) {
 int i2 = pair[i] - 1;

 for (j = i+1; j < l_int; j++) {
   int j2 = pair[j] - 1;

   nrr[i2+j2*n_var] = qp_edge_nrr(ss, n_var, NN, i2, j2, q, ntest, alph);

   nrr[j2+i2*n_var] = nrr[i2+j2*n_var];
   k++;
   pct = (int) ((k * 100) / n_adj);
   if (pct != ppct) {
       if (verb) {		/* show progress when */
       if (pct % 10 == 0)       /* verbose=TRUE */
         Rprintf("%d",pct);
       else
         Rprintf(".",pct);
       R_FlushConsole();
     }
     R_CheckUserInterrupt();
#ifdef Win32
     R_ProcessEvents();
#endif
#ifdef HAVE_AQUA
     R_ProcessEvents();
#endif
     ppct = pct;
   }
 }
}

 >
> thanks,
> robert.
>
>
> On Tue, 2010-01-12 at 09:34 -0600, Douglas Bates wrote:
>> This is a bit off-topic but one idiom that I see in this code is the
>> use of INTEGER() or REAL() inside potentially long loops.  This is not
>> a good idea.  In packages INTEGER(), REAL(), etc. are implemented as
>> functions, not macros.  It is best to assign a pointer outside the
>> loop and work with that, avoiding the function call overhead.  I
>> realize this could be a minuscule change but if what you do with that
>> pointer within the loop is trivial it is best not to add the function
>> call overhead.
>>
>> On Tue, Jan 12, 2010 at 2:26 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
>> > hi, let me take the chance that you're talking about this to check
>> > whether anybody finds anything odd in the following strategy. in the
>> > package i develop (qpgraph - devel version) for the computationally
>> > intensive C-loops i calculate the percentage of progress and each time
>> > this percentage changes in one integer unit i call
>> > R_CheckUserInterrupt() and i also take the chance to process
>> > window-system events with R_ProcessEvents() (if i understand correctly
>> > this is what this function does). here is the code snippet:
>> >
>> > ppct=-1;
>> > k=0;
>> >
>> > for (i = 0; i < l_int-1; i++) {
>> >  int i2 = INTEGER(pairup_ij_int)[i] - 1;
>> >
>> >  for (j = i+1; j < l_int; j++) {
>> >    int j2 = INTEGER(pairup_ij_int)[j] - 1;
>> >
>> >    REAL(nrrMatrix)[i2+j2*n_var] = qp_edge_nrr(REAL(S),n_var,
>> > INTEGER(N)[0], i2, j2, q, INTEGER(nTests)[0],REAL(alpha)[0]);
>> >
>> >    REAL(nrrMatrix)[j2+i2*n_var] = REAL(nrrMatrix)[i2+j2*n_var];
>> >    k++;
>> >    pct = (int) ((k * 100) / n_adj);
>> >    if (pct != ppct) {
>> >      if (INTEGER(verbose)[0]) { ## show progress when
>> >        if (pct % 10 == 0)       ## verbose=TRUE
>> >          Rprintf("%d",pct);
>> >        else
>> >          Rprintf(".",pct);
>> >        R_FlushConsole();
>> >      }
>> >      R_CheckUserInterrupt();
>> > #ifdef Win32
>> >      R_ProcessEvents();
>> > #endif
>> > #ifdef HAVE_AQUA
>> >      R_ProcessEvents();
>> > #endif
>> >      ppct = pct;
>> >    }
>> >  }
>> > }
>> >
>> > let me clarify that the double nested loop iterates more or less through
>> > every pair of variables and thus this computation grows quadratically in
>> > the number of input variables in the dataset.
>> >
>> > robert.
>> >
>> > On Mon, 2010-01-11 at 20:07 -0800, Henrik Bengtsson wrote:
>> >> To 2nd what Seth's says, I think it is worth showing an explicit
>> >> example of how to check at regular intervals, e.g.  for (i = 0; i < n;
>> >> i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }
>> >>
>> >> /Henrik
>> >>
>> >> On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
>> >> > On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
>> >> >>
>> >> >> I just modified the C or Fortran code section of package guidelines to
>> >> >> reflect this new entry:
>> >> >>
>> >> >> http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
>> >> >
>> >> > This looks reasonable.
>> >> >
>> >> > It is worth noting, however, that calling R_CheckUserInterrupt is not
>> >> > entirely free.  If you have a tight loop it might make more sense to check
>> >> > at intervals rather than each time through and the choice of when to check
>> >> > will depend very much on the computation.
>> >> >
>> >> > Here's an example using the inline package (code is at the bottom) with a
>> >> > silly double loop to compare checking vs not:
>> >> >
>> >> > With R_CheckUserInterrupt inner loop:
>> >> >
>> >> >> system.time(z <- funx(10000L, TRUE))
>> >> >   user  system elapsed
>> >> >  1.310   0.001   1.313
>> >> >
>> >> > No checking:
>> >> >
>> >> > system.time(z <- funx(10000L, FALSE))
>> >> >   user  system elapsed
>> >> >   0.08    0.00    0.08
>> >> >
>> >> > Now this may be completely unrealistic, but I suspect that a sensible take
>> >> > away message is that we should be making sure our C code that loops does
>> >> > call R_CheckUserInterrupt, but depending on the computation, not at every
>> >> > loop iteration.
>> >> >
>> >> > + seth
>> >> >
>> >> >
>> >> > ## example code is below.
>> >> >
>> >> > library("inline")
>> >> >
>> >> > code <- "
>> >> > int i, j, sum = 0, int n = INTEGER(max)[0];
>> >> > int check_inter = LOGICAL(do_check)[0];
>> >> > SEXP ans;
>> >> > PROTECT(ans = allocVector(INTSXP, n));
>> >> > for (i = 0; i < n; i++) {
>> >> >    INTEGER(ans)[i] = i;
>> >> >    for (j = 0; j < n; j++) {
>> >> >        sum = i + j;
>> >> >        if (check_inter) {
>> >> >            R_CheckUserInterrupt();
>> >> >        }
>> >> >    }
>> >> > }
>> >> > UNPROTECT(1);
>> >> > return ans;
>> >> > "
>> >> >
>> >> > funx <- cfunction(signature(max="integer", do_check="logical"), code)
>> >> >
>> >> > system.time(z <- funx(10000L, TRUE))
>> >> > system.time(z <- funx(10000L, FALSE))
>> >> >
>> >> >
>> >> > --
>> >> > Seth Falcon
>> >> > Bioconductor Core Team | FHCRC
>> >> >
>> >> > _______________________________________________
>> >> > Bioc-devel at stat.math.ethz.ch mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >> >
>> >>
>> >> _______________________________________________
>> >> Bioc-devel at stat.math.ethz.ch mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>
>> >
>> > _______________________________________________
>> > Bioc-devel at stat.math.ethz.ch mailing list
>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >
>>
>
>



More information about the Bioc-devel mailing list