[Bioc-devel] R_CheckUserInterrupt guidelines (was Re: R_CheckUserInterrupt in IRanges)

Robert Castelo robert.castelo at upf.edu
Wed Jan 13 19:50:36 CET 2010


thanks for the hint, i thought they were macros indeed. given that the
two nested loops may take a long time and there are these R_*() calls in
between, do you think i should PROTECT or PROTECT_WITH_INDEX before
assigning pointers outside these loops?

thanks,
robert.


On Tue, 2010-01-12 at 09:34 -0600, Douglas Bates wrote:
> This is a bit off-topic but one idiom that I see in this code is the
> use of INTEGER() or REAL() inside potentially long loops.  This is not
> a good idea.  In packages INTEGER(), REAL(), etc. are implemented as
> functions, not macros.  It is best to assign a pointer outside the
> loop and work with that, avoiding the function call overhead.  I
> realize this could be a minuscule change but if what you do with that
> pointer within the loop is trivial it is best not to add the function
> call overhead.
> 
> On Tue, Jan 12, 2010 at 2:26 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
> > hi, let me take the chance that you're talking about this to check
> > whether anybody finds anything odd in the following strategy. in the
> > package i develop (qpgraph - devel version) for the computationally
> > intensive C-loops i calculate the percentage of progress and each time
> > this percentage changes in one integer unit i call
> > R_CheckUserInterrupt() and i also take the chance to process
> > window-system events with R_ProcessEvents() (if i understand correctly
> > this is what this function does). here is the code snippet:
> >
> > ppct=-1;
> > k=0;
> >
> > for (i = 0; i < l_int-1; i++) {
> >  int i2 = INTEGER(pairup_ij_int)[i] - 1;
> >
> >  for (j = i+1; j < l_int; j++) {
> >    int j2 = INTEGER(pairup_ij_int)[j] - 1;
> >
> >    REAL(nrrMatrix)[i2+j2*n_var] = qp_edge_nrr(REAL(S),n_var,
> > INTEGER(N)[0], i2, j2, q, INTEGER(nTests)[0],REAL(alpha)[0]);
> >
> >    REAL(nrrMatrix)[j2+i2*n_var] = REAL(nrrMatrix)[i2+j2*n_var];
> >    k++;
> >    pct = (int) ((k * 100) / n_adj);
> >    if (pct != ppct) {
> >      if (INTEGER(verbose)[0]) { ## show progress when
> >        if (pct % 10 == 0)       ## verbose=TRUE
> >          Rprintf("%d",pct);
> >        else
> >          Rprintf(".",pct);
> >        R_FlushConsole();
> >      }
> >      R_CheckUserInterrupt();
> > #ifdef Win32
> >      R_ProcessEvents();
> > #endif
> > #ifdef HAVE_AQUA
> >      R_ProcessEvents();
> > #endif
> >      ppct = pct;
> >    }
> >  }
> > }
> >
> > let me clarify that the double nested loop iterates more or less through
> > every pair of variables and thus this computation grows quadratically in
> > the number of input variables in the dataset.
> >
> > robert.
> >
> > On Mon, 2010-01-11 at 20:07 -0800, Henrik Bengtsson wrote:
> >> To 2nd what Seth's says, I think it is worth showing an explicit
> >> example of how to check at regular intervals, e.g.  for (i = 0; i < n;
> >> i++) { if (i %% 100 == 0) R_CheckUserInterrupt(); ... }
> >>
> >> /Henrik
> >>
> >> On Mon, Jan 11, 2010 at 3:50 PM, Seth Falcon <sfalcon at fhcrc.org> wrote:
> >> > On 1/11/10 1:31 PM, Patrick Aboyoun wrote:
> >> >>
> >> >> I just modified the C or Fortran code section of package guidelines to
> >> >> reflect this new entry:
> >> >>
> >> >> http://wiki.fhcrc.org/bioc/Package_Guidelines#c-or-fortran-code
> >> >
> >> > This looks reasonable.
> >> >
> >> > It is worth noting, however, that calling R_CheckUserInterrupt is not
> >> > entirely free.  If you have a tight loop it might make more sense to check
> >> > at intervals rather than each time through and the choice of when to check
> >> > will depend very much on the computation.
> >> >
> >> > Here's an example using the inline package (code is at the bottom) with a
> >> > silly double loop to compare checking vs not:
> >> >
> >> > With R_CheckUserInterrupt inner loop:
> >> >
> >> >> system.time(z <- funx(10000L, TRUE))
> >> >   user  system elapsed
> >> >  1.310   0.001   1.313
> >> >
> >> > No checking:
> >> >
> >> > system.time(z <- funx(10000L, FALSE))
> >> >   user  system elapsed
> >> >   0.08    0.00    0.08
> >> >
> >> > Now this may be completely unrealistic, but I suspect that a sensible take
> >> > away message is that we should be making sure our C code that loops does
> >> > call R_CheckUserInterrupt, but depending on the computation, not at every
> >> > loop iteration.
> >> >
> >> > + seth
> >> >
> >> >
> >> > ## example code is below.
> >> >
> >> > library("inline")
> >> >
> >> > code <- "
> >> > int i, j, sum = 0, int n = INTEGER(max)[0];
> >> > int check_inter = LOGICAL(do_check)[0];
> >> > SEXP ans;
> >> > PROTECT(ans = allocVector(INTSXP, n));
> >> > for (i = 0; i < n; i++) {
> >> >    INTEGER(ans)[i] = i;
> >> >    for (j = 0; j < n; j++) {
> >> >        sum = i + j;
> >> >        if (check_inter) {
> >> >            R_CheckUserInterrupt();
> >> >        }
> >> >    }
> >> > }
> >> > UNPROTECT(1);
> >> > return ans;
> >> > "
> >> >
> >> > funx <- cfunction(signature(max="integer", do_check="logical"), code)
> >> >
> >> > system.time(z <- funx(10000L, TRUE))
> >> > system.time(z <- funx(10000L, FALSE))
> >> >
> >> >
> >> > --
> >> > Seth Falcon
> >> > Bioconductor Core Team | FHCRC
> >> >
> >> > _______________________________________________
> >> > Bioc-devel at stat.math.ethz.ch mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >> >
> >>
> >> _______________________________________________
> >> Bioc-devel at stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> > _______________________________________________
> > Bioc-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>



More information about the Bioc-devel mailing list