[Bioc-devel] PROTECT errors in Bioconductor packages

Aaron Lun alun at wehi.edu.au
Sun Apr 9 19:18:08 CEST 2017


Martin Morgan wrote:
> On 04/08/2017 12:29 PM, Aaron Lun wrote:
>> Martin Morgan wrote:
>>> On 04/08/2017 08:16 AM, Aaron Lun wrote:
>>>> On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
>>>>> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>>>
>>>>>> On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
>>>>>>>  On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>>>>>
>>>>>>>>  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>>>>>>>>  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>>>>>>>>  <martin.morgan at roswellpark.org> wrote:
>>>>>>>>>>  On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>>>>>>>>>>>>>>>>  The tool is not perfect, so assess each report
>>>>>>> carefully.
>>>>>>>>>  I get a lot of warnings because the tool seems to consider that
>>>>>>>>  extracting an attribute (with getAttrib(x, ...)) or extracting
>>>>>>>> the
>>>>>>>>  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>>>>>>>>  returns an SEXP that needs protection. I always assumed that it
>>>>>>>>  didn't because my understanding is that the returned SEXP is
>>>>>>>> pointing
>>>>>>>>  to a part of a pre-existing object ('x') and not to a newly
>>>>>>>> created
>>>>>>>>  one. So I decided I could treat it like the SEXP returned by
>>>>>>>>  VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>>>>>>>>  So I suspect these warnings are false positives but I'm not 100%
>>>>>>> sure.
>>>>>>>
>>>>>>>  If you are not 100% sure then you should protect :-)
>>>>>>>
>>>>>>>  There are some cases, in particular related to compact row
>>>>>>> names on
>>>>>>>  data frames, where getAttrib will allocate.
>>>>>>
>>>>>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a
>>>>>> no-op
>>>>>> anymore? Should I worry that VECTOR_ELT() will also expand some sort
>>>>>> of compact list element? Why not keep these things low-level
>>>>>> getters/setters that return whatever the real thing is and use
>>>>>> higher-level accessors for returning the expanded version of the
>>>>>> thing?
>>>>>
>>>>> Seriously: it's    been that way since r37807 in 2006.
>>>>>
>>>>> If you want to read about some related future directions you can
>>>>> look at
>>>>> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.
>>>>>
>>>>> luke
>>>>
>>>> I was curious about this so I checked out what R-exts had to say
>>>> involving set/getAttrib. Funnily enough, the example it gives in
>>>> Section
>>>> 5.9.4 seems to be incorrect in its UNPROTECTing.
>>>>
>>>> #include <R.h>
>>>> #include <Rinternals.h>
>>>>
>>>> SEXP out(SEXP x, SEXP y)
>>>> {
>>>>      int nx = length(x), ny = length(y);
>>>>      SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
>>>>      double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
>>>>
>>>>      for(int i = 0; i < nx; i++) {
>>>>          double tmp = rx[i];
>>>>          for(int j = 0; j < ny; j++)
>>>>              rans[i + nx*j] = tmp * ry[j];
>>>>      }
>>>>
>>>>      SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
>>>>      SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
>>>>      SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
>>>>      setAttrib(ans, R_DimNamesSymbol, dimnames);
>>>>
>>>>
>>>>      UNPROTECT(3);
>>>>      return ans;
>>>> }
>>>>
>>>> There are two PROTECT calls but an UNPROTECT(3), which triggers a
>>>> stack
>>>> imbalance warning upon trying to run .Call("out", ...) in R.
>>>
>>> Yes, that should be UNPROTECT(2). svn blame says the error was
>>> introduced when allocMatrix() was introduced; prior to that the code
>>> had allocVector(), then set dim and dimnames.
>>>
>>> As for whether to PROTECT or not, my analysis would be...
>>>
>>> SET_VECTOR_ELT does not (currently) allocate (except on error) so
>>> there is no opportunity for the garbage collector to run, hence no
>>> need to PROTECT.
>>>
>>> Further, getAttrib() (currently) allocates only if (1) the attribute
>>> is R_RowNamesSymbol and the row names are stored in compact format
>>> c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
>>> or language SEXP. None of these conditions apply, so the return value
>>> of getAttrib() is PROTECTed anyway.
>>>
>>> Luke's analysis would be more straight-forward: if in doubt, PROTECT.
>>>
>>> I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
>>> maybe also note that my advice, in addition to being an analysis of
>>> some surprisingly complicated code by a practitioner of dubious
>>> credibility, involves the current state of affairs, and you never know
>>> (and apparently ALTREP makes it less certain) what the future will
>>> hold. So they'd probably say PROTECT.
>>>
>>> One might be tempted to write
>>>
>>>      SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
>>>
>>> but I'm not sure whether C guarantees that function arguments are
>>> fully evaluated, maybe it's legal for a compiler to evaluate
>>> getAttrib(), then do some more work with other arguments, then
>>> evaluate PROTECT(), so long as the overall logic is not disrupted. So
>>> the 'if in doubt' argument would make me write
>>>
>>>     SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
>>>     SET_VECTOR_ELT(dimnames, 0, nms);
>>>
>>> I think , in C is called a 'sequence point'. Google takes me to
>>>
>>>   https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx
>>>
>>> where it seems like the left operand of ',' are fully evaluated before
>>> proceeding, and furthermore that arguments to functions are evaluated
>>> before entering the function, implying that it is safe to use the
>>> one-liner
>>>
>>>      SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
>>>
>>> At any rate I changed the example in R-exts to UNPROTECT(2), leaving
>>> the nuances for further discussion.
>>>
>>> Martin
>>
>> I wonder if the following is a sensible idea:
>>
>> int Rf_num_protected; // global variable
>> void Rf_start_protection() {
>>     Rf_num_protected=0;
>>     return;
>> }
>> SEXP Rf_add_protection(SEXP x) {
>>    ++Rf_num_protected;
>>     return PROTECT(x);
>> }
>> void Rf_end_protection() {
>>     UNPROTECT(Rf_num_protected);
>>     return;
>> }
>>
>> The idea would be to:
>>
>> 1. call Rf_start_protection() at the top of the native routine
>> 2. replace all uses of PROTECT with Rf_add_protection
>> 3. call Rf_end_protection() just before returning to R
>>
>> This would avoid having to keep track of the number of PROTECTs
>> performed, which may not be trivial if the routine can return at
>> multiple points.
>>
>> It might also useful for C++ native routine creating class instances
>> that need to do internal PROTECTs for the lifetime of the instance. As
>> long as those PROTECTs are done via Rf_add_protection(), a single
>> Rf_end_protection() call at the bottom of the top-level routine would be
>> sufficient to handle them all. In contrast, putting a matching UNPROTECT
>> in the class destructor is not safe, as it is possible to trigger the
>> destructor to UNPROTECT an unrelated SEXP:
>>
>> SEXP blah(SEXP x) {
>>     my_class* ptr=new my_class(x); // say this does an internal PROTECT
>>     SEXP output=PROTECT(allocVector(INTSXP, 1));
>>     // ... do something with output here...
>>     delete ptr; // if UNPROTECT is in the destructor, it UNPROTECTs
>> output instead
>>     // ... do some more stuff, possibly involving allocations ...
>>     UNPROTECT(1); // this actually UNPROTECTs whatever was in my_class
>>     return output;
>> }
>>
>
> Global variables are problematic to reason about, e.g., in nested
> calls or parallel code sections.
>
> 'ad hoc' (no offense intended) solutions often increase rather than
> reduce cognitive burden, because someone new to the code (including
> one's future self) has to parse the intention and validate use.
>
> Rcpp seems like the right approach for C++ code; it largely removes
> the need for explicit PROTECTion management, and is widely used and
> responsibly maintained so the edge cases / tricky problems get
> discovered and addressed.
>
> Martin

Yes, I was wondering whether I should transition all of my C++ code to
Rcpp. It would require a decent amount of effort involving ~7 packages,
so I've held off from doing it... but if I have to go through the code
to fix all of the PROTECTs anyway, I might as well go all the way and
switch to using Rcpp. Sounds like a small project for my Easter break.

-Aaron

>>>>
>>>> Anyway, getting back to the topic of this thread; if we were to
>>>> pretend
>>>> that getAttrib() allocates in the above example, would that mean that
>>>> both getAttrib() calls now need to be PROTECTed by the developer?
>>>> Or is
>>>> this handled somewhere internally?
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> H.
>>>>>>
>>>>>>>
>>>>>>>  Best,
>>>>>>>
>>>>>>>  luke
>>>>>>>
>>>>>>>>>>>>>>>>>>>>  I also get a warning on almost every C++
>>>>>>> function I've written,
>>>>>>>>>>>  because
>>>>>>>>>>>  I use the following code to handle exceptions:
>>>>>>>>>>>>>>>       SEXP output=PROTECT(allocVector(...));
>>>>>>>>>>>       try {
>>>>>>>>>>>           // do something that might raise an exception
>>>>>>>>>>>       } catch (std::exception& e) {
>>>>>>>>>>>           UNPROTECT(1);
>>>>>>>>>>>           throw; // break out of this part of the function
>>>>>>>>>>>       }
>>>>>>>>>>>       UNPROTECT(1);
>>>>>>>>>>>       return output;
>>>>>>>>>>>>>>>  Presumably the check doesn't account for transfer of
>>>>>>> control to > > > >  the
>>>>>>>>>>>  catch block. I find that R itself is pretty good at
>>>>>>> complaining > > > >  about
>>>>>>>>>>>  stack imbalances during execution of tests, examples, etc.
>>>>>>>>>>>>>>>>  'My' packages
>>>>>>>>>>>>  (Rsamtools, DirichletMultinomial) had several false
>>>>>>> positives > > > > >  (all
>>>>>>>>>>>>  associated with use of an attribute of a protected SEXP),
>>>>>>> one > > > > >  subtle
>>>>>>>>>>>>  problem (a symbol from a PROTECT'ed package name space;
>>>>>>> the > > > > >  symbol
>>>>>>>>>>>>  could
>>>>>>>>>>>>  in theory be an active binding and the value obtained not
>>>>>>>>>>>>  PROTECTed by
>>>>>>>>>>>>  the name space), and a genuine bug
>>>>>>>>>>>>>>>>>                  tag = NEW_CHARACTER(n);
>>>>>>>>>>>>                  for (int j = 0; j < n; ++j)
>>>>>>>>>>>>                      SET_STRING_ELT(tag, j, NA_STRING);
>>>>>>>>>>>>                  if ('A' == aux[0]) {
>>>>>>>>>>>>                      buf_A = R_alloc(2, sizeof(char));  #
>>>>>>> <<- bug
>>>>>>>>>>>>                      buf_A[1] = '\0';
>>>>>>>>>>>>                  }
>>>>>>>>>>>>                  ...
>>>>>>>>>>>>                  SET_VECTOR_ELT(tags, i, tag); # PROTECT
>>>>>>> tag, too
>>>>>>>>>>>>  late!
>>>>>>>>>>>>>>>>>>>  I assume the bug refers to the un-PROTECT'd
>>>>>>> NEW_CHARACTER here - > > > >  the
>>>>>>>>>>>  R_alloc call looks okay to me...
>>>>>>>>>>>>>>>>  yes, tag needs protection.
>>>>>>>>>>>>>  I attributed the bug to R_alloc because I failed to
>>>>>>> reason that > > >  R_alloc
>>>>>>>>>>  (obviously) allocates and hence can trigger a garbage
>>>>>>>>>> collection.
>>>>>>>>>>>>>  Somehow it reflects my approach to PROTECTion, probably
>>>>>>> not shared > > >  by
>>>>>>>>>>  everyone. I like to PROTECT only when necessary, rather than
>>>>>>>>>>  indiscriminately. Probably this has no practical consequence in
>>>>>>>>>>  terms of
>>>>>>>>>>  performance, making the code a little easier to read at the
>>>>>>> expense > > >  of
>>>>>>>>>>  exposing me to bugs like this.
>>>>>>>>>>>>>>  I guess it's a tradeoff between syntactic complexity
>>>>>>> and logical
>>>>>>>>>  complexity. You have to think pretty hard to minimize use of the
>>>>>>>>>  protect stack.
>>>>>>>>>  I prefer to call it logical obscurity ;-)
>>>>>>>>>  The hard thinking consists in assessing whether or not the code
>>>>>>> between
>>>>>>>>  the line where a new SEXP is allocated (line X) and the line
>>>>>>>> where
>>>>>>>>  it's put in a safe place (line Y) can trigger garbage collection.
>>>>>>>>  Hard to figure out in general indeed, but not impossible! Problem
>>>>>>>>  is that the result of this assessment is valid at a certain point
>>>>>>>>  in time but might change in the future, even if your code has not
>>>>>>>>  changed.
>>>>>>>>>  So a dangerous game for virtually zero benefits.
>>>>>>>>>>>>  One recommendation might be to UNPROTECT() as soon as the
>>>>>>> pointer on
>>>>>>>>>  the top is unneeded, rather than trying to figure out the
>>>>>>>>> number to
>>>>>>>>>  pop just before returning to R.
>>>>>>>>>  If you PROTECT() in a loop, you definitely want to do that.
>>>>>>> Otherwise,
>>>>>>>>  does it make a big difference?
>>>>>>>>>>>>  One thing that got me is that the order in which C
>>>>>>> evaluates function
>>>>>>>>>  call arguments is undefined. I did a lot of R_setAttrib(x,
>>>>>>>>>  install("foo"), allocBar()), thinking that the symbol would be
>>>>>>>>>  automatically protected, and allocBar() would not need
>>>>>>>>> protection,
>>>>>>>>>  since it happened last. Unfortunately, it might be evaluated
>>>>>>>>> first.
>>>>>>>>>  I got hit by this too long time ago but with defineVar()
>>>>>>>>> instead of
>>>>>>>>  R_setAttrib():
>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>  H.
>>>>>>>>>>>>  Btw, I think my package RGtk2 got the record: 1952 errors.
>>>>>>> Luckily
>>>>>>>>>  almost all of them happened inside a few macros and
>>>>>>>>> autogenerated
>>>>>>>>>  code.
>>>>>>>>>>>>  Martin
>>>>>>>>>>>>>>>>>>  Cheers,
>>>>>>>>>>>>>>>  Aaron
>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>  Bioc-devel at r-project.org mailing list
>>>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  This email message may contain
>>>>>>> legally privileged
>>>>>>>>>>  and/or...{{dropped:2}}
>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>  Bioc-devel at r-project.org mailing list
>>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>  Bioc-devel at r-project.org mailing list
>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>> This email message may contain legally privileged and/or confidential
>>> information.  If you are not the intended recipient(s), or the
>>> employee or agent responsible for the delivery of this message to the
>>> intended recipient(s), you are hereby notified that any disclosure,
>>> copying, distribution, or use of this email message is prohibited.  If
>>> you have received this message in error, please notify the sender
>>> immediately by e-mail and delete this email message from your
>>> computer. Thank you.
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
> This email message may contain legally privileged and/or confidential
> information.  If you are not the intended recipient(s), or the
> employee or agent responsible for the delivery of this message to the
> intended recipient(s), you are hereby notified that any disclosure,
> copying, distribution, or use of this email message is prohibited.  If
> you have received this message in error, please notify the sender
> immediately by e-mail and delete this email message from your
> computer. Thank you.



More information about the Bioc-devel mailing list