[Bioc-devel] PROTECT errors in Bioconductor packages

Gabe Becker becker.gabe at gene.com
Fri Apr 7 21:58:23 CEST 2017


On Fri, Apr 7, 2017 at 12:46 PM, <luke-tierney at uiowa.edu> wrote:

> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>
> On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
>>
>>>  On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>
>>> >  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>> > >  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>> > >  <martin.morgan at roswellpark.org> wrote:
>>> > > >  On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>> > > > > > > > > > >  The tool is not perfect, so assess each report
>>> carefully.
>>> > >  I get a lot of warnings because the tool seems to consider that
>>> >  extracting an attribute (with getAttrib(x, ...)) or extracting the
>>> >  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>>> >  returns an SEXP that needs protection. I always assumed that it
>>> >  didn't because my understanding is that the returned SEXP is pointing
>>> >  to a part of a pre-existing object ('x') and not to a newly created
>>> >  one. So I decided I could treat it like the SEXP returned by
>>> >  VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>> > >  So I suspect these warnings are false positives but I'm not 100%
>>> sure.
>>>
>>>  If you are not 100% sure then you should protect :-)
>>>
>>>  There are some cases, in particular related to compact row names on
>>>  data frames, where getAttrib will allocate.
>>>
>>
>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
>> anymore? Should I worry that VECTOR_ELT() will also expand some sort
>> of compact list element? Why not keep these things low-level
>> getters/setters that return whatever the real thing is and use
>> higher-level accessors for returning the expanded version of the thing?
>>
>
> Seriously: it's been that way since r37807 in 2006.
>
> If you want to read about some related future directions you can look at
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.


Indeed. I was wondering whether to bring this up here.

In a (hopefully near future) version of R-devel, doing, e.g., INTEGER(x)
could allocate.  There is a way to ask it to give you NULL instead of
allocating if it would need to, but the point being it's probably going to
get much harder to safely be clever about avoiding PROTECT'ing. (Luke put
in temporary suspension of GC in some places, but I don't recall the exact
details of where that was used).

As a side note to the above, you'll need to do INTEGER(x) less often than
you did before. There will be a new INTEGER_ELT and INTEGER_GET_REGION
macros which (I think) will be guaranteed to not cause SEXP allocation.

In terms of why, at least in the ALTREP case, it's so that these things can
be passed directly to the R internals and be treated like whatever
lowl-level type of thing they are (e.g. numeric vector, string vector,
list, etc). This seamless backwards compatiblity requires that code which
doesn't use the INTEGER_ELT and INTEGER_GET_REGION (or analogues) macros
needs to have INTEGER(X) work and give it the pointer it expects, which
won't necessarily exist before the first time it is required.

Best,
~G


>
> luke
>
>
>
>
>> Thanks,
>> H.
>>
>>
>>>  Best,
>>>
>>>  luke
>>>
>>> > > > > > > > > > > > > >  I also get a warning on almost every C++
>>> function I've written,
>>> > > > >  because
>>> > > > >  I use the following code to handle exceptions:
>>> > > > > > > > >       SEXP output=PROTECT(allocVector(...));
>>> > > > >       try {
>>> > > > >           // do something that might raise an exception
>>> > > > >       } catch (std::exception& e) {
>>> > > > >           UNPROTECT(1);
>>> > > > >           throw; // break out of this part of the function
>>> > > > >       }
>>> > > > >       UNPROTECT(1);
>>> > > > >       return output;
>>> > > > > > > > >  Presumably the check doesn't account for transfer of
>>> control to > > > >  the
>>> > > > >  catch block. I find that R itself is pretty good at complaining
>>> > > > >  about
>>> > > > >  stack imbalances during execution of tests, examples, etc.
>>> > > > > > > > > >  'My' packages
>>> > > > > >  (Rsamtools, DirichletMultinomial) had several false positives
>>> > > > > >  (all
>>> > > > > >  associated with use of an attribute of a protected SEXP), one
>>> > > > > >  subtle
>>> > > > > >  problem (a symbol from a PROTECT'ed package name space; the >
>>> > > > >  symbol
>>> > > > > >  could
>>> > > > > >  in theory be an active binding and the value obtained not
>>> > > > > >  PROTECTed by
>>> > > > > >  the name space), and a genuine bug
>>> > > > > > > > > > >                  tag = NEW_CHARACTER(n);
>>> > > > > >                  for (int j = 0; j < n; ++j)
>>> > > > > >                      SET_STRING_ELT(tag, j, NA_STRING);
>>> > > > > >                  if ('A' == aux[0]) {
>>> > > > > >                      buf_A = R_alloc(2, sizeof(char));  # <<-
>>> bug
>>> > > > > >                      buf_A[1] = '\0';
>>> > > > > >                  }
>>> > > > > >                  ...
>>> > > > > >                  SET_VECTOR_ELT(tags, i, tag); # PROTECT tag,
>>> too
>>> > > > > >  late!
>>> > > > > > > > > > > > >  I assume the bug refers to the un-PROTECT'd
>>> NEW_CHARACTER here - > > > >  the
>>> > > > >  R_alloc call looks okay to me...
>>> > > > > > > > > >  yes, tag needs protection.
>>> > > > > > >  I attributed the bug to R_alloc because I failed to reason
>>> that > > >  R_alloc
>>> > > >  (obviously) allocates and hence can trigger a garbage collection.
>>> > > > > > >  Somehow it reflects my approach to PROTECTion, probably not
>>> shared > > >  by
>>> > > >  everyone. I like to PROTECT only when necessary, rather than
>>> > > >  indiscriminately. Probably this has no practical consequence in
>>> > > >  terms of
>>> > > >  performance, making the code a little easier to read at the
>>> expense > > >  of
>>> > > >  exposing me to bugs like this.
>>> > > > > > > >  I guess it's a tradeoff between syntactic complexity and
>>> logical
>>> > >  complexity. You have to think pretty hard to minimize use of the
>>> > >  protect stack.
>>> > >  I prefer to call it logical obscurity ;-)
>>> > >  The hard thinking consists in assessing whether or not the code
>>> between
>>> >  the line where a new SEXP is allocated (line X) and the line where
>>> >  it's put in a safe place (line Y) can trigger garbage collection.
>>> >  Hard to figure out in general indeed, but not impossible! Problem
>>> >  is that the result of this assessment is valid at a certain point
>>> >  in time but might change in the future, even if your code has not
>>> >  changed.
>>> > >  So a dangerous game for virtually zero benefits.
>>> > > > > >  One recommendation might be to UNPROTECT() as soon as the
>>> pointer on
>>> > >  the top is unneeded, rather than trying to figure out the number to
>>> > >  pop just before returning to R.
>>> > >  If you PROTECT() in a loop, you definitely want to do that.
>>> Otherwise,
>>> >  does it make a big difference?
>>> > > > > >  One thing that got me is that the order in which C evaluates
>>> function
>>> > >  call arguments is undefined. I did a lot of R_setAttrib(x,
>>> > >  install("foo"), allocBar()), thinking that the symbol would be
>>> > >  automatically protected, and allocBar() would not need protection,
>>> > >  since it happened last. Unfortunately, it might be evaluated first.
>>> > >  I got hit by this too long time ago but with defineVar() instead of
>>> >  R_setAttrib():
>>> > >   https://urldefense.proofpoint.com/v2/url?u=https-
>>> 3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.
>>> html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_
>>> wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJO
>>> PdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
>>> > >  H.
>>> > > > > >  Btw, I think my package RGtk2 got the record: 1952 errors.
>>> Luckily
>>> > >  almost all of them happened inside a few macros and autogenerated
>>> > >  code.
>>> > > > > >  Martin
>>> > > > > > > > > > > >  Cheers,
>>> > > > > > > > >  Aaron
>>> > > > >  _______________________________________________
>>> > > > >  Bioc-devel at r-project.org mailing list
>>> > > > >  https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.et
>>> hz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt
>>> 84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su
>>> 20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BS
>>> jFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>> > > > > > > > > > > > > > > > > >  This email message may contain
>>> legally privileged
>>> > > >  and/or...{{dropped:2}}
>>> > > > > > >  _______________________________________________
>>> > > >  Bioc-devel at r-project.org mailing list
>>> > > >  https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.et
>>> hz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt
>>> 84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su
>>> 20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BS
>>> jFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>> > > > > > > >  _______________________________________________
>>> > >  Bioc-devel at r-project.org mailing list
>>> > >  https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.et
>>> hz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt
>>> 84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su
>>> 20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BS
>>> jFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>> > > > > > >
>>>
>>
>>
>>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list