[Bioc-devel] PROTECT errors in Bioconductor packages

luke-tierney at uiowa.edu luke-tierney at uiowa.edu
Fri Apr 7 14:37:31 CEST 2017


On Fri, 7 Apr 2017, Hervé Pagès wrote:

> On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>> On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>> <martin.morgan at roswellpark.org> wrote:
>>> On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>>>> 
>>>>> The tool is not perfect, so assess each report carefully.
>
> I get a lot of warnings because the tool seems to consider that
> extracting an attribute (with getAttrib(x, ...)) or extracting the
> slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
> returns an SEXP that needs protection. I always assumed that it
> didn't because my understanding is that the returned SEXP is pointing
> to a part of a pre-existing object ('x') and not to a newly created
> one. So I decided I could treat it like the SEXP returned by
> VECTOR_ELT(), which, AFAIK, doesn't need protection.
>
> So I suspect these warnings are false positives but I'm not 100% sure.

If you are not 100% sure then you should protect :-)

There are some cases, in particular related to compact row names on
data frames, where getAttrib will allocate.

Best,

luke

>
>>>> 
>>>> 
>>>> I also get a warning on almost every C++ function I've written, because
>>>> I use the following code to handle exceptions:
>>>>
>>>>      SEXP output=PROTECT(allocVector(...));
>>>>      try {
>>>>          // do something that might raise an exception
>>>>      } catch (std::exception& e) {
>>>>          UNPROTECT(1);
>>>>          throw; // break out of this part of the function
>>>>      }
>>>>      UNPROTECT(1);
>>>>      return output;
>>>> 
>>>> Presumably the check doesn't account for transfer of control to the
>>>> catch block. I find that R itself is pretty good at complaining about
>>>> stack imbalances during execution of tests, examples, etc.
>>>> 
>>>>> 'My' packages
>>>>> (Rsamtools, DirichletMultinomial) had several false positives (all
>>>>> associated with use of an attribute of a protected SEXP), one subtle
>>>>> problem (a symbol from a PROTECT'ed package name space; the symbol could
>>>>> in theory be an active binding and the value obtained not PROTECTed by
>>>>> the name space), and a genuine bug
>>>>>
>>>>>                 tag = NEW_CHARACTER(n);
>>>>>                 for (int j = 0; j < n; ++j)
>>>>>                     SET_STRING_ELT(tag, j, NA_STRING);
>>>>>                 if ('A' == aux[0]) {
>>>>>                     buf_A = R_alloc(2, sizeof(char));  # <<- bug
>>>>>                     buf_A[1] = '\0';
>>>>>                 }
>>>>>                 ...
>>>>>                 SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!
>>>> 
>>>> 
>>>> I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the
>>>> R_alloc call looks okay to me...
>>> 
>>> 
>>> yes, tag needs protection.
>>> 
>>> I attributed the bug to R_alloc because I failed to reason that R_alloc
>>> (obviously) allocates and hence can trigger a garbage collection.
>>> 
>>> Somehow it reflects my approach to PROTECTion, probably not shared by
>>> everyone. I like to PROTECT only when necessary, rather than
>>> indiscriminately. Probably this has no practical consequence in terms of
>>> performance, making the code a little easier to read at the expense of
>>> exposing me to bugs like this.
>>> 
>> 
>> I guess it's a tradeoff between syntactic complexity and logical
>> complexity. You have to think pretty hard to minimize use of the
>> protect stack.
>
> I prefer to call it logical obscurity ;-)
>
> The hard thinking consists in assessing whether or not the code between
> the line where a new SEXP is allocated (line X) and the line where
> it's put in a safe place (line Y) can trigger garbage collection.
> Hard to figure out in general indeed, but not impossible! Problem
> is that the result of this assessment is valid at a certain point
> in time but might change in the future, even if your code has not
> changed.
>
> So a dangerous game for virtually zero benefits.
>
>> 
>> One recommendation might be to UNPROTECT() as soon as the pointer on
>> the top is unneeded, rather than trying to figure out the number to
>> pop just before returning to R.
>
> If you PROTECT() in a loop, you definitely want to do that. Otherwise,
> does it make a big difference?
>
>> 
>> One thing that got me is that the order in which C evaluates function
>> call arguments is undefined. I did a lot of R_setAttrib(x,
>> install("foo"), allocBar()), thinking that the symbol would be
>> automatically protected, and allocBar() would not need protection,
>> since it happened last. Unfortunately, it might be evaluated first.
>
> I got hit by this too long time ago but with defineVar() instead of
> R_setAttrib():
>
>  https://stat.ethz.ch/pipermail/r-devel/2008-January/048040.html
>
> H.
>
>> 
>> Btw, I think my package RGtk2 got the record: 1952 errors. Luckily
>> almost all of them happened inside a few macros and autogenerated
>> code.
>> 
>>> Martin
>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> Aaron
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>> 
>>> 
>>> 
>>> This email message may contain legally privileged and/or...{{dropped:2}}
>>> 
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>> 
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>> 
>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


More information about the Bioc-devel mailing list