[Bioc-devel] PROTECT errors in Bioconductor packages

Hervé Pagès hpages at fredhutch.org
Fri Apr 7 19:34:58 CEST 2017


On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>
>> On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>> On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>> <martin.morgan at roswellpark.org> wrote:
>>>> On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>>>>>
>>>>>> The tool is not perfect, so assess each report carefully.
>>
>> I get a lot of warnings because the tool seems to consider that
>> extracting an attribute (with getAttrib(x, ...)) or extracting the
>> slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>> returns an SEXP that needs protection. I always assumed that it
>> didn't because my understanding is that the returned SEXP is pointing
>> to a part of a pre-existing object ('x') and not to a newly created
>> one. So I decided I could treat it like the SEXP returned by
>> VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>
>> So I suspect these warnings are false positives but I'm not 100% sure.
>
> If you are not 100% sure then you should protect :-)
>
> There are some cases, in particular related to compact row names on
> data frames, where getAttrib will allocate.

Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
anymore? Should I worry that VECTOR_ELT() will also expand some sort
of compact list element? Why not keep these things low-level
getters/setters that return whatever the real thing is and use
higher-level accessors for returning the expanded version of the thing?

Thanks,
H.

>
> Best,
>
> luke
>
>>
>>>>>
>>>>>
>>>>> I also get a warning on almost every C++ function I've written,
>>>>> because
>>>>> I use the following code to handle exceptions:
>>>>>
>>>>>      SEXP output=PROTECT(allocVector(...));
>>>>>      try {
>>>>>          // do something that might raise an exception
>>>>>      } catch (std::exception& e) {
>>>>>          UNPROTECT(1);
>>>>>          throw; // break out of this part of the function
>>>>>      }
>>>>>      UNPROTECT(1);
>>>>>      return output;
>>>>>
>>>>> Presumably the check doesn't account for transfer of control to the
>>>>> catch block. I find that R itself is pretty good at complaining about
>>>>> stack imbalances during execution of tests, examples, etc.
>>>>>
>>>>>> 'My' packages
>>>>>> (Rsamtools, DirichletMultinomial) had several false positives (all
>>>>>> associated with use of an attribute of a protected SEXP), one subtle
>>>>>> problem (a symbol from a PROTECT'ed package name space; the symbol
>>>>>> could
>>>>>> in theory be an active binding and the value obtained not
>>>>>> PROTECTed by
>>>>>> the name space), and a genuine bug
>>>>>>
>>>>>>                 tag = NEW_CHARACTER(n);
>>>>>>                 for (int j = 0; j < n; ++j)
>>>>>>                     SET_STRING_ELT(tag, j, NA_STRING);
>>>>>>                 if ('A' == aux[0]) {
>>>>>>                     buf_A = R_alloc(2, sizeof(char));  # <<- bug
>>>>>>                     buf_A[1] = '\0';
>>>>>>                 }
>>>>>>                 ...
>>>>>>                 SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too
>>>>>> late!
>>>>>
>>>>>
>>>>> I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the
>>>>> R_alloc call looks okay to me...
>>>>
>>>>
>>>> yes, tag needs protection.
>>>>
>>>> I attributed the bug to R_alloc because I failed to reason that R_alloc
>>>> (obviously) allocates and hence can trigger a garbage collection.
>>>>
>>>> Somehow it reflects my approach to PROTECTion, probably not shared by
>>>> everyone. I like to PROTECT only when necessary, rather than
>>>> indiscriminately. Probably this has no practical consequence in
>>>> terms of
>>>> performance, making the code a little easier to read at the expense of
>>>> exposing me to bugs like this.
>>>>
>>>
>>> I guess it's a tradeoff between syntactic complexity and logical
>>> complexity. You have to think pretty hard to minimize use of the
>>> protect stack.
>>
>> I prefer to call it logical obscurity ;-)
>>
>> The hard thinking consists in assessing whether or not the code between
>> the line where a new SEXP is allocated (line X) and the line where
>> it's put in a safe place (line Y) can trigger garbage collection.
>> Hard to figure out in general indeed, but not impossible! Problem
>> is that the result of this assessment is valid at a certain point
>> in time but might change in the future, even if your code has not
>> changed.
>>
>> So a dangerous game for virtually zero benefits.
>>
>>>
>>> One recommendation might be to UNPROTECT() as soon as the pointer on
>>> the top is unneeded, rather than trying to figure out the number to
>>> pop just before returning to R.
>>
>> If you PROTECT() in a loop, you definitely want to do that. Otherwise,
>> does it make a big difference?
>>
>>>
>>> One thing that got me is that the order in which C evaluates function
>>> call arguments is undefined. I did a lot of R_setAttrib(x,
>>> install("foo"), allocBar()), thinking that the symbol would be
>>> automatically protected, and allocBar() would not need protection,
>>> since it happened last. Unfortunately, it might be evaluated first.
>>
>> I got hit by this too long time ago but with defineVar() instead of
>> R_setAttrib():
>>
>>  https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
>>
>> H.
>>
>>>
>>> Btw, I think my package RGtk2 got the record: 1952 errors. Luckily
>>> almost all of them happened inside a few macros and autogenerated
>>> code.
>>>
>>>> Martin
>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Aaron
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>>
>>>>>
>>>>
>>>>
>>>> This email message may contain legally privileged
>>>> and/or...{{dropped:2}}
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>
>>>
>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list