[Bioc-devel] PROTECT errors in Bioconductor packages
Aaron Lun
alun at wehi.edu.au
Sun Apr 9 19:18:08 CEST 2017
Martin Morgan wrote:
> On 04/08/2017 12:29 PM, Aaron Lun wrote:
>> Martin Morgan wrote:
>>> On 04/08/2017 08:16 AM, Aaron Lun wrote:
>>>> On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
>>>>> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>>>
>>>>>> On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
>>>>>>> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>>>>>
>>>>>>>> On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>>>>>>>> On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>>>>>>>> <martin.morgan at roswellpark.org> wrote:
>>>>>>>>>> On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>>>>>>>>>>>>>>>> The tool is not perfect, so assess each report
>>>>>>> carefully.
>>>>>>>>> I get a lot of warnings because the tool seems to consider that
>>>>>>>> extracting an attribute (with getAttrib(x, ...)) or extracting
>>>>>>>> the
>>>>>>>> slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>>>>>>>> returns an SEXP that needs protection. I always assumed that it
>>>>>>>> didn't because my understanding is that the returned SEXP is
>>>>>>>> pointing
>>>>>>>> to a part of a pre-existing object ('x') and not to a newly
>>>>>>>> created
>>>>>>>> one. So I decided I could treat it like the SEXP returned by
>>>>>>>> VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>>>>>>>> So I suspect these warnings are false positives but I'm not 100%
>>>>>>> sure.
>>>>>>>
>>>>>>> If you are not 100% sure then you should protect :-)
>>>>>>>
>>>>>>> There are some cases, in particular related to compact row
>>>>>>> names on
>>>>>>> data frames, where getAttrib will allocate.
>>>>>>
>>>>>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a
>>>>>> no-op
>>>>>> anymore? Should I worry that VECTOR_ELT() will also expand some sort
>>>>>> of compact list element? Why not keep these things low-level
>>>>>> getters/setters that return whatever the real thing is and use
>>>>>> higher-level accessors for returning the expanded version of the
>>>>>> thing?
>>>>>
>>>>> Seriously: it's been that way since r37807 in 2006.
>>>>>
>>>>> If you want to read about some related future directions you can
>>>>> look at
>>>>> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.
>>>>>
>>>>> luke
>>>>
>>>> I was curious about this so I checked out what R-exts had to say
>>>> involving set/getAttrib. Funnily enough, the example it gives in
>>>> Section
>>>> 5.9.4 seems to be incorrect in its UNPROTECTing.
>>>>
>>>> #include <R.h>
>>>> #include <Rinternals.h>
>>>>
>>>> SEXP out(SEXP x, SEXP y)
>>>> {
>>>> int nx = length(x), ny = length(y);
>>>> SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
>>>> double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
>>>>
>>>> for(int i = 0; i < nx; i++) {
>>>> double tmp = rx[i];
>>>> for(int j = 0; j < ny; j++)
>>>> rans[i + nx*j] = tmp * ry[j];
>>>> }
>>>>
>>>> SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
>>>> SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
>>>> SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
>>>> setAttrib(ans, R_DimNamesSymbol, dimnames);
>>>>
>>>>
>>>> UNPROTECT(3);
>>>> return ans;
>>>> }
>>>>
>>>> There are two PROTECT calls but an UNPROTECT(3), which triggers a
>>>> stack
>>>> imbalance warning upon trying to run .Call("out", ...) in R.
>>>
>>> Yes, that should be UNPROTECT(2). svn blame says the error was
>>> introduced when allocMatrix() was introduced; prior to that the code
>>> had allocVector(), then set dim and dimnames.
>>>
>>> As for whether to PROTECT or not, my analysis would be...
>>>
>>> SET_VECTOR_ELT does not (currently) allocate (except on error) so
>>> there is no opportunity for the garbage collector to run, hence no
>>> need to PROTECT.
>>>
>>> Further, getAttrib() (currently) allocates only if (1) the attribute
>>> is R_RowNamesSymbol and the row names are stored in compact format
>>> c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
>>> or language SEXP. None of these conditions apply, so the return value
>>> of getAttrib() is PROTECTed anyway.
>>>
>>> Luke's analysis would be more straight-forward: if in doubt, PROTECT.
>>>
>>> I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
>>> maybe also note that my advice, in addition to being an analysis of
>>> some surprisingly complicated code by a practitioner of dubious
>>> credibility, involves the current state of affairs, and you never know
>>> (and apparently ALTREP makes it less certain) what the future will
>>> hold. So they'd probably say PROTECT.
>>>
>>> One might be tempted to write
>>>
>>> SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
>>>
>>> but I'm not sure whether C guarantees that function arguments are
>>> fully evaluated, maybe it's legal for a compiler to evaluate
>>> getAttrib(), then do some more work with other arguments, then
>>> evaluate PROTECT(), so long as the overall logic is not disrupted. So
>>> the 'if in doubt' argument would make me write
>>>
>>> SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
>>> SET_VECTOR_ELT(dimnames, 0, nms);
>>>
>>> I think , in C is called a 'sequence point'. Google takes me to
>>>
>>> https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx
>>>
>>> where it seems like the left operand of ',' are fully evaluated before
>>> proceeding, and furthermore that arguments to functions are evaluated
>>> before entering the function, implying that it is safe to use the
>>> one-liner
>>>
>>> SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
>>>
>>> At any rate I changed the example in R-exts to UNPROTECT(2), leaving
>>> the nuances for further discussion.
>>>
>>> Martin
>>
>> I wonder if the following is a sensible idea:
>>
>> int Rf_num_protected; // global variable
>> void Rf_start_protection() {
>> Rf_num_protected=0;
>> return;
>> }
>> SEXP Rf_add_protection(SEXP x) {
>> ++Rf_num_protected;
>> return PROTECT(x);
>> }
>> void Rf_end_protection() {
>> UNPROTECT(Rf_num_protected);
>> return;
>> }
>>
>> The idea would be to:
>>
>> 1. call Rf_start_protection() at the top of the native routine
>> 2. replace all uses of PROTECT with Rf_add_protection
>> 3. call Rf_end_protection() just before returning to R
>>
>> This would avoid having to keep track of the number of PROTECTs
>> performed, which may not be trivial if the routine can return at
>> multiple points.
>>
>> It might also useful for C++ native routine creating class instances
>> that need to do internal PROTECTs for the lifetime of the instance. As
>> long as those PROTECTs are done via Rf_add_protection(), a single
>> Rf_end_protection() call at the bottom of the top-level routine would be
>> sufficient to handle them all. In contrast, putting a matching UNPROTECT
>> in the class destructor is not safe, as it is possible to trigger the
>> destructor to UNPROTECT an unrelated SEXP:
>>
>> SEXP blah(SEXP x) {
>> my_class* ptr=new my_class(x); // say this does an internal PROTECT
>> SEXP output=PROTECT(allocVector(INTSXP, 1));
>> // ... do something with output here...
>> delete ptr; // if UNPROTECT is in the destructor, it UNPROTECTs
>> output instead
>> // ... do some more stuff, possibly involving allocations ...
>> UNPROTECT(1); // this actually UNPROTECTs whatever was in my_class
>> return output;
>> }
>>
>
> Global variables are problematic to reason about, e.g., in nested
> calls or parallel code sections.
>
> 'ad hoc' (no offense intended) solutions often increase rather than
> reduce cognitive burden, because someone new to the code (including
> one's future self) has to parse the intention and validate use.
>
> Rcpp seems like the right approach for C++ code; it largely removes
> the need for explicit PROTECTion management, and is widely used and
> responsibly maintained so the edge cases / tricky problems get
> discovered and addressed.
>
> Martin
Yes, I was wondering whether I should transition all of my C++ code to
Rcpp. It would require a decent amount of effort involving ~7 packages,
so I've held off from doing it... but if I have to go through the code
to fix all of the PROTECTs anyway, I might as well go all the way and
switch to using Rcpp. Sounds like a small project for my Easter break.
-Aaron
>>>>
>>>> Anyway, getting back to the topic of this thread; if we were to
>>>> pretend
>>>> that getAttrib() allocates in the above example, would that mean that
>>>> both getAttrib() calls now need to be PROTECTed by the developer?
>>>> Or is
>>>> this handled somewhere internally?
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> H.
>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> luke
>>>>>>>
>>>>>>>>>>>>>>>>>>>> I also get a warning on almost every C++
>>>>>>> function I've written,
>>>>>>>>>>> because
>>>>>>>>>>> I use the following code to handle exceptions:
>>>>>>>>>>>>>>> SEXP output=PROTECT(allocVector(...));
>>>>>>>>>>> try {
>>>>>>>>>>> // do something that might raise an exception
>>>>>>>>>>> } catch (std::exception& e) {
>>>>>>>>>>> UNPROTECT(1);
>>>>>>>>>>> throw; // break out of this part of the function
>>>>>>>>>>> }
>>>>>>>>>>> UNPROTECT(1);
>>>>>>>>>>> return output;
>>>>>>>>>>>>>>> Presumably the check doesn't account for transfer of
>>>>>>> control to > > > > the
>>>>>>>>>>> catch block. I find that R itself is pretty good at
>>>>>>> complaining > > > > about
>>>>>>>>>>> stack imbalances during execution of tests, examples, etc.
>>>>>>>>>>>>>>>> 'My' packages
>>>>>>>>>>>> (Rsamtools, DirichletMultinomial) had several false
>>>>>>> positives > > > > > (all
>>>>>>>>>>>> associated with use of an attribute of a protected SEXP),
>>>>>>> one > > > > > subtle
>>>>>>>>>>>> problem (a symbol from a PROTECT'ed package name space;
>>>>>>> the > > > > > symbol
>>>>>>>>>>>> could
>>>>>>>>>>>> in theory be an active binding and the value obtained not
>>>>>>>>>>>> PROTECTed by
>>>>>>>>>>>> the name space), and a genuine bug
>>>>>>>>>>>>>>>>> tag = NEW_CHARACTER(n);
>>>>>>>>>>>> for (int j = 0; j < n; ++j)
>>>>>>>>>>>> SET_STRING_ELT(tag, j, NA_STRING);
>>>>>>>>>>>> if ('A' == aux[0]) {
>>>>>>>>>>>> buf_A = R_alloc(2, sizeof(char)); #
>>>>>>> <<- bug
>>>>>>>>>>>> buf_A[1] = '\0';
>>>>>>>>>>>> }
>>>>>>>>>>>> ...
>>>>>>>>>>>> SET_VECTOR_ELT(tags, i, tag); # PROTECT
>>>>>>> tag, too
>>>>>>>>>>>> late!
>>>>>>>>>>>>>>>>>>> I assume the bug refers to the un-PROTECT'd
>>>>>>> NEW_CHARACTER here - > > > > the
>>>>>>>>>>> R_alloc call looks okay to me...
>>>>>>>>>>>>>>>> yes, tag needs protection.
>>>>>>>>>>>>> I attributed the bug to R_alloc because I failed to
>>>>>>> reason that > > > R_alloc
>>>>>>>>>> (obviously) allocates and hence can trigger a garbage
>>>>>>>>>> collection.
>>>>>>>>>>>>> Somehow it reflects my approach to PROTECTion, probably
>>>>>>> not shared > > > by
>>>>>>>>>> everyone. I like to PROTECT only when necessary, rather than
>>>>>>>>>> indiscriminately. Probably this has no practical consequence in
>>>>>>>>>> terms of
>>>>>>>>>> performance, making the code a little easier to read at the
>>>>>>> expense > > > of
>>>>>>>>>> exposing me to bugs like this.
>>>>>>>>>>>>>> I guess it's a tradeoff between syntactic complexity
>>>>>>> and logical
>>>>>>>>> complexity. You have to think pretty hard to minimize use of the
>>>>>>>>> protect stack.
>>>>>>>>> I prefer to call it logical obscurity ;-)
>>>>>>>>> The hard thinking consists in assessing whether or not the code
>>>>>>> between
>>>>>>>> the line where a new SEXP is allocated (line X) and the line
>>>>>>>> where
>>>>>>>> it's put in a safe place (line Y) can trigger garbage collection.
>>>>>>>> Hard to figure out in general indeed, but not impossible! Problem
>>>>>>>> is that the result of this assessment is valid at a certain point
>>>>>>>> in time but might change in the future, even if your code has not
>>>>>>>> changed.
>>>>>>>>> So a dangerous game for virtually zero benefits.
>>>>>>>>>>>> One recommendation might be to UNPROTECT() as soon as the
>>>>>>> pointer on
>>>>>>>>> the top is unneeded, rather than trying to figure out the
>>>>>>>>> number to
>>>>>>>>> pop just before returning to R.
>>>>>>>>> If you PROTECT() in a loop, you definitely want to do that.
>>>>>>> Otherwise,
>>>>>>>> does it make a big difference?
>>>>>>>>>>>> One thing that got me is that the order in which C
>>>>>>> evaluates function
>>>>>>>>> call arguments is undefined. I did a lot of R_setAttrib(x,
>>>>>>>>> install("foo"), allocBar()), thinking that the symbol would be
>>>>>>>>> automatically protected, and allocBar() would not need
>>>>>>>>> protection,
>>>>>>>>> since it happened last. Unfortunately, it might be evaluated
>>>>>>>>> first.
>>>>>>>>> I got hit by this too long time ago but with defineVar()
>>>>>>>>> instead of
>>>>>>>> R_setAttrib():
>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> H.
>>>>>>>>>>>> Btw, I think my package RGtk2 got the record: 1952 errors.
>>>>>>> Luckily
>>>>>>>>> almost all of them happened inside a few macros and
>>>>>>>>> autogenerated
>>>>>>>>> code.
>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Aaron
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> This email message may contain
>>>>>>> legally privileged
>>>>>>>>>> and/or...{{dropped:2}}
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>> This email message may contain legally privileged and/or confidential
>>> information. If you are not the intended recipient(s), or the
>>> employee or agent responsible for the delivery of this message to the
>>> intended recipient(s), you are hereby notified that any disclosure,
>>> copying, distribution, or use of this email message is prohibited. If
>>> you have received this message in error, please notify the sender
>>> immediately by e-mail and delete this email message from your
>>> computer. Thank you.
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
> This email message may contain legally privileged and/or confidential
> information. If you are not the intended recipient(s), or the
> employee or agent responsible for the delivery of this message to the
> intended recipient(s), you are hereby notified that any disclosure,
> copying, distribution, or use of this email message is prohibited. If
> you have received this message in error, please notify the sender
> immediately by e-mail and delete this email message from your
> computer. Thank you.
More information about the Bioc-devel
mailing list