[Rd] Resizing a named vector crashes R with gctorture(TRUE) (PR#13837)
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Jul 16 23:51:18 CEST 2009
On 16/07/2009 5:06 PM, Hervé Pagès wrote:
> Duncan Murdoch wrote:
>> On 7/16/2009 2:34 PM, Hervé Pagès wrote:
>>> Duncan Murdoch wrote:
>>>> On 15/07/2009 10:15 PM, Hervé Pagès wrote:
>>>>> I have to confess that I'm a little bit puzzled by how the
>>>>> PROTECT/UNPROTECT mechanism is used in the C code of R.
>>>>> Duncan, you say the problem you just fixed was an easy one.
>>>>> I looked at the C code too and was able to recognize a pattern
>>>>> that is indeed easy to identify as problematic:
>>>>>
>>>>> an unprotected call to allocVector() followed by a call
>>>>> that can trigger garbage collection (in that case another
>>>>> call to allocVector())
>>>>>
>>>>> It only took me 1 minute to find another occurrence of this pattern.
>>>>> It's in the do_grep() function (src/main/character.c, line 1168):
>>>>>
>>>>> > gctorture(TRUE)
>>>>> > grep("b", c(A="aa", B="aba"), value=TRUE)
>>>>> B
>>>>> "B"
>>>>>
>>>>> Given that the overhead of PROTECTing the SEXP returned by
>>>>> allocVector() can really be considered 0 (or almost), I'm
>>>>> wondering why this is not done in a more systematic way.
>>>> This is an explanation, not a justification:
>>>>
>>>> If you look at the history of that file, you'll see a hint: line
>>>> 1168 was written in 1998, the other lines were written later, by
>>>> other people. It is simply a matter of someone thinking something
>>>> was safe when it wasn't, and it's not clear who was wrong: it may
>>>> have been safe when written, but susceptible to later changes.
>>>>
>>>>> Even when nothing between PROTECT(allocVector()) and the
>>>>> corresponding UNPROTECT could trigger garbage collection
>>>>> (e.g. PROTECT(allocVector()) is close to the return statement).
>>>>> Because making exceptions like this can make your code
>>>>> really hard to maintain in the long term.
>>>> There are a lot of people who object to anything that slows R at all.
>>>> That puts pressure on anyone writing code to do it in a way that
>>>> wastes as few cycles as possible. That in turn makes it harder for
>>>> someone else to analyze the code. And overuse of PROTECT also makes
>>>> the code harder to read.
>>> Most of the calls to allocVector() are currently protected. There is a
>>> very small percentage of calls to allocVector() that are not. Most of
>>> the times because people apparently decided that, at the time they wrote
>>> the code, it didn't seem necessary. It doesn't matter if they were wrong
>>> or write. My point is that this game is not worth it.
>>>
>>> I bet if you protected all the calls to allocVector() you wouldn't notice
>>> any slow down in R. What is guaranteed though is that you end up with
>>> code
>>> that sooner or later will break because of some changes that are made to
>>> the function itself or to another function called by your function
>>> (because
>>> this other function is now calling gc and you were assuming that it
>>> wouldn't
>>> do that).
>>>
>>> And this kind of breakage is one of the worst kinds: if you are lucky,
>>> you get a segfault, but if you are not, you don't notice anything and
>>> get the wrong answer, like in the length<-() and grep() examples (and
>>> you can safely assume that there are many other places like this in R).
>>>
>>> I guess 99.999% of R users would happily trade a 0.001% slow down for
>>> a correct result.
>>>
>>> First make it right, then make it fast. And sorry, but you're not going
>>> to make it fast by saving a few calls to PROTECT() here and here.
>> As I said, I gave you an explanation, not a justification. I generally
>> agree with you, but not everyone does. For example, after posting the
>> first patch I received a private email suggesting that the following
>> PROTECT on xnames could be removed. I didn't remove it, because I think
>> it is mostly harmless, and it's not worth my time to analyze whether any
>> particular PROTECT is unneeded.
>>
>> I generally agree with you, but I don't totally agree with you. The
>> protection stack is not infinite, so any time you add a PROTECT you have
>> to be sure it will be removed in a relatively short time. You can't
>> have something like
>>
>> for (i=0; i < n; i++) { PROTECT( ans <- allocVector(...) ) ; ... }
>> UNPROTECT(n);
>>
>> because you are likely to blow the stack when n is large. You need the
>> UNPROTECTs within the loop, but still after ans stops being vulnerable
>> to garbage collection.
>
> Indeed, putting the UNPROTECT out of the loop would be a bad idea.
> The only reason I see people would do this is because they have
> some continue, break or return statements inside the loop and they
> don't want to put the UNPROTECT before each of them.
> But most of the times, it should be clear where to put the UNPROTECT,
> and, if this is not the case, then that means that it was not clear
> either that ans didn't need to be protected in the first place!
>
>> It's hard to place them automatically. And
>> every extra function/macro call adds to the obscurity of the code, so
>> it's harder to read it and know whether it really does what you wanted
>> it to. My inclination is to over-PROTECT things, but not to PROTECT
>> everything.
>
> I didn't say everything should be protected. Just that
> PROTECT(allocVector()) could be used in a more systematic way.
Tell me the system.
Duncan Murdoch
>
> Thanks,
> H.
>
>> Duncan Murdoch
>>
>>
>>> Cheers,
>>> H.
>>>
>>>
>>>> As an example: just below line 1168 there's another unprotected
>>>> allocVector of nm, but I think that one is safe, because it is
>>>> attached as an attribute to ans (which is now PROTECT'd) before
>>>> anything is done that could trigger gc. And a few lines below that,
>>>> on another branch of the if, another unprotected but safe-looking
>>>> allocation. Should I protect those? Then I'd also need to call
>>>> UNPROTECT again, or keep a counter of PROTECT calls, and the code
>>>> would be a little harder to read.
>>>>
>>>> Thanks for tracking down these two bugs; I'll fix the grep bug too.
>>>> If you feel like looking for more, it would be appreciated. (Writing
>>>> an automatic tool to analyze code and determine where new ones are
>>>> needed and where existing ones could be eliminated might be a fun
>>>> project, but there are too many fun projects.)
>>>>
>>>> Duncan Murdoch
>>>>
>>>>> Cheers,
>>>>> H.
>>>>>
>>>>>
>>>>> Hervé Pagès wrote:
>>>>>> murdoch at stats.uwo.ca wrote:
>>>>>>> On 15/07/2009 8:30 PM, murdoch at stats.uwo.ca wrote:
>>>>>>>> On 15/07/2009 8:08 PM, Hervé Pagès wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> > x <- c(a=10, b=20)
>>>>>>>>> > length(x) <- 1
>>>>>>>>> > x
>>>>>>>>> a
>>>>>>>>> 10
>>>>>>>>>
>>>>>>>>> But with gctorture turned on, I get:
>>>>>>>>>
>>>>>>>>> > gctorture(TRUE)
>>>>>>>>> > x <- c(a=10, b=20)
>>>>>>>>> > length(x) <- 1
>>>>>>>>> > x
>>>>>>>>> a
>>>>>>>>> "a" <---- ???
>>>>>>>>>
>>>>>>>>> > x <- c(a=10, b=20)
>>>>>>>>> > length(x) <- 3
>>>>>>>>>
>>>>>>>>> *** caught segfault ***
>>>>>>>>> address (nil), cause 'unknown'
>>>>>>>>>
>>>>>>>>> Possible actions:
>>>>>>>>> 1: abort (with core dump, if enabled)
>>>>>>>>> 2: normal R exit
>>>>>>>>> 3: exit R without saving workspace
>>>>>>>>> 4: exit R saving workspace
>>>>>>>>>
>>>>>>>>> This seems to have been around for a while (I get this with R 2.10,
>>>>>>>>> 2.9 and 2.8). Note that I don't get this with an unnamed vector.
>>>>>>>>>
>>>>>>>>> This problem affects the methods package. I found it while
>>>>>>>>> troubleshooting the "Protection stack overflow" I reported earlier
>>>>>>>>> (see https://stat.ethz.ch/pipermail/r-devel/2009-July/054030.html)
>>>>>>>>> but I can't tell yet whether the 2 issues are related or not.
>>>>>>>> That's clearly a bug (reproducible in today's R-devel build);
>>>>>>>> I've cc'd this reply to r-bugs. I'll take a look and see if I
>>>>>>>> can track it down.
>>>>>>> That's got to be the easiest low-level bug I've worked on in a
>>>>>>> while. Just a missing PROTECT. Now fixed, about to be committed
>>>>>>> to R-devel.
>>>>>> Thanks Duncan! And the "Protection stack overflow" issue that was
>>>>>> affecting
>>>>>> the methods package is gone now :)
>>>>>>
>>>>>> Cheers,
>>>>>> H.
>>>>>>
>>>>>>> Duncan Murdoch
>>>>>>>
>>>>>>>>> It would be nice to see some reaction from the R developers
>>>>>>>>> about these issues. Thanks in advance!
>>>>>>>> You should post them as bug reports if they are as clearly bugs
>>>>>>>> as this one; otherwise they can easily get lost in the noise.
>>>>>>>> I'm not going to offer to look into the other one; I don't know
>>>>>>>> the insides of the methods package.
>>>>>>>>
>>>>>>>> Duncan Murdoch
>>>>>>>>
>>>>>>>>> H.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> hpages at fhcrc.org wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> > gctorture(TRUE)
>>>>>>>>>> > setGeneric("foo", function(x, y) standardGeneric("foo"))
>>>>>>>>>> [1] "foo"
>>>>>>>>>> > setMethod("foo", c("ANY", "ANY"),
>>>>>>>>>> + function(x, y) cat("calling foo,ANY,ANY method\n")
>>>>>>>>>> + )
>>>>>>>>>> Error: protect(): protection stack overflow
>>>>>>>>>>
>>>>>>>>>> Sorry this is something I already reported one week ago here
>>>>>>>>>> https://stat.ethz.ch/pipermail/r-devel/2009-July/053973.html
>>>>>>>>>> but I just had a 2nd look at it and realized that the problem
>>>>>>>>>> can in fact be reproduced out of the .onLoad() hook. So I'm
>>>>>>>>>> reporting it again with a different subject.
>>>>>>>>>>
>>>>>>>>>> See my sessionInfo() below. Thanks!
>>>>>>>>>> H.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> sessionInfo()
>>>>>>>>>> R version 2.10.0 Under development (unstable) (2009-06-26 r48837)
>>>>>>>>>> x86_64-unknown-linux-gnu
>>>>>>>>>>
>>>>>>>>>> locale:
>>>>>>>>>> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
>>>>>>>>>> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
>>>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8
>>>>>>>>>> [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
>>>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>>>>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>>>
>>>>>>>>>> attached base packages:
>>>>>>>>>> [1] stats graphics grDevices utils datasets methods
>>>>>>>>>> base
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>> ______________________________________________
>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>> ______________________________________________
>>>>>>> R-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
More information about the R-devel
mailing list