[Rd] Resizing a named vector crashes	R	with	gctorture(TRUE)	(PR#13837)
    Duncan Murdoch 
    murdoch at stats.uwo.ca
       
    Thu Jul 16 12:12:33 CEST 2009
    
    
  
On 15/07/2009 10:15 PM, Hervé Pagès wrote:
> I have to confess that I'm a little bit puzzled by how the
> PROTECT/UNPROTECT mechanism is used in the C code of R.
> Duncan, you say the problem you just fixed was an easy one.
> I looked at the C code too and was able to recognize a pattern
> that is indeed easy to identify as problematic:
> 
>    an unprotected call to allocVector() followed by a call
>    that can trigger garbage collection (in that case another
>    call to allocVector())
> 
> It only took me 1 minute to find another occurrence of this pattern.
> It's in the do_grep() function (src/main/character.c, line 1168):
> 
>    > gctorture(TRUE)
>    > grep("b", c(A="aa", B="aba"), value=TRUE)
>      B
>    "B"
> 
> Given that the overhead of PROTECTing the SEXP returned by
> allocVector() can really be considered 0 (or almost), I'm
> wondering why this is not done in a more systematic way.
This is an explanation, not a justification:
If you look at the history of that file, you'll see a hint:  line 1168 
was written in 1998, the other lines were written later, by other 
people.  It is simply a matter of someone thinking something was safe 
when it wasn't, and it's not clear who was wrong:  it may have been safe 
when written, but susceptible to later changes.
> Even when nothing between PROTECT(allocVector()) and the
> corresponding UNPROTECT could trigger garbage collection
> (e.g. PROTECT(allocVector()) is close to the return statement).
> Because making exceptions like this can make your code
> really hard to maintain in the long term.
There are a lot of people who object to anything that slows R at all. 
That puts pressure on anyone writing code to do it in a way that wastes 
as few cycles as possible.  That in turn makes it harder for someone 
else to analyze the code.  And overuse of PROTECT also makes the code 
harder to read.
As an example: just below line 1168 there's another unprotected 
allocVector of nm, but I think that one is safe, because it is attached 
as an attribute to ans (which is now PROTECT'd) before anything is done 
that could trigger gc.  And a few lines below that, on another branch of 
the if, another unprotected but safe-looking allocation.  Should I 
protect those?  Then I'd also need to call UNPROTECT again, or keep a 
counter of PROTECT calls, and the code would be a little harder to read.
Thanks for tracking down these two bugs; I'll fix the grep bug too.  If 
you feel like looking for more, it would be appreciated.  (Writing an 
automatic tool to analyze code and determine where new ones are needed 
and where existing ones could be eliminated might be a fun project, but 
there are too many fun projects.)
Duncan Murdoch
> 
> Cheers,
> H.
> 
> 
> Hervé Pagès wrote:
>> murdoch at stats.uwo.ca wrote:
>>> On 15/07/2009 8:30 PM, murdoch at stats.uwo.ca wrote:
>>>> On 15/07/2009 8:08 PM, Hervé Pagès wrote:
>>>>> Hi,
>>>>>
>>>>>    > x <- c(a=10, b=20)
>>>>>    > length(x) <- 1
>>>>>    > x
>>>>>     a
>>>>>    10
>>>>>
>>>>> But with gctorture turned on, I get:
>>>>>
>>>>>    > gctorture(TRUE)
>>>>>    > x <- c(a=10, b=20)
>>>>>    > length(x) <- 1
>>>>>    > x
>>>>>      a
>>>>>    "a"   <---- ???
>>>>>
>>>>>    > x <- c(a=10, b=20)
>>>>>    > length(x) <- 3
>>>>>
>>>>>     *** caught segfault ***
>>>>>    address (nil), cause 'unknown'
>>>>>
>>>>>    Possible actions:
>>>>>    1: abort (with core dump, if enabled)
>>>>>    2: normal R exit
>>>>>    3: exit R without saving workspace
>>>>>    4: exit R saving workspace
>>>>>
>>>>> This seems to have been around for a while (I get this with R 2.10,
>>>>> 2.9 and 2.8). Note that I don't get this with an unnamed vector.
>>>>>
>>>>> This problem affects the methods package. I found it while
>>>>> troubleshooting the "Protection stack overflow" I reported earlier
>>>>> (see https://stat.ethz.ch/pipermail/r-devel/2009-July/054030.html)
>>>>> but I can't tell yet whether the 2 issues are related or not.
>>>> That's clearly a bug (reproducible in today's R-devel build); I've 
>>>> cc'd this reply to r-bugs.  I'll take a look and see if I can track 
>>>> it down.
>>> That's got to be the easiest low-level bug I've worked on in a while. 
>>> Just a missing PROTECT.  Now fixed, about to be committed to R-devel.
>> Thanks Duncan! And the "Protection stack overflow" issue that was affecting
>> the methods package is gone now :)
>>
>> Cheers,
>> H.
>>
>>> Duncan Murdoch
>>>
>>>>> It would be nice to see some reaction from the R developers
>>>>> about these issues. Thanks in advance!
>>>> You should post them as bug reports if they are as clearly bugs as 
>>>> this one; otherwise they can easily get lost in the noise.  I'm not 
>>>> going to offer to look into the other one; I don't know the insides 
>>>> of the methods package.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>> H.
>>>>>
>>>>>
>>>>> hpages at fhcrc.org wrote:
>>>>>> Hi,
>>>>>>
>>>>>>   > gctorture(TRUE)
>>>>>>   > setGeneric("foo", function(x, y) standardGeneric("foo"))
>>>>>>   [1] "foo"
>>>>>>   > setMethod("foo", c("ANY", "ANY"),
>>>>>>   +   function(x, y) cat("calling foo,ANY,ANY method\n")
>>>>>>   + )
>>>>>>   Error: protect(): protection stack overflow
>>>>>>
>>>>>> Sorry this is something I already reported one week ago here
>>>>>>   https://stat.ethz.ch/pipermail/r-devel/2009-July/053973.html
>>>>>> but I just had a 2nd look at it and realized that the problem
>>>>>> can in fact be reproduced out of the .onLoad() hook. So I'm
>>>>>> reporting it again with a different subject.
>>>>>>
>>>>>> See my sessionInfo() below. Thanks!
>>>>>> H.
>>>>>>
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.10.0 Under development (unstable) (2009-06-26 r48837)
>>>>>> x86_64-unknown-linux-gnu
>>>>>>
>>>>>> locale:
>>>>>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>>>>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
>>>>>>  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
>>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
    
    
More information about the R-devel
mailing list