[Rd] load/unload segfault puzzle

Ben Bolker bbolker at gmail.com
Thu Jun 13 03:53:55 CEST 2013


On 13-06-12 07:09 PM, Duncan Murdoch wrote:
> On 13-06-12 1:47 PM, Ben Bolker wrote:
>> Ben Bolker <bbolker <at> gmail.com> writes:
>>
>>>
>>>
>>>    Dear r-devel readers,
>>>
>>>    I have a pretty deep problem with package loading and unloading in
>>> the development version of the lme4 package
>>> <https://github.com/lme4/lme4>; it's not boiled down to a properly
>>> minimal example yet (this has been difficult), but I am posting anyway
>>> in the hopes that someone has ideas about how to proceed further,
>>> since I'm nearly stumped. Apologies in advance for the long post.
>>>
>>>     EXECUTIVE SUMMARY: after one cycle of loading, testing (e.g. by
>>> running example(lmer)) and unloading lme4, then loading and unloading
>>> the nlme package, re-loading and exercising lme4 becomes very
>>> unstable, leading eventually to a segmentation fault.  More detail is
>>> available at <https://github.com/lme4/lme4/issues/35> .  Because it's
>>> a segmentation fault, exactly _where_ the crash happens varies a bit
>>> according to platform and precise incantation, but it seems I can
>>> always get a segfault eventually.
>>>
>>
>>   [snip]
>>
>>    UPDATE: after some useful advice off-list, I tried with a
>> fully valgrind-instrumented version of R.  No suspicious memory
>> accesses occurred until the very end, right before the crash:
>>
>> Attempt #1
>> loading lme4
>> loaded DLLs: Rcpp RcppEigen minqa lme4
>> detaching lme4
>>
>>
>> ==18150== Jump to the invalid address stated on the next line
>> ==18150==    at 0x9E46D00: ???
>> ==18150==    by 0x410AB8A: RunFinalizers (memory.c:1357)
>> ==18150==    by 0x410D314: R_gc_internal (memory.c:2709)
>> ==18150==    by 0x410E73C: Rf_allocVector (memory.c:2421)
>> ==18150==    by 0x4153D30: ReadItem (serialize.c:1685)
>> ==18150==    by 0x4152D6D: ReadBC1 (serialize.c:1825)
>> ==18150==    by 0x4153983: ReadItem (serialize.c:1851)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==    by 0x41531A1: ReadItem (serialize.c:1599)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==    by 0x41531C0: ReadItem (serialize.c:1601)
>> ==18150==  Address 0x9e46d00 is not stack'd, malloc'd or (recently) free'd
>> ==18150==
>>
>>    Does this suggest anything to anyone, or is it just the
>> final symptom of a mysterious problem that occurred earlier ... ?
> 
> One other suggestion: run under gctorture().  This will make it very
> slow, but might trigger the fault earlier.
> 
> Duncan Murdoch
> 

  Yes, thanks -- Bill Dunlap already suggested this.  Your and Bill's
warning about how slow gctorture makes things is correct -- I gave up
after running for 3.5 hours when it had gotten only partway through
loading the Matrix package; I will have to find a machine with a decent
cooling system (i.e. not my laptop) where I can replicate the error.
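
  One possible way around the slowness, sketched here and not yet tried
on this problem: gctorture2() takes a step argument, so a collection is
forced only every 'step' allocations rather than on every one, and it
can be switched on just around the suspect code instead of for the whole
session.  The helper name with_gc_torture and the step value of 100 are
placeholders made up for illustration.

```r
# Hypothetical helper (not part of R or lme4): evaluate `expr` with GC
# torture enabled, forcing a collection every `step` allocations, and
# restore the previous setting on exit, even if `expr` fails.
with_gc_torture <- function(expr, step = 100L) {
  old <- gctorture2(step)               # returns the previous step value
  on.exit(gctorture2(old), add = TRUE)
  expr                                  # the promise is forced here, under torture
}

# Cheap stand-in workload; in practice this would be the suspect step,
# e.g. with_gc_torture(library(lme4)).
total <- with_gc_torture(
  sum(vapply(1:50, function(i) length(seq_len(i)), integer(1)))
)
```

Whether a coarser step still triggers the fault earlier than a normal
run is an open question; a smaller step trades speed for sensitivity.
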
  I've just re-run the regular valgrind, with a fresh build right after
an SVN update.  I got exactly the same results as above.  We're
certainly *not* calling reg.finalizer() anywhere in our package, and I
don't think Rcpp or RcppEigen or minqa do ... there looks to be some
kind of default finalization done on the reference class objects (based
on running 'strings' on the object files) ...
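
  For reference, this is the R-level mechanism that we are *not* using,
as a minimal sketch: an object registered with reg.finalizer() has its
finalizer called during a later garbage collection once it is
unreachable, which is the kind of work RunFinalizers does in the trace
above.  The names make_object, note_finalized and finalizer_ran are
invented for the illustration; C-level finalizers registered by compiled
code would not show up this way.

```r
# Flag flipped by the finalizer so we can see that it ran.
finalizer_ran <- FALSE
note_finalized <- function(e) finalizer_ran <<- TRUE

# Create an environment, attach a finalizer to it, and let it go out
# of scope so that it becomes unreachable when the function returns.
make_object <- function() {
  obj <- new.env()
  reg.finalizer(obj, note_finalized)
  invisible(NULL)
}

make_object()
invisible(gc())   # the collection is what triggers pending finalizers
```
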
  I tried gdb'ing in and setting a breakpoint at memory.c:1357, but this
breakpoint gets hit a lot, and I'm sort of stabbing in the dark at this
point.

   Ben Bolker


