[R] segfault debugging

William Dunlap wdunlap at tibco.com
Sat Dec 1 19:31:09 CET 2012


While it is true that valgrind is most effective when R is built with special
flags, it can be useful without the special build.  Also, gctorture(TRUE)
does make R run very slowly, but I have seen it expose problems that show
up only sporadically otherwise.  E.g., with an optimized 64-bit Linux
build of R-2.15.2 we get:

% R --debugger=valgrind --quiet
==19559== Memcheck, a memory error detector
==19559== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==19559== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==19559== Command: /opt/sw/R/R-2.15.2.atlas1/lib/R/bin/exec/R --quiet
...
> gctorture(TRUE)
> library(gam)
Loading required package: splines
Loaded gam 1.06.2

> g <- gam(mpg~lo(hp,span=.4,degree=2)+gear, data=mtcars)
==19559== Conditional jump or move depends on uninitialised value(s)
==19559==    at 0x4F4CA5D: R_gc_internal (memory.c:1510)
==19559==    by 0x4F4EDDD: Rf_allocVector (memory.c:2355)
==19559==    by 0x4FC94D0: do_lazyLoadDBfetch (serialize.c:2517)
==19559==    by 0x4F12E8E: Rf_eval (eval.c:497)
...
==19559== Use of uninitialised value of size 8
==19559==    at 0x4F4CA5F: R_gc_internal (memory.c:1510)
==19559==    by 0x4F4EDDD: Rf_allocVector (memory.c:2355)
==19559==    by 0x4FC94D0: do_lazyLoadDBfetch (serialize.c:2517)
==19559==    by 0x4F12E8E: Rf_eval (eval.c:497)
...
==19559== Invalid read of size 1
==19559==    at 0x4F4CA5F: R_gc_internal (memory.c:1510)
==19559==    by 0x4F4EDDD: Rf_allocVector (memory.c:2355)
==19559==    by 0x4FC94D0: do_lazyLoadDBfetch (serialize.c:2517)
==19559==    by 0x4F12E8E: Rf_eval (eval.c:497)
...
*** caught segfault ***
address 0x30000010, cause 'memory not mapped'
==19559== Conditional jump or move depends on uninitialised value(s)
==19559==    at 0x4F4CA5D: R_gc_internal (memory.c:1510)
==19559==    by 0x4F4EB19: Rf_cons (memory.c:2076)
==19559==    by 0x4F4ECEE: Rf_allocList (memory.c:2481)
==19559==    by 0x4EFD094: R_GetTraceback (errors.c:1307)
...

Without the gctorture(TRUE) there is no complaint from valgrind,
although I've seen evidence of memory corruption later.  (Without
valgrind but with gctorture(TRUE) there is no complaint on the first
call to gam() but R dies on a repeated call.)

Perhaps if I used a valgrind-enabled build of R it would localize
the problem to C or Fortran code behind lo(degree=2).
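
For anyone who wants to rerun the session above non-interactively, here is
a minimal sketch.  It only packages the same three R commands into a script
and uses the 'R -d valgrind -f' form already mentioned in this thread.  The
file name gam-torture.R and the RUN_UNDER_VALGRIND switch are names I made up
for illustration; the run is opt-in because gctorture(TRUE) under valgrind is
very slow.

```shell
#!/bin/sh
# Write a small R script that mirrors the interactive session above.
# The file name gam-torture.R is a hypothetical choice for this sketch.
cat > gam-torture.R <<'EOF'
gctorture(TRUE)   # provoke a garbage collection on (nearly) every allocation
library(gam)
g <- gam(mpg ~ lo(hp, span = 0.4, degree = 2) + gear, data = mtcars)
EOF

# RUN_UNDER_VALGRIND is a hypothetical opt-in switch, not an R or valgrind
# setting.  Without it the script is only written, not executed.
if [ "${RUN_UNDER_VALGRIND:-0}" = "1" ]; then
    R -d valgrind --quiet -f gam-torture.R || echo "R exited with status $?"
else
    echo "Wrote gam-torture.R; set RUN_UNDER_VALGRIND=1 to run it under valgrind."
fi
```

Commenting out the gctorture(TRUE) line in the generated script gives the
comparison run without GC torture.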


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> Sent: Saturday, December 01, 2012 8:45 AM
> To: William Dunlap
> Cc: Martin Morgan; Duncan Murdoch; r-help at r-project.org; Donatella Quagli
> Subject: Re: [R] segfault debugging
> 
> 
> 
> On 1 Dec 2012, at 16:09, William Dunlap <wdunlap at tibco.com> wrote:
> 
> >> valgrind is usually effective for this
> >>
> >>  R -d valgrind -f myscript.R
> >
> > And adding the R command
> >    gctorture(TRUE)
> > to the top of your script lets valgrind do a better job of
> > finding memory misuse.
> 
> That makes things even slower: it really only helps when PROTECT is used incorrectly
> (including not used): this error looks more like a memory over-run.
> 
> Note that valgrind is really only effective for under/over-run errors involving memory
> allocated by R if the build of R is instrumented (see 'Writing R Extensions').
> 
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Morgan
> >> Sent: Saturday, December 01, 2012 6:54 AM
> >> To: Duncan Murdoch
> >> Cc: r-help at r-project.org; Donatella Quagli
> >> Subject: Re: [R] segfault debugging
> >>
> >> On 12/01/2012 04:51 AM, Duncan Murdoch wrote:
> >>> On 12-12-01 6:56 AM, Donatella Quagli wrote:
> >>>> Thank you so far. Here is an excerpt from the gdb session after a crash:
> >>>>   Program received signal SIGSEGV, Segmentation fault.
> >>>>
> >>>>   0xb7509a6b in Rf_allocVector () from /usr/lib/R/lib/libR.so
> >>>>   (gdb) backtrace
> >>>>   #0  0xb7509a6b in Rf_allocVector () from /usr/lib/R/lib/libR.so
> >>>>   #1  0xb744b64c in ?? () from /usr/lib/R/lib/libR.so
> >>>>   #2  0xb74c58bf in ?? () from /usr/lib/R/lib/libR.so
> >>>>   #3  0xb74c9c62 in Rf_eval () from /usr/lib/R/lib/libR.so
> >>>>   #4  0xb74ce60f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
> >>>>   #5  0xb74c9f29 in Rf_eval () from /usr/lib/R/lib/libR.so
> >>>>   #6  0xb7503002 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
> >>>>   #7  0xb7503298 in ?? () from /usr/lib/R/lib/libR.so
> >>>>   #8  0xb7503812 in run_Rmainloop () from /usr/lib/R/lib/libR.so
> >>>>   #9  0xb7503839 in Rf_mainloop () from /usr/lib/R/lib/libR.so
> >>>>   #10 0x08048768 in main ()
> >>>>   #11 0xb728de46 in __libc_start_main (main=0x8048730 <main>, argc=8,
> >>>> ubp_av=0xbfdb7824, init=0x80488a0 <__libc_csu_init>,
> >>>>       fini=0x8048890 <__libc_csu_fini>, rtld_fini=0xb7784590,
> >>>> stack_end=0xbfdb781c) at libc-start.c:228
> >>>>   #12 0x08048791 in _start ()
> >>>>
> >>>> It seems to me that the error is in frame #0. Does it mean that there is a bug
> >>>> in libR.so?  What can I do next?
> >>>
> >>> It means that the error was detected when trying to do a memory allocation.
> >>> That could be a bug in R, but more likely something else has damaged the memory
> >>> management system structures, e.g. a function writing to memory that it doesn't
> >>> own.
> >>>
> >>> Bugs like this are hard to track down, because the damage could have occurred a
> >>> long time before it showed up, and small changes to your script could affect it.
> >>>
> >>> I would try to narrow it down to a single statement in your script.  You might
> >>> be able to deduce that from the last line printed before the crash.  If you
> >>> don't have any printing, you could try adding some, but as I mentioned above,
> >>> that might make the bug behave differently.
> >>>
> >>> Another approach is to cut off statements at the end of your script. That
> >>> probably won't affect the bug until you cut off the statement that actually
> >>> triggered it (but it might, which is why this kind of bug is so frustrating to
> >>> track down).
> >>>
> >>> If you find the bad statement, then look at calls to external code in it, or
> >>> recently executed before it.  See if any of them look like they contain errors.
> >>> Common errors are to write to an array without allocating it, or to write beyond
> >>> the bounds of an array, or (in .Call() code) to allocate something and then fail
> >>> to protect it from garbage collection.
> >>>
> >>> You could also figure out what the problem is that caused the seg fault in frame
> >>> 0.  It might be because some particular variable contains a garbage value.  Then
> >>> in a new run, you can ask gdb to break when that memory location takes on the
> >>> garbage value.  This is usually effective if you really can identify the bad
> >>> value, but doing that can be hard, especially when you aren't familiar with how
> >>> things normally work.
> >>
> >> valgrind is usually effective for this
> >>
> >>   R -d valgrind -f myscript.R
> >>
> >> but it requires an operating system where it is available (e.g., Linux) and a
> >> quick (say, a few tens of seconds at most) way of reproducing the bug (because
> >> valgrind slows evaluation a lot). So the first step is really to narrow down your
> >> large script to something that is easier to re-run, e.g., saving the important
> >> R objects to a file shortly before the problem section of your script, then
> >> reproducing the problem by loading those and evaluating a few steps of the code.
> >> The bug can still be intermittent; valgrind will likely spot the problem.
> >>
> >> Martin
> >>
> >>>
> >>> Good luck!
> >>>
> >>> Duncan Murdoch
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >> --
> >> Computational Biology / Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N.
> >> PO Box 19024 Seattle, WA 98109
> >>
> >> Location: Arnold Building M1 B861
> >> Phone: (206) 667-2793
> >>
