[Rd] Infrequent but steady NULL-pointer caused segfault in as.POSIXlt.POSIXct (R 3.4.4)

Sun Yijiang @uny|j|@ng @end|ng |rom gm@||@com
Sun Aug 4 15:30:58 CEST 2019


A reply from stackoverflow suggests I might have hit this bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=14023

I can confirm that this glibc bug affects my system (latest CentOS 7).
However, as far as I know, R is not multithreaded in its core.  Is it
possible that some library triggered this?

Regards,
Steve

Tomas Kalibera <tomas.kalibera using gmail.com> 于2019年8月2日周五 下午4:59写道:
>
> In an optimized build, debug info is just an approximation. It might
> help to debug in a build of R and packages without compiler
> optimizations (-O0), where the debug information is accurate. However,
> first I would try to modify the example to trigger more often, or try to
> find external ways to make it trigger more often (e.g. via gctorture).
> Then I would try to make the example smaller (not call gc() explicitly,
> not call any external code - e.g. the jobs, etc) - any time the example
> is reduced but still triggers the errors, the reasoning is made easier.
> Once you have a repeatable situation in a build with reliable debug
> symbols, debugging is easier too, e.g. sometimes a watchpoint helps to
> find memory corruption. Please feel free to ask more when you have more
> information/updates. If this ends up being a bug in R, please report
> (and with a reproducible example, if it is not obvious from the source
> code).
>
> Best
> Tomas
>
>
> On 8/2/19 10:23 AM, Sun Yijiang wrote:
> > The R script I run daily for hours looks like this:
> >
> > while (!finish) {
> >      Sys.sleep(0.1)
> >      time = as.integer(format(Sys.time(), "%H%M")) # always crash here
> >      if (new.data.timestamp() <= time)
> >          next
> >      # ... do some jobs for about 2 minutes ...
> >      gc()
> > }
> >
> > Basically it waits for new data, which comes in every 10 minutes, and
> > do some jobs, then gc(), then loop again.  It works great most of the
> > time, but crashes strangely once a month or so.  Although infrequent,
> > it always crashes at the same place and gives the same error info,
> > like this:
> >
> >   *** caught segfault ***
> > address (nil), cause 'memory not mapped'
> >
> > Traceback:
> >   1: as.POSIXlt.POSIXct(x, tz)
> >   2: as.POSIXlt(x, tz)
> >   3: format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...)
> >   4: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
> >    names = names(x))
> >   5: format.POSIXct(Sys.time(), format = "%H%M")
> >   6: format(Sys.time(), format = "%H%M")
> >   7: format(Sys.time(), format = "%H%M")
> > … …
> >
> > I looked into the dumped core with gdb, and found something very strange:
> >
> > gdb /usr/lib64/R/bin/exec/R ~/core.30387
> > (gdb) bt 5
> > #0  0x00007f1dca844ff1 in __strlen_sse2_pminub () from /lib64/libc.so.6
> > #1  0x00007f1dcb20e8f9 in Rf_mkChar (name=0x0) at envir.c:3725
> > #2  0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
> > op=<optimized out>, args=<optimized out>,
> >      env=<optimized out>) at datetime.c:705
> > #3  0x00007f1dcb22197f in bcEval (body=body using entry=0x4064b28,
> > rho=rho using entry=0xc449d38, useCache=useCache using entry=TRUE)
> >      at eval.c:6473
> > #4  0x00007f1dcb230370 in Rf_eval (e=0x4064b28,
> > rho=rho using entry=0xc449d38) at eval.c:624
> > (More stack frames follow…)
> >
> > Tracing into src/main/datetime.c:705, it’s a simple string-making code:
> > SET_STRING_ELT(tzone, 1, mkChar(R_tzname[0]));
> >
> > mkChar function is defined in envir.c:3725:
> > 3723  SEXP mkChar(const char *name)
> > 3724  {
> > 3725      size_t len =  strlen(name);
> > … …
> >
> > gdb shows that the string pointer (name=0x0) mkChar received is NULL,
> > and subsequently strlen(NULL) caused the segfault.  But quite
> > contradictorily, gdb shows the value passed to mkChar in the caller is
> > valid:
> >
> > (gdb) frame 2
> > #2  0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
> > op=<optimized out>, args=<optimized out>,
> >      env=<optimized out>) at datetime.c:705
> > 705 datetime.c: No such file or directory.
> > (gdb) p tzname[0]
> > $1 = 0x4cf39c0 “CST”
> >
> > R_tzname is an alias of tzname. (#define R_tzname tzname in the same file.)
> >
> > At first, I suspect that some library may have messed up the memory
> > and accidentally zeroed tzname (a global variable).  But with this gdb
> > trace, it shows that tzname is good, only that the pointer passed to
> > mkChar magically changed to zero.  Like this:
> >
> > mkChar(tzname[0])  // tzname[0] is “CST”, address 0x4cf39c
> > … …
> > SEXP mkChar(const char *name)  // name should be 0x4cf39c, but gdb shows 0x0
> > {
> >      size_t len =  strlen(name);  // segfault, as name is NULL
> > … …
> >
> > The only theory I can think of so far is that, on calling mkChar, the
> > parameter passed on stack somehow got wiped out to zero by some buggy
> > code in R or library.  At a higher level, what I see is this:  If you
> > run format(Sys.time(), "%H%M”) a million times a day (together with
> > other codes of course), once in a month or so this simple line can
> > segfault.
> >
> > I’m lost in this confusion, could someone please help me find the
> > right direction to further look into this problem?
> >
> > Regards,
> > Steve
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>



More information about the R-devel mailing list