[Rd] Infrequent but steady NULL-pointer caused segfault in as.POSIXlt.POSIXct (R 3.4.4)
Tomas Kalibera
tom@@@k@||ber@ @end|ng |rom gm@||@com
Mon Aug 5 10:16:37 CEST 2019
On 8/4/19 3:30 PM, Sun Yijiang wrote:
> A reply from stackoverflow suggests I might have hit this bug:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=14023
>
> I can confirm that this glibc bug affects my system (latest CentOS 7).
> However, as far as I know, R is not multithreaded in its core. Is it
> possible that some library triggered this?
Yes, R is single threaded in its core, but that does not rule out
completely that this glibc bug has been triggered via say a bug in some
package that uses threads, either directly or indirectly via an external
library.
Best
Tomas
>
> Regards,
> Steve
>
> Tomas Kalibera <tomas.kalibera using gmail.com> 于2019年8月2日周五 下午4:59写道:
>> In an optimized build, debug info is just an approximation. It might
>> help to debug in a build of R and packages without compiler
>> optimizations (-O0), where the debug information is accurate. However,
>> first I would try to modify the example to trigger more often, or try to
>> find external ways to make it trigger more often (e.g. via gctorture).
>> Then I would try to make the example smaller (not call gc() explicitly,
>> not call any external code - e.g. the jobs, etc) - any time the example
>> is reduced but still triggers the errors, the reasoning is made easier.
>> Once you have a repeatable situation in a build with reliable debug
>> symbols, debugging is easier too, e.g. sometimes a watchpoint helps to
>> find memory corruption. Please feel free to ask more when you have more
>> information/updates. If this ends up being a bug in R, please report
>> (and with a reproducible example, if it is not obvious from the source
>> code).
>>
>> Best
>> Tomas
>>
>>
>> On 8/2/19 10:23 AM, Sun Yijiang wrote:
>>> The R script I run daily for hours looks like this:
>>>
>>> while (!finish) {
>>> Sys.sleep(0.1)
>>> time = as.integer(format(Sys.time(), "%H%M")) # always crash here
>>> if (new.data.timestamp() <= time)
>>> next
>>> # ... do some jobs for about 2 minutes ...
>>> gc()
>>> }
>>>
>>> Basically it waits for new data, which comes in every 10 minutes, and
>>> do some jobs, then gc(), then loop again. It works great most of the
>>> time, but crashes strangely once a month or so. Although infrequent,
>>> it always crashes at the same place and gives the same error info,
>>> like this:
>>>
>>> *** caught segfault ***
>>> address (nil), cause 'memory not mapped'
>>>
>>> Traceback:
>>> 1: as.POSIXlt.POSIXct(x, tz)
>>> 2: as.POSIXlt(x, tz)
>>> 3: format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...)
>>> 4: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
>>> names = names(x))
>>> 5: format.POSIXct(Sys.time(), format = "%H%M")
>>> 6: format(Sys.time(), format = "%H%M")
>>> 7: format(Sys.time(), format = "%H%M")
>>> … …
>>>
>>> I looked into the dumped core with gdb, and found something very strange:
>>>
>>> gdb /usr/lib64/R/bin/exec/R ~/core.30387
>>> (gdb) bt 5
>>> #0 0x00007f1dca844ff1 in __strlen_sse2_pminub () from /lib64/libc.so.6
>>> #1 0x00007f1dcb20e8f9 in Rf_mkChar (name=0x0) at envir.c:3725
>>> #2 0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
>>> op=<optimized out>, args=<optimized out>,
>>> env=<optimized out>) at datetime.c:705
>>> #3 0x00007f1dcb22197f in bcEval (body=body using entry=0x4064b28,
>>> rho=rho using entry=0xc449d38, useCache=useCache using entry=TRUE)
>>> at eval.c:6473
>>> #4 0x00007f1dcb230370 in Rf_eval (e=0x4064b28,
>>> rho=rho using entry=0xc449d38) at eval.c:624
>>> (More stack frames follow…)
>>>
>>> Tracing into src/main/datetime.c:705, it’s a simple string-making code:
>>> SET_STRING_ELT(tzone, 1, mkChar(R_tzname[0]));
>>>
>>> mkChar function is defined in envir.c:3725:
>>> 3723 SEXP mkChar(const char *name)
>>> 3724 {
>>> 3725 size_t len = strlen(name);
>>> … …
>>>
>>> gdb shows that the string pointer (name=0x0) mkChar received is NULL,
>>> and subsequently strlen(NULL) caused the segfault. But quite
>>> contradictorily, gdb shows the value passed to mkChar in the caller is
>>> valid:
>>>
>>> (gdb) frame 2
>>> #2 0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
>>> op=<optimized out>, args=<optimized out>,
>>> env=<optimized out>) at datetime.c:705
>>> 705 datetime.c: No such file or directory.
>>> (gdb) p tzname[0]
>>> $1 = 0x4cf39c0 “CST”
>>>
>>> R_tzname is an alias of tzname. (#define R_tzname tzname in the same file.)
>>>
>>> At first, I suspect that some library may have messed up the memory
>>> and accidentally zeroed tzname (a global variable). But with this gdb
>>> trace, it shows that tzname is good, only that the pointer passed to
>>> mkChar magically changed to zero. Like this:
>>>
>>> mkChar(tzname[0]) // tzname[0] is “CST”, address 0x4cf39c
>>> … …
>>> SEXP mkChar(const char *name) // name should be 0x4cf39c, but gdb shows 0x0
>>> {
>>> size_t len = strlen(name); // segfault, as name is NULL
>>> … …
>>>
>>> The only theory I can think of so far is that, on calling mkChar, the
>>> parameter passed on stack somehow got wiped out to zero by some buggy
>>> code in R or library. At a higher level, what I see is this: If you
>>> run format(Sys.time(), "%H%M”) a million times a day (together with
>>> other codes of course), once in a month or so this simple line can
>>> segfault.
>>>
>>> I’m lost in this confusion, could someone please help me find the
>>> right direction to further look into this problem?
>>>
>>> Regards,
>>> Steve
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
More information about the R-devel
mailing list