[Rd] R 4.0.2 64-bit Windows hangs
Jeroen Ooms
jeroen at berkeley.edu
Thu Aug 27 20:38:42 CEST 2020
On Wed, Aug 26, 2020 at 7:54 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>
> On 8/25/20 6:14 PM, Tomas Kalibera wrote:
> > On 8/22/20 9:33 PM, Jeroen Ooms wrote:
> >> On Sat, Aug 22, 2020 at 9:10 PM Tomas Kalibera
> >> <tomas.kalibera using gmail.com> wrote:
> >>> On 8/22/20 8:26 PM, Tomas Kalibera wrote:
> >>>> On 8/22/20 7:58 PM, Jeroen Ooms wrote:
> >>>>> On Sat, Aug 22, 2020 at 8:39 AM Tomas Kalibera
> >>>>> <tomas.kalibera using gmail.com> wrote:
> >>>>>> On 8/21/20 11:45 PM, m19tdn+9alxwj7d2bmk--- via R-devel wrote:
> >>>>>>> Ah yes, this is related. I reported v2010 below, but it looks like
> >>>>>>> I was updated to this Insider Build overnight without my knowledge,
> >>>>>>> and conflated it with the new installation of R v4 this morning.
> >>>>>>>
> >>>>>>> I will continue to look into the issue with the methods Tomas
> >>>>>>> mentioned.
> >>>>>> It is interesting that a rare 5-year-old problem would reappear on
> >>>>>> current Insider builds. Which build of Windows are you running
> >>>>>> exactly?
> >>>>>> I've seen another report about a crash on 20190.1000. It'd be nice to
> >>>>>> know if it is also present in newer builds, i.e. in 20197.
> >>>>> I installed the latest 20197 build in a VM, and I can indeed reproduce
> >>>>> this problem.
> >>>>>
> >>>>> What seems to be happening is that R triggers an infinite recursion in
> >>>>> the Windows unwinding mechanism, and eventually dies with a stack
> >>>>> overflow. Attached is a backtrace of the initial 100 frames of the main
> >>>>> thread (the pattern in the top ~30 frames continues forever).
> >>>>>
> >>>>> The Microsoft blog doesn't mention that anything related to exception
> >>>>> handling has changed in recent versions:
> >>>>> https://docs.microsoft.com/en-us/windows-insider/at-home/active-dev-branch
> >>>>>
> >>>>>
> >>>> Thanks, unfortunately that does not ring any bells (except below), I
> >>>> can't guess from this what is the underlying cause of the problem.
> >>>> There may be something wrong in how we use setjmp/longjmp or how
> >>>> setjmp/longjmp works on Windows.
> >>>>
> >>>> It reminds me of a problem I was debugging a few days ago, where the
> >>>> longjmp implementation segfaults on Windows 10 (recent but not
> >>>> Insider build), probably soon after unwinding the stack, but only with
> >>>> GCC 10 / MinGW 7, only in one of the no-segfault tests, and only
> >>>> with -O3 (not with -O2, and not with -O3 -fno-split-loops).
> >>>> Interestingly, the problem was sensitive to these optimization options
> >>>> at the call site of longjmp (do_abs), even when it was not an immediate
> >>>> caller of longjmp. I've not tracked this down yet; it will require
> >>>> looking at the assembly level, and I was suspecting a compiler error
> >>>> causing the compiler to generate code that messes with the stack or
> >>>> registers in a way that impacts the upcoming jump. But now that we have
> >>>> this other problem with setjmp/longjmp, the compiler may not be the top
> >>>> suspect anymore.
> >>>>
> >>>> I may not be able to work on this in the next few days or a week, so
> >>>> if anyone gets there first, please let me know what you find out.
> >>> Btw, could you please check whether the UCRT build of R crashes as
> >>> well in the Insider Windows build?
> >> Yes, it hangs in exactly the same way, except that the backtrace shows
> >>
> >> ucrtbase!.intrinsic_setjmpex () from C:\WINDOWS\System32\ucrtbase.dll
> >>
> >> Instead of msvcrt!_setjmpex (as expected of course).
> >
> > Thanks. I found what is causing the problem I observed with
> > GCC10/stock Windows 10, I expect this is the same one as in the
> > Insider build.
> > I will investigate further,
> >
> > Tomas
> >
> It seems the problem is between MinGW-W64 and Windows, and really it
> causes both the reported crashes in an Insider build (I tested in 20197)
> and in my GCC 10 builds in a single "no-segfault" test. setjmp is
> implemented using the Windows call _setjmpex, which takes a second
> argument that MinGW sets differently depending on the GCC version. When I
> set this argument to what MinGW-W64 used with early versions of GCC,
> mingw_getsp(), it fixes/hides the problems on my systems. Perl5 uses a
> similar workaround, but otherwise there is no solid base (documentation,
> specification, etc) I am aware of for this change, so this may take some
> more time to be properly fixed. Still, if anyone experiments with this
> workaround and finds a problem, please let me know. In particular, I am
> curious whether it works on earlier versions of Windows (at least with
> check-all, including recommended packages).
FYI, the problem has disappeared on Windows dev build 20201 (released
yesterday), so it may have been a Windows bug. That is not to say
there is no bug on the R/mingw side, but at least the current and past
releases of R are working again on the latest versions of Windows,
which is a big relief.
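
For readers less familiar with the mechanism discussed in the quoted
thread, here is a minimal, portable sketch of a setjmp/longjmp non-local
jump across several stack frames. It only illustrates the general pattern;
it is not R's code and it does not reproduce the Windows/MinGW-specific
_setjmpex issue. The names top_level, last_code, fail_deep and run_guarded
are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical names, for illustration only. */
static jmp_buf top_level;        /* context saved by setjmp */
static volatile int last_code;   /* error code carried across the jump */

/* Simulates an error raised `depth` frames below the setjmp call site. */
static void fail_deep(int depth, int code)
{
    if (depth == 0) {
        last_code = code;
        longjmp(top_level, 1);   /* unwinds to setjmp; does not return */
    }
    fail_deep(depth - 1, code);
}

/* Returns 0 if the work finished normally, otherwise the error code
 * that was recorded just before longjmp was called. */
int run_guarded(int depth, int code)
{
    assert(depth >= 0);
    if (setjmp(top_level) != 0)  /* non-zero only when reached via longjmp */
        return last_code;
    if (code != 0)
        fail_deep(depth, code);  /* jumps back to the setjmp above */
    return 0;
}
```

With this sketch, run_guarded(5, 42) would return 42 after jumping out of
the nested calls, and run_guarded(3, 0) would return 0 without jumping.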