[Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

Avraham Adler @vr@h@m@@d|er @end|ng |rom gm@||@com
Wed Nov 17 08:15:56 CET 2021


On Tue, Nov 16, 2021 at 10:42 PM Kevin Ushey <kevinushey using gmail.com> wrote:
>
> Do you see this same hang in a build of R with debug symbols? Can you
> try running R with GDB, or even WinDbg or another debugger, to see
> what the call stack looks like when the hang occurs? Does the hang
> depend on the number of threads used by OpenBLAS?
>
> On the off chance it's relevant, I've seen hangs / crashes when using
> a multithreaded OpenBLAS with R on some Linux systems before, but
> never found the time to isolate a root cause.
>


This last was a good thought, Kevin, as I had just compiled OpenBLAS
3.18 multi-threaded, but I recompiled it single threaded and it still
crashes. The version of R I built from source last time, (2021-05-20
r80347), does not hang when calling `dchisq(c(Inf, 1e80, 1e50, 1e40),
df=10, ncp=1)`. I think I built that with OpenBLAS 3.15. I can try
doing that here. As for building with debug symbols, I have never done
that before, so if you could provide some guidance (off-list if you
think it is inappropriate to keep it here) or point me in the
direction of some already posted advice, I would appreciate it!

Avi



> Best,
> Kevin
>
> On Tue, Nov 16, 2021 at 5:12 AM Avraham Adler <avraham.adler using gmail.com> wrote:
> >
> > On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler
> > <maechler using stat.math.ethz.ch> wrote:
> > >
> > > >>>>> Avraham Adler
> > > >>>>>     on Tue, 16 Nov 2021 02:35:56 +0000 writes:
> > >
> > >     > I am building r-devel on Windows 10 64bit using Jeroen's mingw system,
> > >     > and I am finding that my make check-devel hangs on the above issue.
> > >     > Everything is vanila except that I am using OpenBLAS 0.3.18. I have
> > >     > been using OpenBLAS for over a decade and have not had this issue
> > >     > before. Is there anything I can do to dig deeper into this issue from
> > >     > my end? Could there be anything that changed in R-devel that may have
> > >     > triggered this? The bugzilla report doesn't have any code attached to
> > >     > it.
> > >
> > >     > Thank you,
> > >     > Avi
> > >
> > > Hmm.. it would've be nice to tell a bit more, instead of having all
> > > your readers to search links, etc.
> > >
> > > In the bugzilla bug report PR#13309
> > > https://bugs.r-project.org/show_bug.cgi?id=13309 ,the example was
> > >
> > >  dchisq(x=Inf, df=10, ncp=1)
> > >
> > > I had fixed the bug 13 years ago, in svn rev 47005
> > > with regression test in <Rsrc>/tests/d-p-q-r-tests.R :
> > >
> > >
> > > ## Non-central Chi^2 density for large x
> > > stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
> > > ## did hang in 2.8.0 and earlier (PR#13309).
> > >
> > >
> > > and you are seeing your version of R hanging at exactly this
> > > location?
> > >
> > >
> > > I'd bet quite a bit that the underlying code in these
> > > non-central chi square computations *never* calls BLAS and hence
> > > I cannot imagine how openBLAS could play a role.
> > >
> > > However, there must be something peculiar in your compiler setup,
> > > compilation options, ....
> > > as of course the above regression test has been run 100s of
> > > 1000s of times also under Windows in the last 13 years ..
> > >
> > > Last but not least (but really only vaguely related):
> > >    There is still a FIXME in the source code (but not about
> > > hanging, but rather of loosing some accuracy in border cases),
> > > see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
> > > and for that reason I had written an R version of that C code
> > > even back in 2008 which I've made available in  CRAN package
> > > DPQ  a few years ago (together with many other D/P/Q
> > > distribution computations/approximations).
> > >              -> https://cran.r-project.org/package=DPQ
> > >
> > > Best,
> > > Martin
> > >
> >
> > Hello, Martin.
> >
> > Apologies, I thought the PR # was sufficient. Yes, I am seeing this at
> > this exact location. This is what I saw in d-p-q-r-tst-2.Rout.fail and
> > I then ran d-p-q-r-tst.R line-by-line and R hung precisely after
> > `stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))`.
> >
> > Is it at all possible that this has to do with the recent change from
> > bd0 to ebd0 (PR #15628) [1]?
> >
> > For completeness, I ran all the code _beneath_ the call, and while
> > nothing else cause an infinite loop, I posted what I believe may be
> > unexpected results below,
> >
> > Thank you,
> >
> > Avi
> >
> > [1]: https://bugs.r-project.org/show_bug.cgi?id=15628
> >
> > > ## FIXME ?!:  MxM/2 seems +- ok ??
> > > (dLmM <- dnbinom(xL, mu = 1, size = MxM))  # all NaN but the last
> > Warning in dnbinom(xL, mu = 1, size = MxM) : NaNs produced
> >  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> > > (dLpI <- dnbinom(xL, prob=1/2, size = Inf))#  ditto
> > Warning in dnbinom(xL, prob = 1/2, size = Inf) : NaNs produced
> >  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> > > (dLpM <- dnbinom(xL, prob=1/2, size = MxM))#  ditto
> > Warning in dnbinom(xL, prob = 1/2, size = MxM) : NaNs produced
> >  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> >
> > > d <- dnbinom(x,  mu = mu, size = Inf) # gave NaN (for 0 and L), now all 0
> > Warning in dnbinom(x, mu = mu, size = Inf) : NaNs produced
> > > p <- pnbinom(x,  mu = mu, size = Inf) # gave all NaN, now uses ppois(x, mu)
> > Warning in pnbinom(x, mu = mu, size = Inf) : NaNs produced
> >
> > > pp <- (0:16)/16
> > > q <- qnbinom(pp, mu = mu, size = Inf) # gave all NaN
> > > set.seed(1); NI <- rnbinom(32, mu = mu, size = Inf)# gave all NaN
> > > set.seed(1); N2 <- rnbinom(32, mu = mu, size = L  )
> > > stopifnot(exprs = {
> > +     all.equal(d, c(0.006737947, 0.033689735, 0.0842243375,
> > 0.140373896, 0,0,0,0), tol = 9e-9)# 7.6e-10
> > +     all.equal(p, c(0.006737947, 0.040427682, 0.1246520195,
> > 0.265025915, 1,1,1,1), tol = 9e-9)# 7.3e-10
> > +     all.equal(d, dpois(x, mu))# current implementation: even identical()
> > +     all.equal(p, ppois(x, mu))
> > +     q == c(0, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 8, 9, Inf)
> > +     q == qpois(pp, mu)
> > +     identical(NI, N2)
> > + })
> > Error: d and c(0.006737947, 0.033689735, 0.0842243375, 0.140373896, 0,
> > 0,  .... are not equal:
> >   'is.NA' value mismatch: 0 in current 1 in target
> >
> > > if(!(onWindows && arch == "x86")) {
> > +  ## This gave a practically infinite loop (on 64-bit Lnx, Windows;
> > not in 32-bit)
> > +     tools::assertWarning(p <- pchisq(1.00000012e200, df=1e200, ncp=100),
> > +                          "simpleWarning", verbose=TRUE)
> > +     stopifnot(p == 1)
> > + }
> > Asserted warning: pnchisq(x=1e+200, f=1e+200, theta=100, ..): not
> > converged in 1000000 iter.
> >
> > [This may be OK, AA]
> >
> > > ## Show the (mostly) small differences :
> > > all.equal( qs, qpU, tol=0)
> > [1] "Mean relative difference: 1.572997e-16"
> > > all.equal(-qs, qp., tol=0)
> > [1] "Mean relative difference: 1.572997e-16"
> > > all.equal(-qp.,qpU, tol=0) # typically TRUE (<==> exact equality)
> > [1] "Mean relative difference: 4.710277e-16"
> > > stopifnot(exprs = {
> > +     all.equal( qs,  qpU, tol=1e-15)
> > +     all.equal(-qs,  qp., tol=1e-15)
> > +     all.equal(-qp., qpU, tol=1e-15)# diff of 4.71e-16 in 4.1.0 w/icc
> > (Eric Weese)
> > + })
> > > ## both failed very badly in  R <= 4.0.x
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list