[Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

Kevin Ushey kev|nu@hey @end|ng |rom gm@||@com
Tue Nov 16 23:41:47 CET 2021


Do you see this same hang in a build of R with debug symbols? Can you
try running R with GDB, or even WinDbg or another debugger, to see
what the call stack looks like when the hang occurs? Does the hang
depend on the number of threads used by OpenBLAS?

On the off chance it's relevant, I've seen hangs / crashes when using
a multithreaded OpenBLAS with R on some Linux systems before, but
never found the time to isolate a root cause.

Best,
Kevin

On Tue, Nov 16, 2021 at 5:12 AM Avraham Adler <avraham.adler using gmail.com> wrote:
>
> On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler
> <maechler using stat.math.ethz.ch> wrote:
> >
> > >>>>> Avraham Adler
> > >>>>>     on Tue, 16 Nov 2021 02:35:56 +0000 writes:
> >
> >     > I am building r-devel on Windows 10 64bit using Jeroen's mingw system,
> >     > and I am finding that my make check-devel hangs on the above issue.
> >     > Everything is vanila except that I am using OpenBLAS 0.3.18. I have
> >     > been using OpenBLAS for over a decade and have not had this issue
> >     > before. Is there anything I can do to dig deeper into this issue from
> >     > my end? Could there be anything that changed in R-devel that may have
> >     > triggered this? The bugzilla report doesn't have any code attached to
> >     > it.
> >
> >     > Thank you,
> >     > Avi
> >
> > Hmm.. it would've be nice to tell a bit more, instead of having all
> > your readers to search links, etc.
> >
> > In the bugzilla bug report PR#13309
> > https://bugs.r-project.org/show_bug.cgi?id=13309 ,the example was
> >
> >  dchisq(x=Inf, df=10, ncp=1)
> >
> > I had fixed the bug 13 years ago, in svn rev 47005
> > with regression test in <Rsrc>/tests/d-p-q-r-tests.R :
> >
> >
> > ## Non-central Chi^2 density for large x
> > stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
> > ## did hang in 2.8.0 and earlier (PR#13309).
> >
> >
> > and you are seeing your version of R hanging at exactly this
> > location?
> >
> >
> > I'd bet quite a bit that the underlying code in these
> > non-central chi square computations *never* calls BLAS and hence
> > I cannot imagine how openBLAS could play a role.
> >
> > However, there must be something peculiar in your compiler setup,
> > compilation options, ....
> > as of course the above regression test has been run 100s of
> > 1000s of times also under Windows in the last 13 years ..
> >
> > Last but not least (but really only vaguely related):
> >    There is still a FIXME in the source code (but not about
> > hanging, but rather of loosing some accuracy in border cases),
> > see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
> > and for that reason I had written an R version of that C code
> > even back in 2008 which I've made available in  CRAN package
> > DPQ  a few years ago (together with many other D/P/Q
> > distribution computations/approximations).
> >              -> https://cran.r-project.org/package=DPQ
> >
> > Best,
> > Martin
> >
>
> Hello, Martin.
>
> Apologies, I thought the PR # was sufficient. Yes, I am seeing this at
> this exact location. This is what I saw in d-p-q-r-tst-2.Rout.fail and
> I then ran d-p-q-r-tst.R line-by-line and R hung precisely after
> `stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))`.
>
> Is it at all possible that this has to do with the recent change from
> bd0 to ebd0 (PR #15628) [1]?
>
> For completeness, I ran all the code _beneath_ the call, and while
> nothing else cause an infinite loop, I posted what I believe may be
> unexpected results below,
>
> Thank you,
>
> Avi
>
> [1]: https://bugs.r-project.org/show_bug.cgi?id=15628
>
> > ## FIXME ?!:  MxM/2 seems +- ok ??
> > (dLmM <- dnbinom(xL, mu = 1, size = MxM))  # all NaN but the last
> Warning in dnbinom(xL, mu = 1, size = MxM) : NaNs produced
>  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> > (dLpI <- dnbinom(xL, prob=1/2, size = Inf))#  ditto
> Warning in dnbinom(xL, prob = 1/2, size = Inf) : NaNs produced
>  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> > (dLpM <- dnbinom(xL, prob=1/2, size = MxM))#  ditto
> Warning in dnbinom(xL, prob = 1/2, size = MxM) : NaNs produced
>  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
>
> > d <- dnbinom(x,  mu = mu, size = Inf) # gave NaN (for 0 and L), now all 0
> Warning in dnbinom(x, mu = mu, size = Inf) : NaNs produced
> > p <- pnbinom(x,  mu = mu, size = Inf) # gave all NaN, now uses ppois(x, mu)
> Warning in pnbinom(x, mu = mu, size = Inf) : NaNs produced
>
> > pp <- (0:16)/16
> > q <- qnbinom(pp, mu = mu, size = Inf) # gave all NaN
> > set.seed(1); NI <- rnbinom(32, mu = mu, size = Inf)# gave all NaN
> > set.seed(1); N2 <- rnbinom(32, mu = mu, size = L  )
> > stopifnot(exprs = {
> +     all.equal(d, c(0.006737947, 0.033689735, 0.0842243375,
> 0.140373896, 0,0,0,0), tol = 9e-9)# 7.6e-10
> +     all.equal(p, c(0.006737947, 0.040427682, 0.1246520195,
> 0.265025915, 1,1,1,1), tol = 9e-9)# 7.3e-10
> +     all.equal(d, dpois(x, mu))# current implementation: even identical()
> +     all.equal(p, ppois(x, mu))
> +     q == c(0, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 8, 9, Inf)
> +     q == qpois(pp, mu)
> +     identical(NI, N2)
> + })
> Error: d and c(0.006737947, 0.033689735, 0.0842243375, 0.140373896, 0,
> 0,  .... are not equal:
>   'is.NA' value mismatch: 0 in current 1 in target
>
> > if(!(onWindows && arch == "x86")) {
> +  ## This gave a practically infinite loop (on 64-bit Lnx, Windows;
> not in 32-bit)
> +     tools::assertWarning(p <- pchisq(1.00000012e200, df=1e200, ncp=100),
> +                          "simpleWarning", verbose=TRUE)
> +     stopifnot(p == 1)
> + }
> Asserted warning: pnchisq(x=1e+200, f=1e+200, theta=100, ..): not
> converged in 1000000 iter.
>
> [This may be OK, AA]
>
> > ## Show the (mostly) small differences :
> > all.equal( qs, qpU, tol=0)
> [1] "Mean relative difference: 1.572997e-16"
> > all.equal(-qs, qp., tol=0)
> [1] "Mean relative difference: 1.572997e-16"
> > all.equal(-qp.,qpU, tol=0) # typically TRUE (<==> exact equality)
> [1] "Mean relative difference: 4.710277e-16"
> > stopifnot(exprs = {
> +     all.equal( qs,  qpU, tol=1e-15)
> +     all.equal(-qs,  qp., tol=1e-15)
> +     all.equal(-qp., qpU, tol=1e-15)# diff of 4.71e-16 in 4.1.0 w/icc
> (Eric Weese)
> + })
> > ## both failed very badly in  R <= 4.0.x
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list