[R-pkg-devel] rstan issue [Was: CRAN submission error when running tests in testthat]

Simon Urbanek simon.urbanek using r-project.org
Fri Nov 26 20:15:27 CET 2021


Nathan,

no action is needed on your end since it's not your fault. It was good of you to have the test there because it unearthed the issue. I have re-run the test with a hot-fixed rstan and it passes the check, so you're good as far as I'm concerned. As you say, since it's not a strong dependency, users can use your package even if rstan is broken. More urgently, we need an update from rstan and stanette (.onUnload needs a corresponding fix in both cases as well).
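
For anyone following the remark about the unload hook: below is a minimal sketch, under stated assumptions, of keeping the load and unload hooks in step, so that the unload hook releases only what the load hook actually recorded as loaded. This is not rstan's code or its fix; `.state`, `load_optional_lib()` and `unload_optional_lib()` are hypothetical names, and the empty path passed in `.onLoad` simply stands for "load nothing".

```r
# Hypothetical sketch: the names below are illustrative, not taken from rstan.
# State lives in an environment so it can still be modified after the
# namespace bindings have been locked.
.state <- new.env(parent = emptyenv())
.state$dll <- NULL

# Load a shared library only if the file exists; remember its DLLInfo.
load_optional_lib <- function(path) {
  if (nzchar(path) && file.exists(path)) {
    .state$dll <- dyn.load(path, local = FALSE, now = TRUE)
  }
  invisible(!is.null(.state$dll))
}

# Unload only what load_optional_lib() actually loaded.
unload_optional_lib <- function() {
  if (!is.null(.state$dll)) {
    # Use [["path"]]: `$` on a DLLInfo object looks up native symbols.
    dyn.unload(.state$dll[["path"]])
    .state$dll <- NULL
  }
  invisible(NULL)
}

.onLoad <- function(libname, pkgname) {
  # Assumption: "" means "do not load anything at load time".
  load_optional_lib("")
}

.onUnload <- function(libpath) {
  unload_optional_lib()
}
```

With the empty path nothing is loaded, so `.onUnload` becomes a no-op rather than an attempt to unload a library that was never loaded.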

Cheers,
Simon


> On Nov 27, 2021, at 2:10 AM, Nathan Green <n8thangreen using yahoo.co.uk> wrote:
> 
> Thanks everyone for your help with this.
> TBH the technical nature is more than a little above my head.
> What are you advising as the current solution to enable me to resubmit the package to CRAN?
> BCEA doesn't have a strong dependency on rstan so I could remove it as a simple solution?
> Thanks again for all the help!
> 
> Nathan
> 
> 
> 
> Dr Nathan Green
> 
> @: n8thangreen using yahoo.co.uk
> Tel: 07821 318353
> 
> 
> 
> On Thursday, 25 November 2021, 22:56:45 GMT, Simon Urbanek <simon.urbanek using r-project.org> wrote:
> 
> 
> Kevin,
> 
> thanks, that's very helpful! So this is a serious bug in rstan - apparently they only do that on macOS which explains why other platforms don't see it:
> 
> .onLoad <- function(libname, pkgname) {
> [...]
>   ## the tbbmalloc_proxy is not loaded by RcppParallel which is linked
>   ## in by default on macOS; unloading only works under R >= 4.0 so that
>   ## this is only done for R >= 4.0
>   if(R.version$major >= 4 && Sys.info()["sysname"] == "Darwin") {
>       tbbmalloc_proxy  <- system.file("lib/libtbbmalloc_proxy.dylib", package="RcppParallel", mustWork=FALSE)
>       tbbmalloc_proxyDllInfo <<- dyn.load(tbbmalloc_proxy, local = FALSE, now = TRUE)
>   }
> 
> I can confirm that commenting out that part solves the segfault and BCEA passes the tests.
> 
> @Ben, please fix and submit a new version of rstan (see discussion below).
> 
> Thanks,
> Simon
> 
> 
> 
> > On Nov 26, 2021, at 11:19 AM, Kevin Ushey <kevinushey using gmail.com> wrote:
> > 
> > That shouldn't be happening, at least not by default. However, RcppParallel does ship with tbbmalloc_proxy, which is a library that, when loaded, will overload the default allocators to use TBB's allocators instead. The intention is normally that these libraries would be loaded via e.g. LD_PRELOAD or something similar, since changing the allocator at runtime would cause these sorts of issues.
> > 
> > If I test with the following:
> > 
> > trace(dyn.load, quote({
> >  if (grepl("tbbmalloc_proxy", x))
> >    print(rlang::trace_back())
> > }), print = FALSE)
> > 
> > devtools::test()
> > 
> > then I see:
> > 
> >  1. ├─base::load(test_path("data", "stanfit.RData")) at test-bcea.R:179:2
> >  2. └─base::..getNamespace(`<chr>`, "stanfit")
> >  3.  ├─base::tryCatch(...)
> >  4.  │ └─base tryCatchList(expr, classes, parentenv, handlers)
> >  5.  │  └─base tryCatchOne(expr, names, parentenv, handlers[[1L]])
> >  6.  │    └─base doTryCatch(return(expr), name, parentenv, handler)
> >  7.  └─base::loadNamespace(name)
> >  8.    └─base runHook(".onLoad", env, package.lib, package)
> >  9.      ├─base::tryCatch(fun(libname, pkgname), error = identity)
> >  10.      │ └─base tryCatchList(expr, classes, parentenv, handlers)
> >  11.      │  └─base tryCatchOne(expr, names, parentenv, handlers[[1L]])
> >  12.      │    └─base doTryCatch(return(expr), name, parentenv, handler)
> >  13.      └─rstan fun(libname, pkgname)
> >  14.        └─base::dyn.load(tbbmalloc_proxy, local = FALSE, now = TRUE)
> > 
> > My guess is that the 'rstan' package is trying to forcefully load libtbbmalloc_proxy.dylib at runtime, and that's causing the issue. IMHO 'rstan' shouldn't be doing that, at least definitely not by default.
> > 
> > Best,
> > Kevin
> > 
> > On Thu, Nov 25, 2021 at 12:54 PM Simon Urbanek <simon.urbanek using r-project.org> wrote:
> > Nathan,
> > 
> > testthat is notorious for obfuscation and unhelpful output as can be clearly seen in the head of testthat.Rout.fail:
> > 
> > > library(testthat)
> > > library(BCEA)
> > 
> > Attaching package: 'BCEA'
> > 
> > The following object is masked from 'package:graphics':
> > 
> >    contour
> > 
> > > 
> > > test_check("BCEA")
> > 
> >  *** caught segfault ***
> > address 0x10d492ffc, cause 'memory not mapped'
> > 
> > However, this appears to be hard to debug, because it is fallout from some memory corruption and/or allocator mismatch: the crash happens in free() while doing GC (see below). Since it happens in the GC, many bad things happen afterwards.
> > With some lldb magic I could trace that the crash happens during
> >  ..getNamespace(c("Matrix", "1.3-3"), "stanfit")
> >  load(test_path("data", "stanfit.RData"))
> > but as I said that's likely too late - the memory corruption/issue likely happened before. Since BCEA itself doesn't have native code, this is likely a bug in some of the packages it depends on, but quite a serious one since it affects subsequent code in R.
> > 
> > The list of packages loaded at the time of the crash - so one of them is the culprit:
> > 
> >  [1] "rstan"          "tidyselect"    "purrr"          "reshape2"      
> >  [5] "lattice"        "V8"            "colorspace"    "vctrs"        
> >  [9] "generics"      "testthat"      "stats4"        "BCEA"          
> > [13] "loo"            "grDevices"      "R2jags"        "utf8"          
> > [17] "rlang"          "pkgbuild"      "pillar"        "glue"          
> > [21] "withr"          "DBI"            "matrixStats"    "lifecycle"    
> > [25] "plyr"          "stringr"        "munsell"        "gtable"        
> > [29] "coda"          "codetools"      "inline"        "callr"        
> > [33] "ps"            "parallel"      "curl"          "fansi"        
> > [37] "methods"        "Rcpp"          "scales"        "desc"          
> > [41] "RcppParallel"  "StanHeaders"    "GrassmannOptim" "jsonlite"      
> > [45] "abind"          "gridExtra"      "winch"          "rjags"        
> > [49] "ggplot2"        "stats"          "datasets"      "graphics"      
> > [53] "stringi"        "processx"      "dplyr"          "grid"          
> > [57] "rprojroot"      "cli"            "tools"          "magrittr"      
> > [61] "tibble"        "crayon"        "pkgconfig"      "Matrix"        
> > [65] "MASS"          "ellipsis"      "utils"          "prettyunits"  
> > [69] "assertthat"    "base"          "boot"          "R6"            
> > [73] "R2WinBUGS"      "compiler"      
> > 
> > My guess would be that the issue could be in RcppParallel which overrides the memory allocator:
> > 
> > * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x11649fffc)
> >  * frame #0: 0x00000001097c517f libtbbmalloc.dylib`__TBB_malloc_safer_msize + 63
> >    frame #1: 0x00007fff76f746fd libsystem_malloc.dylib`free + 96
> >    frame #2: 0x00000001001c9227 libR.dylib`RunGenCollect at memory.c:1114 [opt]
> >    frame #3: 0x00000001001c9038 libR.dylib`RunGenCollect(size_needed=0) at memory.c:1896 [opt]
> >    frame #4: 0x00000001001bf769 libR.dylib`R_gc_internal(size_needed=0) at memory.c:3129 [opt]
> > 
> > (lldb) image lookup -va 0x00000001097c517f
> >      Address: libtbbmalloc.dylib[0x000000000001117f] (libtbbmalloc.dylib.__TEXT.__text + 65375)
> >      Summary: libtbbmalloc.dylib`__TBB_malloc_safer_msize + 63
> >        Module: file = "/Volumes/Builds/packages/high-sierra-x86_64/Rlib/4.1/RcppParallel/lib/libtbbmalloc.dylib", arch = "x86_64"
> >        Symbol: id = {0x0000060c}, range = [0x00000001097c5140-0x00000001097c5290), mangled="__TBB_malloc_safer_msize"
> > 
> > but that's just a wild guess... (CCing Kevin just in case he can shed some light on whether the TBB allocator should be involved in regular R garbage collection).
> > 
> > Cheers,
> > Simon
> > 
> > 
> > 
> > > On Nov 25, 2021, at 5:37 AM, Nathan Green via R-package-devel <r-package-devel using r-project.org> wrote:
> > > 
> > > Hi,
> > > I'm getting an ERROR when submitting a new release of our package BCEA to CRAN which I'm having problems understanding and reproducing. It's passing CHECK locally and the GitHub Actions standard check (https://github.com/n8thangreen/BCEA/actions/runs/1494595896).
> > > The message is something to do with testthat. Any help would be gratefully received.
> > > Thanks!
> > > Nathan
> > > 
> > > From https://cran.r-project.org/web/checks/check_results_BCEA.html
> > > Here's the error message:
> > > Check: tests, Result: ERROR
> > >    Running ‘testthat.R’ [5s/5s]
> > >  Running the tests in ‘tests/testthat.R’ failed.
> > >  Last 13 lines of output:
> > >    33: tryCatch(withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,    message = handle_message, error = handle_error), error = handle_fatal,    skip = function(e) {    })
> > >    34: test_code(NULL, exprs, env)
> > >    35: source_file(path, child_env(env), wrap = wrap)
> > >    36: FUN(X[[i]], ...)
> > >    37: lapply(test_paths, test_one_file, env = env, wrap = wrap)
> > >    38: doTryCatch(return(expr), name, parentenv, handler)
> > >    39: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> > >    40: tryCatchList(expr, classes, parentenv, handlers)
> > >    41: tryCatch(code, testthat_abort_reporter = function(cnd) {    cat(conditionMessage(cnd), "\n")    NULL})
> > >    42: with_reporter(reporters$multi, lapply(test_paths, test_one_file,    env = env, wrap = wrap))
> > >    43: test_files(test_dir = test_dir, test_package = test_package,    test_paths = test_paths, load_helpers = load_helpers, reporter = reporter,    env = env, stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,    wrap = wrap, load_package = load_package)
> > >    44: test_files(test_dir = path, test_paths = test_paths, test_package = package,    reporter = reporter, load_helpers = load_helpers, env = env,    stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,    wrap = wrap, load_package = load_package, parallel = parallel)
> > >    45: test_dir("testthat", package = package, reporter = reporter,    ..., load_package = "installed")
> > >    46: test_check("BCEA")
> > >    An irrecoverable exception occurred. R is aborting now ...
> > > See: <https://www.r-project.org/nosvn/R.check/r-release-macos-x86_64/BCEA-00check.html>,
> > >    <https://www.r-project.org/nosvn/R.check/r-oldrel-macos-x86_64/BCEA-00check.html>
> > > 
> > > Dr Nathan Green
> > > @: n8thangreen using yahoo.co.uk
> > > Tel: 07821 318353
> > > 
> > > 
> > > 
> > > ______________________________________________
> > > R-package-devel using r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-package-devel
> > > 
> > 
> 


