[R-pkg-devel] rstan issue [Was: CRAN submission error when running tests in testthat]

Simon Urbanek simon.urbanek at R-project.org
Thu Nov 25 23:56:39 CET 2021


Kevin,

thanks, that's very helpful! So this is a serious bug in rstan. Apparently they only do this on macOS, which explains why other platforms don't see the crash:

.onLoad <- function(libname, pkgname) {
[...]
  ## the tbbmalloc_proxy is not loaded by RcppParallel which is linked
  ## in by default on macOS; unloading only works under R >= 4.0 so that
  ## this is only done for R >= 4.0
  if(R.version$major >= 4 && Sys.info()["sysname"] == "Darwin") {
      tbbmalloc_proxy  <- system.file("lib/libtbbmalloc_proxy.dylib", package="RcppParallel", mustWork=FALSE)
      tbbmalloc_proxyDllInfo <<- dyn.load(tbbmalloc_proxy, local = FALSE, now = TRUE)
  }

I can confirm that commenting out that part solves the segfault and BCEA passes the tests.
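For illustration only, here is a minimal sketch of how such a load could be made opt-in rather than unconditional, in line with Kevin's point (quoted below) that it should not happen by default. The function name `maybe_load_tbbmalloc_proxy()` and the option `rstan.load_tbbmalloc_proxy` are hypothetical and not part of rstan. The sketch also uses `getRversion()` for the version guard instead of `R.version$major`, which is a character string.

```r
## Hypothetical sketch, not rstan's actual code: load the TBB malloc proxy
## only when the user explicitly opts in. The option name is made up.
maybe_load_tbbmalloc_proxy <- function(
    opt_in = getOption("rstan.load_tbbmalloc_proxy", FALSE)) {
  ## Default: do nothing, so R keeps using its usual system allocator
  if (!isTRUE(opt_in)) return(invisible(NULL))
  ## Same platform/version guard as the original, via getRversion()
  if (getRversion() < "4.0.0" || Sys.info()[["sysname"]] != "Darwin")
    return(invisible(NULL))
  proxy <- system.file("lib", "libtbbmalloc_proxy.dylib",
                       package = "RcppParallel", mustWork = FALSE)
  ## system.file() returns "" when the file is not installed
  if (!nzchar(proxy)) return(invisible(NULL))
  ## Return the DLLInfo so the caller can keep it for a later dyn.unload()
  dyn.load(proxy, local = FALSE, now = TRUE)
}
```

With this shape, CRAN checks and ordinary users get R's default allocator, and only someone who sets the (hypothetical) option before loading would swap allocators at runtime. As Kevin notes, even that remains risky compared with preloading the library before R starts.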

@Ben, please fix and submit a new version of rstan (see discussion below).

Thanks,
Simon



> On Nov 26, 2021, at 11:19 AM, Kevin Ushey <kevinushey using gmail.com> wrote:
> 
> That shouldn't be happening, at least not by default. However, RcppParallel does ship with tbbmalloc_proxy, which is a library that, when loaded, will overload the default allocators to use TBB's allocators instead. The intention is normally that these libraries would be loaded via e.g. LD_PRELOAD or something similar, since changing the allocator at runtime would cause these sorts of issues.
> 
> If I test with the following:
> 
> trace(dyn.load, quote({
>   if (grepl("tbbmalloc_proxy", x))
>     print(rlang::trace_back())
> }), print = FALSE)
> 
> devtools::test()
> 
> then I see:
> 
>   1. ├─base::load(test_path("data", "stanfit.RData")) at test-bcea.R:179:2
>   2. └─base::..getNamespace(`<chr>`, "stanfit")
>   3.   ├─base::tryCatch(...)
>   4.   │ └─base tryCatchList(expr, classes, parentenv, handlers)
>   5.   │   └─base tryCatchOne(expr, names, parentenv, handlers[[1L]])
>   6.   │     └─base doTryCatch(return(expr), name, parentenv, handler)
>   7.   └─base::loadNamespace(name)
>   8.     └─base runHook(".onLoad", env, package.lib, package)
>   9.       ├─base::tryCatch(fun(libname, pkgname), error = identity)
>  10.       │ └─base tryCatchList(expr, classes, parentenv, handlers)
>  11.       │   └─base tryCatchOne(expr, names, parentenv, handlers[[1L]])
>  12.       │     └─base doTryCatch(return(expr), name, parentenv, handler)
>  13.       └─rstan fun(libname, pkgname)
>  14.         └─base::dyn.load(tbbmalloc_proxy, local = FALSE, now = TRUE)
> 
> My guess is that the 'rstan' package is trying to forcefully load libtbbmalloc_proxy.dylib at runtime, and that's causing the issue. IMHO 'rstan' shouldn't be doing that, at least definitely not by default.
> 
> Best,
> Kevin
> 
> On Thu, Nov 25, 2021 at 12:54 PM Simon Urbanek <simon.urbanek using r-project.org> wrote:
> Nathan,
> 
> testthat is notorious for obfuscation and unhelpful output, as can clearly be seen in the head of testthat.Rout.fail:
> 
> > library(testthat)
> > library(BCEA)
> 
> Attaching package: 'BCEA'
> 
> The following object is masked from 'package:graphics':
> 
>     contour
> 
> > 
> > test_check("BCEA")
> 
>  *** caught segfault ***
> address 0x10d492ffc, cause 'memory not mapped'
> 
> However, this appears to be hard to debug, because it is fallout from some memory corruption and/or allocator mismatch: the crash happens in free() while doing GC (see below). Since it happens in the GC, many bad things happen afterwards.
> With some lldb magic I could trace that the crash happens during
>  ..getNamespace(c("Matrix", "1.3-3"), "stanfit")
>  load(test_path("data", "stanfit.RData"))
> but as I said, that's likely too late - the memory corruption/issue likely happened earlier. Since BCEA itself doesn't have native code, this is likely a bug in one of the packages it depends on, but quite a serious one, since it affects subsequent code in R.
> 
> The list of packages loaded at the time of the crash - so one of them is the culprit:
> 
>  [1] "rstan"          "tidyselect"     "purrr"          "reshape2"      
>  [5] "lattice"        "V8"             "colorspace"     "vctrs"         
>  [9] "generics"       "testthat"       "stats4"         "BCEA"          
> [13] "loo"            "grDevices"      "R2jags"         "utf8"          
> [17] "rlang"          "pkgbuild"       "pillar"         "glue"          
> [21] "withr"          "DBI"            "matrixStats"    "lifecycle"     
> [25] "plyr"           "stringr"        "munsell"        "gtable"        
> [29] "coda"           "codetools"      "inline"         "callr"         
> [33] "ps"             "parallel"       "curl"           "fansi"         
> [37] "methods"        "Rcpp"           "scales"         "desc"          
> [41] "RcppParallel"   "StanHeaders"    "GrassmannOptim" "jsonlite"      
> [45] "abind"          "gridExtra"      "winch"          "rjags"         
> [49] "ggplot2"        "stats"          "datasets"       "graphics"      
> [53] "stringi"        "processx"       "dplyr"          "grid"          
> [57] "rprojroot"      "cli"            "tools"          "magrittr"      
> [61] "tibble"         "crayon"         "pkgconfig"      "Matrix"        
> [65] "MASS"           "ellipsis"       "utils"          "prettyunits"   
> [69] "assertthat"     "base"           "boot"           "R6"            
> [73] "R2WinBUGS"      "compiler"      
> 
> My guess would be that the issue could be in RcppParallel, which overrides the memory allocator:
> 
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x11649fffc)
>   * frame #0: 0x00000001097c517f libtbbmalloc.dylib`__TBB_malloc_safer_msize + 63
>     frame #1: 0x00007fff76f746fd libsystem_malloc.dylib`free + 96
>     frame #2: 0x00000001001c9227 libR.dylib`RunGenCollect at memory.c:1114 [opt]
>     frame #3: 0x00000001001c9038 libR.dylib`RunGenCollect(size_needed=0) at memory.c:1896 [opt]
>     frame #4: 0x00000001001bf769 libR.dylib`R_gc_internal(size_needed=0) at memory.c:3129 [opt]
> 
> (lldb) image lookup -va 0x00000001097c517f
>       Address: libtbbmalloc.dylib[0x000000000001117f] (libtbbmalloc.dylib.__TEXT.__text + 65375)
>       Summary: libtbbmalloc.dylib`__TBB_malloc_safer_msize + 63
>        Module: file = "/Volumes/Builds/packages/high-sierra-x86_64/Rlib/4.1/RcppParallel/lib/libtbbmalloc.dylib", arch = "x86_64"
>        Symbol: id = {0x0000060c}, range = [0x00000001097c5140-0x00000001097c5290), mangled="__TBB_malloc_safer_msize"
> 
> but that's just a wild guess... (CCing Kevin just in case he can shed some light on whether the TBB allocator should be involved in regular R garbage collection).
> 
> Cheers,
> Simon
> 
> 
> 
> > On Nov 25, 2021, at 5:37 AM, Nathan Green via R-package-devel <r-package-devel using r-project.org> wrote:
> > 
> > Hi,
> > I'm getting an ERROR when submitting a new release of our package BCEA to CRAN, which I'm having problems understanding and reproducing. It passes CHECK locally and the GitHub Actions standard check (https://github.com/n8thangreen/BCEA/actions/runs/1494595896).
> > The message is something to do with testthat. Any help would be gratefully received.
> > Thanks!
> > Nathan
> > 
> > From https://cran.r-project.org/web/checks/check_results_BCEA.html
> > Here's the error message:
> > Check: tests, Result: ERROR
> >    Running ‘testthat.R’ [5s/5s]
> >  Running the tests in ‘tests/testthat.R’ failed.
> >  Last 13 lines of output:
> >    33: tryCatch(withCallingHandlers({    eval(code, test_env)    if (!handled && !is.null(test)) {        skip_empty()    }}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,     message = handle_message, error = handle_error), error = handle_fatal,     skip = function(e) {    })
> >    34: test_code(NULL, exprs, env)
> >    35: source_file(path, child_env(env), wrap = wrap)
> >    36: FUN(X[[i]], ...)
> >    37: lapply(test_paths, test_one_file, env = env, wrap = wrap)
> >    38: doTryCatch(return(expr), name, parentenv, handler)
> >    39: tryCatchOne(expr, names, parentenv, handlers[[1L]])
> >    40: tryCatchList(expr, classes, parentenv, handlers)
> >    41: tryCatch(code, testthat_abort_reporter = function(cnd) {    cat(conditionMessage(cnd), "\n")    NULL})
> >    42: with_reporter(reporters$multi, lapply(test_paths, test_one_file,     env = env, wrap = wrap))
> >    43: test_files(test_dir = test_dir, test_package = test_package,     test_paths = test_paths, load_helpers = load_helpers, reporter = reporter,     env = env, stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,     wrap = wrap, load_package = load_package)
> >    44: test_files(test_dir = path, test_paths = test_paths, test_package = package,     reporter = reporter, load_helpers = load_helpers, env = env,     stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,     wrap = wrap, load_package = load_package, parallel = parallel)
> >    45: test_dir("testthat", package = package, reporter = reporter,     ..., load_package = "installed")
> >    46: test_check("BCEA")
> >    An irrecoverable exception occurred. R is aborting now ...
> > See: <https://www.r-project.org/nosvn/R.check/r-release-macos-x86_64/BCEA-00check.html>,
> >     <https://www.r-project.org/nosvn/R.check/r-oldrel-macos-x86_64/BCEA-00check.html>
> > 
> > Dr Nathan Green
> > @: n8thangreen using yahoo.co.uk
> > Tel: 07821 318353
> > 
> > 
> > 
> > ______________________________________________
> > R-package-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
> > 
> 


