[R-pkg-devel] Trouble with long-running tests on CRAN debian server

Berry Boessenkool berryboe@@enkoo| @end|ng |rom hotm@||@com
Mon Aug 21 21:59:15 CEST 2023


If you add that to each exported function, isn't that a lot of code to read + maintain?
Also, it seems like unnecessary computational overhead.
>From a software design point of view, it might be nicer to set that in the examples + tests.

Regards,
Berry

________________________________
From: R-package-devel <r-package-devel-bounces using r-project.org> on behalf of Scott Ritchie <sritchie73 using gmail.com>
Sent: Monday, August 21, 2023 19:23
To: Dirk Eddelbuettel <edd using debian.org>
Cc: r-package-devel using r-project.org <r-package-devel using r-project.org>
Subject: Re: [R-pkg-devel] Trouble with long-running tests on CRAN debian server

Thanks Dirk and Ivan,

I took a slightly different work-around of forcing the number of threads to
1 when running functions of the test dataset in the package, by adding the
following to each user facing function:

```
  # Check if running on package test_data, and if so, force data.table to
be
  # single threaded so that we can avoid a NOTE on CRAN submission
  if (isTRUE(all.equal(x, ukbnmr::test_data))) {
    registered_threads <- getDTthreads()
    setDTthreads(1)
    on.exit({ setDTthreads(registered_threads) }) # re-register so no
unintended side effects for users
  }
```
(i.e. here x is the input argument to the function)

It took some trial and error to get to pass the CRAN tests; the number of
columns in the input data was also contributing to the problem.

Best,

Scott


On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <edd using debian.org> wrote:

>
> On 21 August 2023 at 16:05, Ivan Krylov wrote:
> | Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
> | set on the CRAN check machine. Either that, or place the responsibility
> | on data.table for setting the right number of threads by default. But
> | that's a policy question: should a CRAN package start no more than two
> | threads/child processes even if it doesn't know it's running in an
> | environment where the CPU time / elapsed time limit is two?
>
> Methinks that given this language in the CRAN Repository Policy
>
>   If running a package uses multiple threads/cores it must never use more
>   than two simultaneously: the check farm is a shared resource and will
>   typically be running many checks simultaneously.
>
> it would indeed be nice if this variable, and/or equivalent ones, were set.
>
> As I mentioned before, I had long added a similar throttle (not for
> data.table) in a package I look after (for work, even). So a similar
> throttler with optionality is below. I'll add this to my `dang` package
> collecting various functions.
>
> A usage example follows. It does nothing by default, ensuring 'full power'
> but reflects the minimum of two possible options, or an explicit count:
>
>     > dang::limitDataTableCores(verbose=TRUE)
>     Limiting data.table to '12'.
>     > Sys.setenv("OMP_THREAD_LIMIT"=3);
> dang::limitDataTableCores(verbose=TRUE)
>     Limiting data.table to '3'.
>     > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
>     Limiting data.table to '2'.
>     > dang::limitDataTableCores(1, verbose=TRUE)
>     Limiting data.table to '1'.
>     >
>
> That makes it, in my eyes, preferable to any unconditional 'always pick 1
> thread'.
>
> Dirk
>
>
> ##' Set threads for data.table respecting possible local settings
> ##'
> ##' This function set the number of threads \pkg{data.table} will use
> ##' while reflecting two possible machine-specific settings from the
> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
> ##' option \sQuote{Ncpus} (uses e.g. for parallel builds).
> ##' @title Set data.table threads respecting default settingss
> ##' @param ncores A numeric or character variable with the desired
> ##' count of threads to use
> ##' @param verbose A logical value with a default of \sQuote{FALSE} to
> ##' operate more verbosely
> ##' @return The return value of the \pkg{data.table} function
> ##' \code{setDTthreads} which is called as a side-effect.
> ##' @author Dirk Eddelbuettel
> ##' @export
> limitDataTableCores <- function(ncores, verbose = FALSE) {
>     if (missing(ncores)) {
>         ## start with a simple fallback: 'Ncpus' (if set) or else 2
>         ncores <- getOption("Ncpus", 2L)
>         ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions), gets
> NA if envvar unset
>         ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
>         ## and then keep the smaller
>         ncores <- min(na.omit(c(ncores, ompcores)))
>     }
>     stopifnot("Package 'data.table' must be installed." =
> requireNamespace("data.table", quietly=TRUE))
>     stopifnot("Argument 'ncores' must be numeric or character" =
> is.numeric(ncores) || is.character(ncores))
>     if (verbose) message("Limiting data.table to '", ncores, "'.")
>     data.table::setDTthreads(ncores)
> }
>
> |
> | --
> | Best regards,
> | Ivan
> |
> | ______________________________________________
> | R-package-devel using r-project.org mailing list
> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
>

        [[alternative HTML version deleted]]

______________________________________________
R-package-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

	[[alternative HTML version deleted]]



More information about the R-package-devel mailing list