[R-pkg-devel] Trouble with long-running tests on CRAN debian server

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Wed Aug 23 15:58:37 CEST 2023


To whom are you addressing this question? The OpenMP developers who define the missing-OMP_THREAD_LIMIT behaviour and-or supply default config files? The CRAN server administrators who set the variable in their site-wide configuration intentionally or unintentionally? Or the package authors expected to kludge in settings to override those defaults for CRAN testing while not overriding them in normal use?

I would vote for explicitly addressing this (rhetorical?) question to the CRAN server administrators...

On August 23, 2023 6:31:01 AM PDT, Uwe Ligges <ligges using statistik.tu-dortmund.de> wrote:
>I (any many collegues here) have been caught several times by the following example:
>
>1. did something in parallel on a cluster, set up via parallel::makeCluster().
>2. e.g. allocated 20 cores and got them on one single machine
>3. ran some code in parallel via parLapply()
>
>Bang! 400 threads;
>So I have started 20 parallel processes, each of which is using the automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted by the cluster to 20 (rather than 1).
>
>Hence, I really believe a default should always be small, not only in examples and tests, but generally. And people who aim for more should be able to increase the defaults.
>
>Do you believe a software that auto-occupies a 96 core machines with 96 threads by default is sensible?
>
>Best,
>Uwe Ligges
>
>
>
>
>
>
>On 21.08.2023 21:59, Berry Boessenkool wrote:
>> 
>> If you add that to each exported function, isn't that a lot of code to read + maintain?
>> Also, it seems like unnecessary computational overhead.
>>  From a software design point of view, it might be nicer to set that in the examples + tests.
>> 
>> Regards,
>> Berry
>> 
>> ________________________________
>> From: R-package-devel <r-package-devel-bounces using r-project.org> on behalf of Scott Ritchie <sritchie73 using gmail.com>
>> Sent: Monday, August 21, 2023 19:23
>> To: Dirk Eddelbuettel <edd using debian.org>
>> Cc: r-package-devel using r-project.org <r-package-devel using r-project.org>
>> Subject: Re: [R-pkg-devel] Trouble with long-running tests on CRAN debian server
>> 
>> Thanks Dirk and Ivan,
>> 
>> I took a slightly different work-around of forcing the number of threads to
>> 1 when running functions of the test dataset in the package, by adding the
>> following to each user facing function:
>> 
>> ```
>>    # Check if running on package test_data, and if so, force data.table to
>> be
>>    # single threaded so that we can avoid a NOTE on CRAN submission
>>    if (isTRUE(all.equal(x, ukbnmr::test_data))) {
>>      registered_threads <- getDTthreads()
>>      setDTthreads(1)
>>      on.exit({ setDTthreads(registered_threads) }) # re-register so no
>> unintended side effects for users
>>    }
>> ```
>> (i.e. here x is the input argument to the function)
>> 
>> It took some trial and error to get to pass the CRAN tests; the number of
>> columns in the input data was also contributing to the problem.
>> 
>> Best,
>> 
>> Scott
>> 
>> 
>> On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <edd using debian.org> wrote:
>> 
>>> 
>>> On 21 August 2023 at 16:05, Ivan Krylov wrote:
>>> | Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
>>> | set on the CRAN check machine. Either that, or place the responsibility
>>> | on data.table for setting the right number of threads by default. But
>>> | that's a policy question: should a CRAN package start no more than two
>>> | threads/child processes even if it doesn't know it's running in an
>>> | environment where the CPU time / elapsed time limit is two?
>>> 
>>> Methinks that given this language in the CRAN Repository Policy
>>> 
>>>    If running a package uses multiple threads/cores it must never use more
>>>    than two simultaneously: the check farm is a shared resource and will
>>>    typically be running many checks simultaneously.
>>> 
>>> it would indeed be nice if this variable, and/or equivalent ones, were set.
>>> 
>>> As I mentioned before, I had long added a similar throttle (not for
>>> data.table) in a package I look after (for work, even). So a similar
>>> throttler with optionality is below. I'll add this to my `dang` package
>>> collecting various functions.
>>> 
>>> A usage example follows. It does nothing by default, ensuring 'full power'
>>> but reflects the minimum of two possible options, or an explicit count:
>>> 
>>>      > dang::limitDataTableCores(verbose=TRUE)
>>>      Limiting data.table to '12'.
>>>      > Sys.setenv("OMP_THREAD_LIMIT"=3);
>>> dang::limitDataTableCores(verbose=TRUE)
>>>      Limiting data.table to '3'.
>>>      > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
>>>      Limiting data.table to '2'.
>>>      > dang::limitDataTableCores(1, verbose=TRUE)
>>>      Limiting data.table to '1'.
>>>      >
>>> 
>>> That makes it, in my eyes, preferable to any unconditional 'always pick 1
>>> thread'.
>>> 
>>> Dirk
>>> 
>>> 
>>> ##' Set threads for data.table respecting possible local settings
>>> ##'
>>> ##' This function set the number of threads \pkg{data.table} will use
>>> ##' while reflecting two possible machine-specific settings from the
>>> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
>>> ##' option \sQuote{Ncpus} (uses e.g. for parallel builds).
>>> ##' @title Set data.table threads respecting default settingss
>>> ##' @param ncores A numeric or character variable with the desired
>>> ##' count of threads to use
>>> ##' @param verbose A logical value with a default of \sQuote{FALSE} to
>>> ##' operate more verbosely
>>> ##' @return The return value of the \pkg{data.table} function
>>> ##' \code{setDTthreads} which is called as a side-effect.
>>> ##' @author Dirk Eddelbuettel
>>> ##' @export
>>> limitDataTableCores <- function(ncores, verbose = FALSE) {
>>>      if (missing(ncores)) {
>>>          ## start with a simple fallback: 'Ncpus' (if set) or else 2
>>>          ncores <- getOption("Ncpus", 2L)
>>>          ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions), gets
>>> NA if envvar unset
>>>          ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
>>>          ## and then keep the smaller
>>>          ncores <- min(na.omit(c(ncores, ompcores)))
>>>      }
>>>      stopifnot("Package 'data.table' must be installed." =
>>> requireNamespace("data.table", quietly=TRUE))
>>>      stopifnot("Argument 'ncores' must be numeric or character" =
>>> is.numeric(ncores) || is.character(ncores))
>>>      if (verbose) message("Limiting data.table to '", ncores, "'.")
>>>      data.table::setDTthreads(ncores)
>>> }
>>> 
>>> |
>>> | --
>>> | Best regards,
>>> | Ivan
>>> |
>>> | ______________________________________________
>>> | R-package-devel using r-project.org mailing list
>>> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>> 
>>> --
>>> dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
>>> 
>> 
>>          [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-package-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>> 
>> 	[[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-package-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
>______________________________________________
>R-package-devel using r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-package-devel

-- 
Sent from my phone. Please excuse my brevity.



More information about the R-package-devel mailing list