[R-pkg-devel] Trouble with long-running tests on CRAN debian server

Uwe Ligges ligges using statistik.tu-dortmund.de
Fri Aug 25 15:37:47 CEST 2023



On 23.08.2023 16:00, Scott Ritchie wrote:
> Hi Uwe,
> 
> I agree and have also been burnt myself by programs occupying the 
> maximum number of cores available.
> 
> My understanding is that in the absence of explicit parallelisation, use 
> of data.table in a package should not lead to this type of behaviour?

Yes, that would be my hope, too.

Best,
Uwe Ligges
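
For reference, the oversubscription scenario in the quoted message below (N worker processes, each defaulting to N threads) can be avoided by capping the thread count on every worker before dispatching work. This is only a minimal sketch: it assumes data.table is the multi-threaded component, uses two workers purely for illustration, and skips the cap on workers where data.table is not installed.

```r
library(parallel)

n_workers <- 2L                       # small on purpose; illustration only
cl <- makeCluster(n_workers)

# Cap each worker at one thread before any real work is sent to it.
# Assumption: data.table is what would otherwise start extra threads.
worker_threads <- clusterEvalQ(cl, {
  if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::setDTthreads(1L)
    data.table::getDTthreads()        # report the value now in effect
  } else {
    1L                                # nothing to cap on this worker
  }
})

res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

# With the cap in place the thread count grows with the number of
# workers, not with workers times cores.
total_threads <- sum(unlist(worker_threads))
```

Anyone who wants more threads per worker can raise the value passed to setDTthreads() explicitly, which keeps the small default and the opt-in increase separate.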


> 
> Best,
> 
> Scott
> 
> On Wed, 23 Aug 2023 at 14:30, Uwe Ligges <ligges using statistik.tu-dortmund.de> wrote:
> 
>     I (and many colleagues here) have been caught out several times by the
>     following scenario:
> 
>     1. did something in parallel on a cluster, set up via
>     parallel::makeCluster().
>     2. e.g. allocated 20 cores and got them on one single machine
>     3. ran some code in parallel via parLapply()
> 
>     Bang! 400 threads.
>     I had started 20 parallel processes, each of which used the
>     automatically set maximum of 20 threads, because OMP_THREAD_LIMIT was
>     also adjusted by the cluster to 20 (rather than 1).
> 
>     Hence, I really believe a default should always be small, not only in
>     examples and tests, but generally. And people who aim for more should be
>     able to increase the defaults.
> 
>     Do you believe software that auto-occupies a 96-core machine with 96
>     threads by default is sensible?
> 
>     Best,
>     Uwe Ligges
> 
> 
> 
> 
> 
> 
>     On 21.08.2023 21:59, Berry Boessenkool wrote:
>      >
>      > If you add that to each exported function, isn't that a lot of code
>      > to read + maintain?
>      > Also, it seems like unnecessary computational overhead.
>      > From a software design point of view, it might be nicer to set that
>      > in the examples + tests.
>      >
>      > Regards,
>      > Berry
>      >
>      > ________________________________
>      > From: R-package-devel <r-package-devel-bounces using r-project.org>
>      >   on behalf of Scott Ritchie <sritchie73 using gmail.com>
>      > Sent: Monday, August 21, 2023 19:23
>      > To: Dirk Eddelbuettel <edd using debian.org>
>      > Cc: r-package-devel using r-project.org
>      > Subject: Re: [R-pkg-devel] Trouble with long-running tests on CRAN
>      >   debian server
>      >
>      > Thanks Dirk and Ivan,
>      >
>      > I took a slightly different work-around of forcing the number of
>      > threads to 1 when running functions on the test dataset in the
>      > package, by adding the following to each user-facing function:
>      >
>      > ```
>      >    # Check if running on package test_data, and if so, force data.table
>      >    # to be single threaded so that we can avoid a NOTE on CRAN submission
>      >    if (isTRUE(all.equal(x, ukbnmr::test_data))) {
>      >      registered_threads <- getDTthreads()
>      >      setDTthreads(1)
>      >      # re-register so no unintended side effects for users
>      >      on.exit({ setDTthreads(registered_threads) })
>      >    }
>      > ```
>      > (i.e. here x is the input argument to the function)
>      >
>      > It took some trial and error to get this to pass the CRAN tests; the
>      > number of columns in the input data was also contributing to the
>      > problem.
>      >
>      > Best,
>      >
>      > Scott
>      >
>      >
>      > On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <edd using debian.org> wrote:
>      >
>      >>
>      >> On 21 August 2023 at 16:05, Ivan Krylov wrote:
>      >> | Dirk is probably right that it's a good idea to have
>      >> | OMP_THREAD_LIMIT=2 set on the CRAN check machine. Either that, or
>      >> | place the responsibility on data.table for setting the right number
>      >> | of threads by default. But that's a policy question: should a CRAN
>      >> | package start no more than two threads/child processes even if it
>      >> | doesn't know it's running in an environment where the CPU time /
>      >> | elapsed time limit is two?
>      >>
>      >> Methinks that given this language in the CRAN Repository Policy
>      >>
>      >>    If running a package uses multiple threads/cores it must never use
>      >>    more than two simultaneously: the check farm is a shared resource
>      >>    and will typically be running many checks simultaneously.
>      >>
>      >> it would indeed be nice if this variable, and/or equivalent ones,
>      >> were set.
>      >>
>      >> As I mentioned before, I had long added a similar throttle (not for
>      >> data.table) in a package I look after (for work, even). So a similar
>      >> throttler with optionality is below. I'll add this to my `dang`
>      >> package collecting various functions.
>      >>
>      >> A usage example follows. It does nothing by default, ensuring 'full
>      >> power', but reflects the minimum of two possible options, or an
>      >> explicit count:
>      >>
>      >>      > dang::limitDataTableCores(verbose=TRUE)
>      >>      Limiting data.table to '12'.
>      >>      > Sys.setenv("OMP_THREAD_LIMIT"=3); dang::limitDataTableCores(verbose=TRUE)
>      >>      Limiting data.table to '3'.
>      >>      > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
>      >>      Limiting data.table to '2'.
>      >>      > dang::limitDataTableCores(1, verbose=TRUE)
>      >>      Limiting data.table to '1'.
>      >>      >
>      >>
>      >> That makes it, in my eyes, preferable to any unconditional 'always
>      >> pick 1 thread'.
>      >>
>      >> Dirk
>      >>
>      >>
>      >> ##' Set threads for data.table respecting possible local settings
>      >> ##'
>      >> ##' This function sets the number of threads \pkg{data.table} will use
>      >> ##' while reflecting two possible machine-specific settings from the
>      >> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
>      >> ##' option \sQuote{Ncpus} (used e.g. for parallel builds).
>      >> ##' @title Set data.table threads respecting default settings
>      >> ##' @param ncores A numeric or character variable with the desired
>      >> ##' count of threads to use
>      >> ##' @param verbose A logical value with a default of \sQuote{FALSE} to
>      >> ##' operate more verbosely
>      >> ##' @return The return value of the \pkg{data.table} function
>      >> ##' \code{setDTthreads} which is called as a side-effect.
>      >> ##' @author Dirk Eddelbuettel
>      >> ##' @export
>      >> limitDataTableCores <- function(ncores, verbose = FALSE) {
>      >>      if (missing(ncores)) {
>      >>          ## start with a simple fallback: 'Ncpus' (if set) or else 2
>      >>          ncores <- getOption("Ncpus", 2L)
>      >>          ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions);
>      >>          ## as.integer() gets NA if envvar unset
>      >>          ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
>      >>          ## and then keep the smaller
>      >>          ncores <- min(na.omit(c(ncores, ompcores)))
>      >>      }
>      >>      stopifnot("Package 'data.table' must be installed." =
>      >> requireNamespace("data.table", quietly=TRUE))
>      >>      stopifnot("Argument 'ncores' must be numeric or character" =
>      >> is.numeric(ncores) || is.character(ncores))
>      >>      if (verbose) message("Limiting data.table to '", ncores, "'.")
>      >>      data.table::setDTthreads(ncores)
>      >> }
>      >>
>      >> |
>      >> | --
>      >> | Best regards,
>      >> | Ivan
>      >> |
>      >> | ______________________________________________
>      >> | R-package-devel using r-project.org mailing list
>      >> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
>      >>
>      >> --
>      >> dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
>      >>
>      >
>
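
The resolution rule used by the quoted limitDataTableCores() (take 'Ncpus' if set, else 2, then keep the smaller of that and OMP_THREAD_LIMIT) can be tried out without data.table installed. The sketch below isolates just that rule; `resolve_thread_limit()` is a hypothetical name used here for illustration and is not part of `dang` or data.table.

```r
# Hypothetical helper: applies the same rule as the quoted function but
# only returns the resolved count; it never calls setDTthreads().
resolve_thread_limit <- function(ncores = NULL) {
  if (is.null(ncores)) {
    # fallback: option 'Ncpus' if set, else 2
    ncores <- getOption("Ncpus", 2L)
    # as.integer("") is NA, so an unset envvar drops out below
    ompcores <- suppressWarnings(as.integer(Sys.getenv("OMP_THREAD_LIMIT")))
    ncores <- min(c(ncores, ompcores), na.rm = TRUE)
  }
  as.integer(ncores)
}

# Save the current settings so the demonstration leaves no side effects.
old_opts <- options(Ncpus = NULL)
old_env <- Sys.getenv("OMP_THREAD_LIMIT", unset = NA)
Sys.unsetenv("OMP_THREAD_LIMIT")

limits <- c(
  nothing_set = resolve_thread_limit(),   # fallback of 2
  omp_three = {
    Sys.setenv(OMP_THREAD_LIMIT = "3")
    options(Ncpus = 12L)
    resolve_thread_limit()                # min(12, 3)
  },
  ncpus_two = {
    options(Ncpus = 2L)
    resolve_thread_limit()                # min(2, 3)
  },
  explicit_one = resolve_thread_limit(1)  # explicit count wins
)

# Restore whatever was configured before.
options(old_opts)
if (is.na(old_env)) Sys.unsetenv("OMP_THREAD_LIMIT") else
  Sys.setenv(OMP_THREAD_LIMIT = old_env)
```

Keeping the resolution separate from the call to setDTthreads() makes the rule easy to unit-test, and the same value could be reused for other thread-aware libraries.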


