[R-pkg-devel] Trouble with long-running tests on CRAN debian server
Uwe Ligges
||gge@ @end|ng |rom @t@t|@t|k@tu-dortmund@de
Fri Aug 25 15:37:47 CEST 2023
On 23.08.2023 16:00, Scott Ritchie wrote:
> Hi Uwe,
>
> I agree and have also been burnt myself by programs occupying the
> maximum number of cores available.
>
> My understanding is that in the absence of explicit parallelisation, use
> of data.table in a package should not lead to this type of behaviour?
Yes, that would be my hope, too.
Best,
Uwe Ligges
>
> Best,
>
> Scott
>
> On Wed, 23 Aug 2023 at 14:30, Uwe Ligges
> <ligges using statistik.tu-dortmund.de
> <mailto:ligges using statistik.tu-dortmund.de>> wrote:
>
> I (any many collegues here) have been caught several times by the
> following example:
>
> 1. did something in parallel on a cluster, set up via
> parallel::makeCluster().
> 2. e.g. allocated 20 cores and got them on one single machine
> 3. ran some code in parallel via parLapply()
>
> Bang! 400 threads;
> So I have started 20 parallel processes, each of which is using the
> automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted
> by the cluster to 20 (rather than 1).
>
> Hence, I really believe a default should always be small, not only in
> examples and tests, but generally. And people who aim for more
> should be
> able to increase the defaults.
>
> Do you believe a software that auto-occupies a 96 core machines with 96
> threads by default is sensible?
>
> Best,
> Uwe Ligges
>
>
>
>
>
>
> On 21.08.2023 21:59, Berry Boessenkool wrote:
> >
> > If you add that to each exported function, isn't that a lot of
> code to read + maintain?
> > Also, it seems like unnecessary computational overhead.
> > From a software design point of view, it might be nicer to set
> that in the examples + tests.
> >
> > Regards,
> > Berry
> >
> > ________________________________
> > From: R-package-devel <r-package-devel-bounces using r-project.org
> <mailto:r-package-devel-bounces using r-project.org>> on behalf of Scott
> Ritchie <sritchie73 using gmail.com <mailto:sritchie73 using gmail.com>>
> > Sent: Monday, August 21, 2023 19:23
> > To: Dirk Eddelbuettel <edd using debian.org <mailto:edd using debian.org>>
> > Cc: r-package-devel using r-project.org
> <mailto:r-package-devel using r-project.org>
> <r-package-devel using r-project.org <mailto:r-package-devel using r-project.org>>
> > Subject: Re: [R-pkg-devel] Trouble with long-running tests on
> CRAN debian server
> >
> > Thanks Dirk and Ivan,
> >
> > I took a slightly different work-around of forcing the number of
> threads to
> > 1 when running functions of the test dataset in the package, by
> adding the
> > following to each user facing function:
> >
> > ```
> > # Check if running on package test_data, and if so, force
> data.table to
> > be
> > # single threaded so that we can avoid a NOTE on CRAN submission
> > if (isTRUE(all.equal(x, ukbnmr::test_data))) {
> > registered_threads <- getDTthreads()
> > setDTthreads(1)
> > on.exit({ setDTthreads(registered_threads) }) # re-register
> so no
> > unintended side effects for users
> > }
> > ```
> > (i.e. here x is the input argument to the function)
> >
> > It took some trial and error to get to pass the CRAN tests; the
> number of
> > columns in the input data was also contributing to the problem.
> >
> > Best,
> >
> > Scott
> >
> >
> > On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <edd using debian.org
> <mailto:edd using debian.org>> wrote:
> >
> >>
> >> On 21 August 2023 at 16:05, Ivan Krylov wrote:
> >> | Dirk is probably right that it's a good idea to have
> OMP_THREAD_LIMIT=2
> >> | set on the CRAN check machine. Either that, or place the
> responsibility
> >> | on data.table for setting the right number of threads by
> default. But
> >> | that's a policy question: should a CRAN package start no more
> than two
> >> | threads/child processes even if it doesn't know it's running in an
> >> | environment where the CPU time / elapsed time limit is two?
> >>
> >> Methinks that given this language in the CRAN Repository Policy
> >>
> >> If running a package uses multiple threads/cores it must
> never use more
> >> than two simultaneously: the check farm is a shared resource
> and will
> >> typically be running many checks simultaneously.
> >>
> >> it would indeed be nice if this variable, and/or equivalent
> ones, were set.
> >>
> >> As I mentioned before, I had long added a similar throttle (not for
> >> data.table) in a package I look after (for work, even). So a similar
> >> throttler with optionality is below. I'll add this to my `dang`
> package
> >> collecting various functions.
> >>
> >> A usage example follows. It does nothing by default, ensuring
> 'full power'
> >> but reflects the minimum of two possible options, or an explicit
> count:
> >>
> >> > dang::limitDataTableCores(verbose=TRUE)
> >> Limiting data.table to '12'.
> >> > Sys.setenv("OMP_THREAD_LIMIT"=3);
> >> dang::limitDataTableCores(verbose=TRUE)
> >> Limiting data.table to '3'.
> >> > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
> >> Limiting data.table to '2'.
> >> > dang::limitDataTableCores(1, verbose=TRUE)
> >> Limiting data.table to '1'.
> >> >
> >>
> >> That makes it, in my eyes, preferable to any unconditional
> 'always pick 1
> >> thread'.
> >>
> >> Dirk
> >>
> >>
> >> ##' Set threads for data.table respecting possible local settings
> >> ##'
> >> ##' This function set the number of threads \pkg{data.table}
> will use
> >> ##' while reflecting two possible machine-specific settings from the
> >> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
> >> ##' option \sQuote{Ncpus} (uses e.g. for parallel builds).
> >> ##' @title Set data.table threads respecting default settingss
> >> ##' @param ncores A numeric or character variable with the desired
> >> ##' count of threads to use
> >> ##' @param verbose A logical value with a default of
> \sQuote{FALSE} to
> >> ##' operate more verbosely
> >> ##' @return The return value of the \pkg{data.table} function
> >> ##' \code{setDTthreads} which is called as a side-effect.
> >> ##' @author Dirk Eddelbuettel
> >> ##' @export
> >> limitDataTableCores <- function(ncores, verbose = FALSE) {
> >> if (missing(ncores)) {
> >> ## start with a simple fallback: 'Ncpus' (if set) or else 2
> >> ncores <- getOption("Ncpus", 2L)
> >> ## also consider OMP_THREAD_LIMIT (cf Writing R
> Extensions), gets
> >> NA if envvar unset
> >> ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
> >> ## and then keep the smaller
> >> ncores <- min(na.omit(c(ncores, ompcores)))
> >> }
> >> stopifnot("Package 'data.table' must be installed." =
> >> requireNamespace("data.table", quietly=TRUE))
> >> stopifnot("Argument 'ncores' must be numeric or character" =
> >> is.numeric(ncores) || is.character(ncores))
> >> if (verbose) message("Limiting data.table to '", ncores, "'.")
> >> data.table::setDTthreads(ncores)
> >> }
> >>
> >> |
> >> | --
> >> | Best regards,
> >> | Ivan
> >> |
> >> | ______________________________________________
> >> | R-package-devel using r-project.org
> <mailto:R-package-devel using r-project.org> mailing list
> >> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
> <https://stat.ethz.ch/mailman/listinfo/r-package-devel>
> >>
> >> --
> >> dirk.eddelbuettel.com <http://dirk.eddelbuettel.com> |
> @eddelbuettel | edd using debian.org <mailto:edd using debian.org>
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-package-devel using r-project.org
> <mailto:R-package-devel using r-project.org> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
> <https://stat.ethz.ch/mailman/listinfo/r-package-devel>
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-package-devel using r-project.org
> <mailto:R-package-devel using r-project.org> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
> <https://stat.ethz.ch/mailman/listinfo/r-package-devel>
>
More information about the R-package-devel
mailing list