[R-pkg-devel] Too many processes spawned on Windows and Debian, although only 2 cores should be used

Henrik Bengtsson henrik.bengtsson at gmail.com
Wed Nov 16 17:13:49 CET 2022


Hello.

As already pointed out, the current R implementation treats any
non-empty value of _R_CHECK_LIMIT_CORES_ other than "false" as a
true value, e.g. "TRUE", "true", "T", "1", but also "donald duck".
Using '--as-cran' sets _R_CHECK_LIMIT_CORES_="TRUE", if unset.  If
already set, it'll not touch it.  So, it could be that a CRAN check
server already uses, say, _R_CHECK_LIMIT_CORES_="true".  We cannot
make assumptions about that.
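
For reference, here is roughly the test that parallel itself performs
(a minimal sketch mirroring parallel:::.check_ncores, as quoted further
down in this thread):

  chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
  limited <- nzchar(chk) && (chk != "false")
  ## 'limited' is TRUE for "TRUE", "true", "T", "1", and "donald duck";
  ## it is FALSE only if the variable is unset, empty, or set to "false"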

To make your life, and an end-user's too, easier, I suggest just using

  num_workers <- 2L

without conditioning on running on CRAN or not.

Why? There are many problems with using parallel::detectCores().

First of all, it can return NA_integer_ on some systems, so you cannot
assume it gives a valid value (== error).  It can also return 1L,
which means your 'num_workers - 1' will give zero workers (== error).
You need to account for that if you rely on detectCores().
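
If you nevertheless base the number of workers on detectCores(), a
defensive sketch (illustration only) could look like:

  n <- parallel::detectCores()
  if (is.na(n)) n <- 1L              # detectCores() may return NA_integer_
  num_workers <- max(1L, n - 1L)     # never go below one worker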

Second, detectCores() returns the number of physical CPU cores. It's
getting more and more common to run in "cgroups"-constrained
environments where your R process only gets access to a fraction of
these cores.  Such constraints are in place in many shared multi-user
HPC environments, and sometimes when using Linux containers (e.g.
Docker, Apptainer, and Podman).  A notable example of this is when
using the RStudio Cloud.  So, if you use detectCores() on those
systems, you'll actually over-parallelize, which slows things down and
you risk running out of memory. For example, you might launch 64
parallel workers when you only have access to four CPU cores.  Each
core will be clogged up by 16 workers.
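
(As an aside, and not something the code below relies on: if you do
need a value that honors such runtime limits, the 'parallelly'
package's availableCores() is designed to respect cgroups and HPC
scheduler allocations and always returns at least 1L, e.g.

  num_workers <- parallelly::availableCores()

but for package examples and vignettes a fixed, small number is still
the safest choice.)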

Third, if you default to detectCores() and a user runs your code on a
machine shared by many users, the other users will not be happy.  Note
that the user will often not know they're overusing the machine.  So,
it's a lose-lose for everyone.

Fourth, detectCores() will return *all* physical CPU cores on the
current machine. These days we have machines with 128, 196, and more
cores.  Are you sure your software will actually run faster when using
that many cores?  The benefit from parallelization tends to decrease
as you add more workers until there is no longer a speed improvement.
If you keep adding more parallel workers you're going to see a
negative effect, i.e. you're penalized for parallelizing too much.
So, be aware that when you test on 16 or 24 cores and things run
really fast, that might not be the experience for other users, or
users in the future (who will have access to more CPU cores).

So, yes, I suggest not to use num_workers <- detectCores().  Pick a
fixed number instead, and the CRAN policy suggests using two.  You can
let the user control how many they want to use.  As a developer, it's
really, really difficult (read: impossible) to know how many they want to use.
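
A common pattern (just a sketch; the argument and option names here are
made up for illustration) is to expose the worker count as an argument
with a conservative default:

  run_sims <- function(nsim, num_workers = getOption("mypkg.workers", 2L)) {
    cl <- parallel::makeCluster(num_workers)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parSapply(cl, seq_len(nsim), function(s) s^2)  # placeholder work
  }

That way CRAN checks and end-users both get two workers by default, and
a user who knows their machine can raise it.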

Cheers,

Henrik

PS. Note that detectCores() returns a single integer value (possibly
NA_integer_).  Because of this, there is no need to subset with
num_workers[1]. I've seen this used in code; not sure where it comes
from, but it looks like copy-and-paste behavior.

On Wed, Nov 16, 2022 at 6:38 AM Riko Kelter <riko.kelter using uni-siegen.de> wrote:
>
> Hi Ivan,
>
> thanks for the info, I changed the check as you pointed out and it
> worked. R CMD build and R CMD check --as-cran run without errors or
> warnings on Linux + macOS. However, I uploaded the package again to the
> WINBUILDER service and obtained the following weird error:
>
> * checking re-building of vignette outputs ... ERROR
> Check process probably crashed or hung up for 20 minutes ... killed
> Most likely this happened in the example checks (?),
> if not, ignore the following last lines of example output:
>
> ======== End of example output (where/before crash/hang up occured ?) ========
>
> Strangely, there are no examples included in any .Rd file. Also, I
> checked whether a piece of code spawns new clusters. However, the
> critical lines are inside a function which is repeatedly called in the
> vignettes. The parallelized part is copied below. After the code
> is executed the cluster is stopped. I use registerDoSNOW(cl) because
> otherwise my progress bar does not work.
>
>
> Code:
>
> ############################### CHECK CORES
>
> chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
> if (nzchar(chk) && (chk != "false")) {  # then limit the workers
>   num_workers <- 2L
> } else {
>   # use all cores
>   num_workers <- parallel::detectCores()
> }
>
> chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
>
> cl <- parallel::makeCluster(num_workers[1] - 1)  # not to overload your computer
> # doParallel::registerDoParallel(cl)
> doSNOW::registerDoSNOW(cl)
>
> ############################### SET UP PROGRESS BAR
>
> pb <- progress_bar$new(
>   format = "Iteration = :letter [:bar] :elapsed | expected time till finish: :eta",
>   total = nsim,    # 100
>   width = 120)
>
> progress_letter <- seq(1, nsim)  # token reported in progress bar
>
> # allowing progress bar to be used in foreach -----------------------------
> progress <- function(n) {
>   pb$tick(tokens = list(letter = progress_letter[n]))
> }
>
> opts <- list(progress = progress)
>
> ############################### MAIN SIMULATION
>
> if(method=="PP"){
>      finalMatrix <- foreach::foreach(s=1:nsim, .combine=rbind, .packages
> = c("extraDistr", "fbst"), .options.snow = opts) %dopar% {
>        tempMatrix = singleTrial_PP(s = s, n=nInit, responseMatrix =
> responseMatrix, nInit = nInit, Nmax = Nmax, batchsize = batchsize, a0 =
> a0, b0 = b0)
>
>        tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix,
> tempMatrix)
>      }
>    }
>
>    if(method=="PPe"){
>      refFunc = refFunc
>      nu = nu
>      shape1 = shape1
>      shape2 = shape2
>      if(refFunc == "flat"){
>        finalMatrix <- foreach::foreach(s=1:nsim, .combine=rbind,
> .packages = c("extraDistr", "fbst"), .options.snow = opts) %dopar% {
>          tempMatrix = singleTrial_PPe(s = s, n=nInit, responseMatrix =
> responseMatrix, nInit = nInit, Nmax = Nmax, batchsize = batchsize, a0 =
> a0, b0 = b0, refFunc = "flat")
>
>          tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix,
> tempMatrix)
>        }
>      }
>      if(refFunc == "beta"){
>        finalMatrix <- foreach::foreach(s=1:nsim, .combine=rbind,
> .packages = c("extraDistr", "fbst"), .options.snow = opts) %dopar% {
>          tempMatrix = singleTrial_PPe(s = s, n=nInit, responseMatrix =
> responseMatrix, nInit = nInit, Nmax = Nmax, batchsize = batchsize, a0 =
> a0, b0 = b0, refFunc = "beta",
>                                       shape1 = shape1, shape2 = shape2)
>
>          tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix,
> tempMatrix)
>        }
>      }
>      if(refFunc == "binaryStep"){
>        finalMatrix <- foreach::foreach(s=1:nsim, .combine=rbind,
> .packages = c("extraDistr", "fbst"), .options.snow = opts) %dopar% {
>          tempMatrix = singleTrial_PPe(s = s, n=nInit, responseMatrix =
> responseMatrix, nInit = nInit, Nmax = Nmax, batchsize = batchsize, a0 =
> a0, b0 = b0, refFunc = "binaryStep",
>                                       shape1 = shape1, shape2 = shape2,
> truncation = truncation)
>
>          tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix,
> tempMatrix)
>        }
>      }
>      if(refFunc == "relu"){
>        finalMatrix <- foreach::foreach(s=1:nsim, .combine=rbind,
> .packages = c("extraDistr", "fbst"), .options.snow = opts) %dopar% {
>          tempMatrix = singleTrial_PPe(s = s, n=nInit, responseMatrix =
> responseMatrix, nInit = nInit, Nmax = Nmax, batchsize = batchsize, a0 =
> a0, b0 = b0, refFunc = "relu",
>                                       shape1 = shape1, shape2 = shape2,
> truncation = truncation)
>
>          tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix,
> tempMatrix)
>        }
>      }
>      if(refFunc == "palu"){
>        finalMatrix <- foreach::foreach(s=1:nsim, .combine=rbind,
> .packages = c("extraDistr", "fbst"), .options.snow = opts) %dopar% {
>          tempMatrix = singleTrial_PPe(s = s, n=nInit, responseMatrix =
> responseMatrix, nInit = nInit, Nmax = Nmax, batchsize = batchsize, a0 =
> a0, b0 = b0, refFunc = "palu",
>                                       shape1 = shape1, shape2 = shape2,
> truncation = truncation)
>
>          tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix,
> tempMatrix)
>        }
>      }
>      if(refFunc == "lolu"){
>        finalMatrix <- foreach::foreach(s=1:nsim, .combine=rbind,
> .packages = c("extraDistr", "fbst"), .options.snow = opts) %dopar% {
>          tempMatrix = singleTrial_PPe(s = s, n=nInit, responseMatrix =
> responseMatrix, nInit = nInit, Nmax = Nmax, batchsize = batchsize, a0 =
> a0, b0 = b0, refFunc = "lolu",
>                                       shape1 = shape1, shape2 = shape2,
> truncation = truncation)
>
>          tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix,
> tempMatrix)
>        }
>      }
>    }
>
> ############################### STOP CLUSTER
>
> parallel::stopCluster(cl) #stop cluster
>
>
>
> Kind regards,
>
> Riko
>
>
> Am 16.11.22 um 08:29 schrieb Ivan Krylov:
> > On Wed, 16 Nov 2022 07:29:25 +0100
> > Riko Kelter <riko.kelter using uni-siegen.de> wrote:
> >
> >> if (nzchar(chk) && chk == "TRUE") {
> >>   # use 2 cores in CRAN/Travis/AppVeyor
> >>   num_workers <- 2L
> >> }
> > The check in parallel:::.check_ncores is a bit different:
> >
> > chk <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", ""))
> > if (nzchar(chk) && (chk != "false")) # then limit the workers
> >
> > Unless you actually set _R_CHECK_LIMIT_CORES_=FALSE on your machine
> > when running the checks, I would perform a more pessimistic check of
> > nzchar(chk) (without additionally checking whether it's TRUE or not
> > FALSE), though copy-pasting the check from parallel:::.check_ncores
> > should also work.
> >
> > Can we see the rest of the vignette? Perhaps the problem is not with
> > the check. For example, a piece of code might be implicitly spawning a
> > new cluster, defaulting to all of the cores instead of num_workers.
> >
> >>      [[alternative HTML version deleted]]
> > Unfortunately, the plain text version of your message prepared by your
> > mailer has all the code samples mangled:
> > https://stat.ethz.ch/pipermail/r-package-devel/2022q4/008647.html
> >
> > Please compose your messages to R mailing lists in plain text.
> >
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel


