[R-pkg-devel] Examples are too long in computation for CRAN
Michael Topper
miketopper123 at gmail.com
Sun Aug 13 09:59:40 CEST 2023
I have tried the following:
- Trimming down the examples substantially so that each function runs only
one regression.
- Setting the nthreads argument to 2 in fixest::feols() in case this was
the problem, as suggested.
- Using testthat::skip_on_cran() on the tests that include fixest
regressions.
However, while the run time has been trimmed down substantially, the checks
still do not pass. At this point, I'm not sure what the next step is.
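For reference, this is the shape of what I tried (a minimal sketch; setFixest_nthreads() and setDTthreads() are the documented thread controls in fixest and data.table, and skip_on_cran() is testthat's):

```r
# Cap parallelism at CRAN's limit of 2 threads before any examples run.
fixest::setFixest_nthreads(2)
data.table::setDTthreads(2)

# In tests/testthat/test-regressions.R: skip the heavy fixest tests on CRAN.
test_that("panelsummary handles fixest models", {
  testthat::skip_on_cran()
  fit <- fixest::feols(mpg ~ wt, data = mtcars)
  expect_s3_class(fit, "fixest")
})
```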
Below are the results:
Flavor: r-devel-linux-x86_64-debian-gcc
Check: examples, Result: NOTE
Examples with CPU time > 2.5 times elapsed time
                  user system elapsed ratio
panelsummary_raw 3.354  0.054   0.461 7.393
clean_raw        3.436  0.091   0.571 6.177
panelsummary     3.636  0.136   0.824 4.578
Flavor: r-devel-linux-x86_64-debian-gcc
Check: tests, Result: NOTE
Running 'testthat.R' [39s/4s]
Running R code in 'testthat.R' had CPU time 8.7 times elapsed time
On Sat, Aug 12, 2023 at 11:26 PM Uwe Ligges <ligges using statistik.tu-dortmund.de>
wrote:
>
>
> On 13.08.2023 08:14, Ivan Krylov wrote:
> > On Sat, 12 Aug 2023 22:49:01 -0700
> > Michael Topper <miketopper123 using gmail.com> wrote:
> >
> >> It appears that some of my examples/tests are taking too
> >> long to run for CRAN's standards.
> >
> > I don't think they are running too long; I think they are too parallel.
> > The elapsed time is below 1s, but the "user" time (CPU time spent in
> > the process) is 7 to 13 times that. This suggests that your code
> > resulted in starting more threads than CRAN allows (up to 2 if you
> > need to test parallelism). Are you using OpenMP? data.table?
> > makeCluster()? It's simplest to always default to a parallelism
> > factor of 2 in examples and tests, because determining the right
> > number is a hard problem. (What if the computer is busy doing
> > something else? What if the BLAS is already parallel enough?)
> >
> >> Moreover, is there any insight as to why this would happen on the
> >> third update of the package rather than on the first or second?
> >
> > The rule has always depended on the particular system running the
> > checks (five seconds on my 12-year-old ThinkPad or on my ultraportable
> > with an Intel Atom that had snails in its ancestry?). Maybe some
> > dependency of your package has updated and started creating threads
> > where it previously didn't.
> >
>
>
> Good points, not only for examples and tests, but also for defaults.
>
> On shared resources (such as clusters) users may not expect the
> parallelization you use and then overallocate the resources.
>
> Example: 20 cores are available to the user, who runs makeCluster()
> for parallelization, but the underlying code does multithreading on 20
> cores. Then we end up with 20*20 = 400 threads on the machine, slowing
> it down along with the processes of other users.
> Hence, defaults should also not be more than 2. Simply allow the user to
> ask for more.
>
> Best,
> Uwe Ligges
>
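To make that default concrete, something like the following would do it (a sketch; run_model() is a hypothetical wrapper, but fixest::feols() does accept an nthreads argument):

```r
# Default to at most 2 threads; the caller must explicitly opt in to more.
run_model <- function(fml, data, nthreads = 2) {
  fixest::feols(fml, data = data, nthreads = nthreads)
}

# A user on a dedicated machine can then ask for more, e.g.:
# run_model(mpg ~ wt, mtcars, nthreads = parallel::detectCores())
```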
--
Michael Topper
B.S. Economics and Mathematics, University of California San Diego 2015
M.A. Economics, San Diego State University 2018
Mobile: (805) 914-4285
miketopper123 using gmail.com