[R-pkg-devel] Submission to CRAN when package needs personal data (API key)

Duncan Murdoch
Fri Sep 7 19:10:18 CEST 2018

On 07/09/2018 3:09 AM, Rainer Krug wrote:
> On 7 Sep 2018, at 02:16, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>> On 06/09/2018 10:32 AM, Hadley Wickham wrote:
>>> On Wed, Sep 5, 2018 at 3:03 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>> On 05/09/2018 2:20 PM, Henrik Bengtsson wrote:
>>>>> I take a complementary approach; I condition on, my home-made,
>>>>> R_TEST_ALL variable.  Effectively, I do:
>>>>> if (as.logical(Sys.getenv("R_TEST_ALL", "FALSE"))) {
>>>>>     ...
>>>>> }
>>>>> and set R_TEST_ALL=TRUE when I want to run that part of the test.  You
>>>>> can also imagine refined versions of this, e.g. R_TEST_SETS=foo,bar
>>>>> and test scripts with:
>>>>> if ("foo" %in% strsplit(Sys.getenv("R_TEST_SETS"), split="[, ]+")[[1]]) {
>>>>>     ...
>>>>> }
>>>>> That avoids making assumptions on where the tests are submitted/run,
>>>>> may it be CRAN, Bioconductor, Travis CI, ...
>>>> This is the right way to do it.
>>> I would like to gently push back on this assertion: if CRAN set an
>>> environment variable we would have one single convention that all
>>> packages could rely on.
>> When packages delete tests just for CRAN, the quality of the 
>> repository suffers. 
> Absolutely. But at the moment, in some cases one is forced to use 
> workarounds when tests **cannot** be run on CRAN (API keys, computing 
> times, …) but should be run locally. It would make much more 
> sense if there were a standardised way of dealing with this.
>> Users should be able to check an install by running the tests that 
>> passed on CRAN and seeing them pass on their system as well.
> Also agreed - so if the user sets the environment variable CRAN for 
> the test, the CRAN tests are executed (as today); if it is not set, the 
> extended tests are executed.
>>> The current system relies on each package
>>> author evolving their own solution. This makes life difficult when you
>>> are running local reverse dependency checks: there is no way to
>>> systematically assert that you want to run tests in a way as similar
>>> as possible to CRAN.
>> Most packages don't need to evolve anything:  the CRAN tests are 
>> sufficient.
> But there seems to be a need to exclude certain tests, due to various 
> reasons.

That need doesn't just apply to CRAN, it applies to anyone running them 
who doesn't have an API key.  So why not leave those tests out by 
default, with a documented way to enable them?
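
A minimal sketch of that opt-in arrangement, reusing the R_TEST_SETS idea 
quoted above. The helper name test_set_enabled() and the set names "api" 
and "slow" are hypothetical, chosen only for illustration:

```r
# Hypothetical helper (not from any package): returns TRUE only when the
# named test set was explicitly requested through an environment variable
# holding a comma- or space-separated list of set names.
test_set_enabled <- function(set, envvar = "R_TEST_SETS") {
  requested <- strsplit(Sys.getenv(envvar, ""), split = "[, ]+")[[1]]
  set %in% requested
}

# Default: the variable is unset, so the extra tests stay off for
# everyone (CRAN, ordinary users, reverse dependency checks).
Sys.unsetenv("R_TEST_SETS")
default_state <- test_set_enabled("api")

# Opt in, as the maintainer would before running the full suite.
Sys.setenv(R_TEST_SETS = "api,slow")
enabled_state <- test_set_enabled("api")

if (enabled_state) {
  message("running the API tests")
}
```

With this arrangement the documented way to enable the extra tests is 
simply to set the variable before running the checks; nothing depends on 
who is running them.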

>>> I know that the CRAN maintainers already have a very large workload,
>>> and I hate to add to it, but setting CRAN=1 in a few profile files
>>> doesn't seem excessively burdensome.
>> It would be easy to do that, but then CRAN wouldn't be testing the 
>> same things that users would test. 
> See my comment above.
>> A user might see a test failure that didn't happen on CRAN, and 
>> suspect that there was something wrong with their install, when in 
>> fact it was an author trying to hide a deficiency in their package 
>> from CRAN.
> Only if they execute the extended tests. I can still hide deficiencies 
> in my package by not applying a specific test or doctoring the result, 
> if that is my intention. But the extended tests could be used to test 
> additional setup options, which can not be tested on CRAN.
>>>> This discussion has come up before.  If you want to submit to CRAN, you
>>>> should include tests that satisfy their requests.  If you want even more
>>>> tests, there are several ways to add them in addition to the CRAN tests.
>>>>   Henrik's is one, "R CMD check --test-dir=myCustomTests" is another.
>>>> Rainer's package is unusual, in that from his description it can't
>>>> really work unless the user obtains an API key.  There are other
>>>> packages like that, and those cases need manual handling by CRAN:  they
>>>> don't really run full tests by default.  But the vast majority of
>>>> packages should be able to live within the CRAN guidelines.
>>> 10 years ago, I would have definitely supported this statement. But I
>>> am not sure it is still correct today, as there are now many packages
>>> that require a connection to web API to work (or depend on a package
>>> that uses an API). Additionally, CRAN only allows a limited amount of
>>> compute time for each check, so there are often longer tests that you
>>> want to run locally but not on CRAN. CRAN is a specialised testing
>>> service and it does have different constraints to your local machine,
>>> travis, and bioconductor.
>>> A quick search of the CRAN mirror on github
>>> (https://github.com/search?q=org%3Acran+skip_on_cran&type=Code)
>>> reveals that there are ~2700 tests that use testthat::skip_on_cran().
>>> This is obviously an underestimate of the total number of tests
>>> skipped on CRAN, as many packages don't use testthat, or use an
>>> alternative technique to avoid running code on CRAN.
>> That's not so obviously an underestimate, as packages that use that 
>> technique use it many times, not just once per package.  (A sample I 
>> looked at averaged 15 calls per package, but I don't know if that's 
>> unbiased.)
>> But in any case, the skip_on_cran() function implements a version of 
>> Henrik's approach.  The name of the function is misleading: it doesn't 
>> attempt to distinguish between CRAN and a regular user.
> I would guess because it can’t. If there were a standardised way of 
> identifying that the test is run on CRAN, I would use it immediately.

Then your package would fail when I ran the tests, because I don't have 
an API key, and I am not CRAN.  It makes more sense to me to treat CRAN 
the same as any other user who is not the author.
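
As a rough illustration of treating every key-less user alike, a test 
script could check for the key itself rather than for CRAN. This is only 
a sketch: the variable name MYPKG_API_KEY and both function names are 
assumptions, not part of any real package:

```r
# Hypothetical: MYPKG_API_KEY is an illustrative variable name.
# TRUE when a non-empty key is present in the environment.
has_api_key <- function(envvar = "MYPKG_API_KEY") {
  nzchar(Sys.getenv(envvar, ""))
}

# Hypothetical test driver: skips cleanly when no key is available,
# whoever is running the tests (CRAN, a user, a CI service).
run_api_tests <- function() {
  if (!has_api_key()) {
    message("No API key found; skipping API tests.")
    return(invisible("skipped"))
  }
  # ... tests that call the web API would go here ...
  invisible("ran")
}
```

The same tests then pass on CRAN and on a user's machine, and run in full 
only where a key has been supplied.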

Duncan Murdoch

> Cheers,
> Rainer
>> Duncan Murdoch
