[R-pkg-devel] active bindings in package namespace

Jack O. Wasey j@ck @end|ng |rom j@ckw@@ey@com
Sun Mar 24 18:27:43 CET 2019


This is a good point. I would prefer to include all the data in the 
package, but CRAN has strict limitations on package and subdirectory 
size, which the potential data would easily exceed. Whether it is 
accessed through an active binding or a get function, dynamically 
downloaded data will always have this reproducibility problem. Also, 
there are potential copyright issues 
which may prevent including all the relevant data in a package, no 
matter how the package is distributed.

For this particular package of ICD data, the biggest risk is not the 
data changing, but the data not being made available in the future, or 
not being provided in a useful format.

I do allow the user to set the cache directory, which eventually 
includes all the raw and processed data, and this could be archived by 
the user for reproducibility. In addition, the test suite covers 
potential changes to the source data.
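A minimal sketch of a user-settable cache directory, assuming the data is stored as one `.rds` file per data set. The option name `icd.data.cache_dir` and the helpers `icd_cache_dir()`, `read_cached()` and `write_cached()` are hypothetical, not icd.data's actual API; the fallback `tools::R_user_dir()` requires R >= 4.0.0.

```r
# Hypothetical helpers for a user-controlled cache directory.
# The option "icd.data.cache_dir" is an assumption, not icd.data's real API.
icd_cache_dir <- function() {
  dir <- getOption("icd.data.cache_dir")
  if (is.null(dir)) {
    # Fall back to the per-user cache directory (needs R >= 4.0.0).
    dir <- tools::R_user_dir("icd.data", which = "cache")
  }
  dir
}

# Read a processed data set from the cache, or return NULL if it is absent.
read_cached <- function(name) {
  path <- file.path(icd_cache_dir(), paste0(name, ".rds"))
  if (file.exists(path)) readRDS(path) else NULL
}

# Save a processed data set into the cache, creating the directory if needed.
write_cached <- function(name, value) {
  dir <- icd_cache_dir()
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(value, file.path(dir, paste0(name, ".rds")))
  invisible(value)
}
```

Because all raw and processed files land in one directory, a user can archive that directory with an analysis and later point the option at the archived copy, so the same data is read back without any network access.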

On 3/24/19 11:21 AM, Hong Ooi wrote:
> Don't want to turn this into a pile-on, but I also think this isn't a very good idea. As I understand it, accessing the symbol "foo" will pull the latest version of foo from the remote site. This has consequences for reproducibility, because now your code could be exactly the same, and your local environment exactly the same, and yet running the code at different times can yield different results because the remote data has been updated.
> 
> 
> -----Original Message-----
> From: R-package-devel <r-package-devel-bounces using r-project.org> On Behalf Of Jack Wasey
> Sent: Sunday, 24 March 2019 9:57 AM
> To: Kirill Müller <krlmlr+ml using mailbox.org>; R Development <r-package-devel using r-project.org>
> Subject: Re: [R-pkg-devel] active bindings in package namespace
> 
> Thanks both, this is helpful advice.
> 
> On 3/23/19 5:14 PM, Kirill Müller wrote:
>> Dear Jack
>>
>>
>> This doesn't answer your question, but I would advise against this design.
>>
>> - Users do not expect side effects (such as network access) from accessing a symbol.
>>
>> - A function gives you much more flexibility to change the interface
>> later on. (Arguments for fetching the data, tokens for API access,
>> ...)
>>
>> - You already encountered a few quirks that make this an "interesting" problem.
>>
>> A function call only needs a pair of parentheses.
>>
>>
>> Best regards
>>
>> Kirill
>>
>>
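A minimal sketch of the function-based design Kirill suggests: unlike a symbol, a getter can take extra arguments, here a `refresh` flag and the data source itself. The names `get_icd_data()` and `.icd_mem_cache`, and the `fetch` argument (a stand-in for the real download-and-parse step, passed in so the sketch runs offline), are all hypothetical.

```r
# Hypothetical function-based accessor, as an alternative to an active binding.
# An in-memory environment keeps data already retrieved in this session.
.icd_mem_cache <- new.env(parent = emptyenv())

get_icd_data <- function(name, fetch, refresh = FALSE) {
  if (!refresh && exists(name, envir = .icd_mem_cache, inherits = FALSE)) {
    # Reuse the copy already retrieved in this session.
    return(get(name, envir = .icd_mem_cache, inherits = FALSE))
  }
  # `fetch` stands in for the real download-and-parse step.
  value <- fetch(name)
  assign(name, value, envir = .icd_mem_cache)
  value
}
```

The cost to the user is the pair of parentheses (and arguments) Kirill mentions, e.g. `get_icd_data("icd10fr2019", fetch)` instead of `icd10fr2019`; in return, accessing a name never has hidden side effects.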
>> On 23.03.19 16:50, Jack O. Wasey wrote:
>>> Dear all,
>>>
>>> I am developing a package which is a front end for various online data (icd.data, https://github.com/jackwasey/icd.data/ ). The current CRAN version just has lazy-loaded data, but the package now encompasses far more current and historic ICD codes from different countries, and these can't be included in the CRAN package even with maximal compression.
>>>
>>> Other authors have solved this using functions to get the data, with or without a local cache of the retrieved data. After extensive searching, I have found no packages, on CRAN or elsewhere, that use R's attractive active binding feature.
>>>
>>> The goal is simple: for the user to refer to the data by its symbol, e.g., 'icd10fr2019' or 'icd.data::icd10fr2019', and have it downloaded and parsed transparently (if the user has already granted permission, or after a prompt if they haven't).
>>>
>>> The bindings are set using commands alongside the function definitions in R/*.R, e.g.:
>>>
>>> makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, environment())
>>> lockBinding("icd10cm_latest", environment())
>>>
>>> For non-interactive use (CI and CRAN tests), no data should be downloaded and no cache directory set up without user consent. For interactive use, I ask permission to create a local data cache before downloading data.
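A minimal sketch of an active binding that only fetches data with consent, using base R's `makeActiveBinding()` and `lockBinding()` as in the snippet above. The option `icd.data.allow_download`, the helper `make_consented_binding()`, and the `fetch` argument (a stand-in for the real download step) are hypothetical; `utils::askYesNo()` requires R >= 3.5.0.

```r
# Hypothetical sketch: an active binding that fetches data only with consent.
# The option "icd.data.allow_download" is invented for this example, and
# `fetch` stands in for the real download-and-parse step.
make_consented_binding <- function(sym, fetch, env) {
  binding_fun <- function() {
    consent <- isTRUE(getOption("icd.data.allow_download"))
    if (!consent && interactive()) {
      # askYesNo() returns NA if the prompt is cancelled; isTRUE() maps that to FALSE.
      consent <- isTRUE(utils::askYesNo(
        paste0("Download and cache data for '", sym, "'?")))
    }
    if (!consent) {
      # Non-interactive sessions (CI, CRAN, R CMD check) get NULL, not an error.
      message("No permission to download '", sym, "'; returning NULL.")
      return(NULL)
    }
    fetch(sym)
  }
  makeActiveBinding(sym, binding_fun, env)
  lockBinding(sym, env)
  invisible(NULL)
}
```

Returning `NULL` with a message instead of calling `stop()` should keep tools that evaluate every object in the namespace, such as the R CMD check steps listed below, from failing; the trade-off is that a user who has not consented quietly gets `NULL` rather than an error.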
>>>
>>> This works fine... until R CMD check. The following check steps seem to 'get' or 'source' everything in the package namespace, which triggers the active bindings. This fails if I cannot get consent to download data and want to 'stop' on that error condition.
>>>   - checking dependencies in R code
>>>   - checking S3 generic/method consistency
>>>   - checking foreign function calls
>>>   - checking R code for possible problems
>>>
>>> Debugging CI-specific binding bugs is a nightmare because these occur in different R sessions initiated by R CMD check.
>>>
>>> There may be legitimate reasons to evaluate everything in the
>>> namespace, but I've no idea what they are. Incidentally, RStudio also
>>> does 'mget' on the whole package namespace and triggers bindings
>>> during autocomplete. https://github.com/rstudio/rstudio/issues/4414
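A small base-R sketch of the behaviour described here: `mget()` over an environment evaluates every active binding it retrieves, while `bindingIsActive()` inspects a binding without evaluating it. `safe_mget()` is a hypothetical helper for illustration; it is not what R CMD check or RStudio actually do.

```r
# Return the values of all bindings in `env`, skipping active bindings so
# that listing an environment never triggers a download. (Hypothetical helper.)
safe_mget <- function(env) {
  syms <- ls(env, all.names = TRUE)
  active <- vapply(syms, bindingIsActive, logical(1), env = env)
  mget(syms[!active], envir = env)
}

# Demonstration: count how often an active binding is evaluated.
e <- new.env()
n_evaluations <- 0
makeActiveBinding("lazy_data", function() {
  n_evaluations <<- n_evaluations + 1  # record each evaluation
  "expensive result"
}, e)
assign("plain_value", 42, envir = e)

invisible(mget(ls(e), envir = e))  # evaluates the active binding
before <- n_evaluations
str(safe_mget(e))                  # only plain_value is returned
stopifnot(n_evaluations == before) # safe_mget() did not evaluate lazy_data
```

Note that `ls()` only lists names, so it is the value retrieval in `mget()` (or `get()`) that runs the binding function.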
>>>
>>> Is this something I should raise as an issue with R? Or does anyone have an idea of a sensible approach to this? Currently I have a set of workarounds, but they complicate the code and have taken an awful lot of time. Does anyone know of any CRAN package which has active bindings in the package namespace?
>>>
>>> Any ideas appreciated.
>>>
>>> Jack Wasey
>>>
>>> ______________________________________________
>>> R-package-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> 
