[R] valid package repositories

Henrik Bengtsson henrik.bengtsson at gmail.com
Mon Oct 2 21:00:37 CEST 2017

Here's my view on this:

CRAN = Comprehensive R Archive Network.  The "Archive" part is very
important - it "promises" the research community that every R package
ever published on CRAN, and every version of each package, will remain
available in the future.  It takes quite a lot for a package/code to
disappear from CRAN, e.g. a package containing code/data that is not
allowed to be shared (due to licenses and copyrights).  Not even the
original developer/maintainer can remove a package that has already
been released on CRAN.  What we do see at times is that a package is
"archived" on CRAN (i.e. no longer available via install.packages()),
but the old package versions are still distributed.  That CRAN
protects us this way is extremely valuable to the research community,
open science, and reproducible research.  Bioconductor has a similar
philosophy.
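
To illustrate the point about archived packages still being
distributed, here is a minimal sketch of how one might build the URL
of an old source tarball and install from it.  The helper name
cran_archive_url(), the package name "somepkg" and the version "1.0.0"
are made up for illustration; the sketch assumes the usual
src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz layout on a CRAN
mirror.

```r
# Hypothetical helper (not part of base R): build the URL of an archived
# source tarball, assuming the layout
#   <mirror>/src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz
cran_archive_url <- function(pkg, version,
                             mirror = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz",
          mirror, pkg, pkg, version)
}

# "somepkg" and "1.0.0" are placeholders, not a real package or version.
url <- cran_archive_url("somepkg", "1.0.0")
print(url)

# To actually install from the archive (needs network access, and build
# tools if the package contains compiled code), one could then run:
# install.packages(url, repos = NULL, type = "source")
```

The install.packages() call is left commented out because it only
works for a package and version that really exist in the archive.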

However convenient GitHub / GitLab / ... is for development etc., it
certainly does not provide scientific archiving - in that sense it is
no different from sharing packages on Dropbox, Google Drive, etc.


On Mon, Oct 2, 2017 at 10:25 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> I tend to regard GitHub as a bit of a wild west... anyone can upload anything there, working or not. CRAN packages at least have to compile, so there is some additional verification in being there.
> GitHub does have the advantage that you can easily download it and run an example if the authors have set up such scaffolding... which is better than "it ran once on that laptop that died". However, there is a distinct extra level of sophistication involved in getting researchers to make those examples or test cases beyond their mainline code, and nothing about GitHub requires that such features be present in uploaded code.
> --
> Sent from my phone. Please excuse my brevity.
> On October 2, 2017 7:47:35 AM PDT, Federico Calboli <federico.calboli at kuleuven.be> wrote:
>>Hi All,
>>I noticed that it is quite common to find in papers mentions of ‘R
>>libraries’ developed for the algorithms/models/code/whatever that is
>>being described by the paper, so that third parties will be able to use
>>said method for themselves.  On further enquiry these libraries are
>>not actually available on CRAN, but need to be requested from the devs.
>>That in itself does not seem a big issue, were it not for the fact that
>>most of the time I am in such a situation the code is very specific to
>>the environment of the developer, and does not actually work on any
>>machine I try to run it on (something that is painfully true for code
>>calling C/C++/Fortran).  A second pattern I seem to have noticed is
>>that, despite said libraries being advertised for general use in a
>>*published* paper, when I raise the issue that the library is not
>>actually formally published and does not actually work like a CRAN
>>published library would, I get a vague ‘the person who actually did the
>>work left and nobody can maintain the code/fix stuff/finish the job’.
>>As a referee I am trying to weed out what I see as malpractice: the
>>promise that third parties outside the developers might actually use
>>the code because it has been packaged as an R library, a claim that
>>seems to boost publishing chances.
>>Thus my question: when can I consider a library to be properly
>>published and really publicly available?  CRAN and BioConductor are
>>clearly gold standards.  What about Github?  I am currently using the
>>rule ‘not on CRAN == outright rejection’.  If Github is as good as CRAN
>>I will include it on my list of ‘the code is available in a functional
>>state as claimed’.
>>Finally, please note the scope of my query:  I am not looking at those
>>cases where a colleague gives me half finished code that might be
>>useful but I need to sort out.  I am looking at formal claims ‘we have
>>developed a method to do X and said method is available to the public
>>as an R library’.  If that is the claim I expect it to be true.
>>Federico Calboli
>>LBEG - Laboratory of Biodiversity and Evolutionary Genomics
>>Charles Deberiotstraat 32 box 2439
>>3000 Leuven
>>+32 16 32 87 67
>>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>PLEASE do read the posting guide
>>and provide commented, minimal, self-contained, reproducible code.
