[Rd] Runnable R packages

Thu Jan 31 16:26:00 CET 2019

On 31/01/2019 9:32 a.m., David Lindelof wrote:
> Belated thanks to all who replied to my initial query. In summary, three
> approaches have been mentioned to run R code "in production": 1)
> ShinyProxy, mentioned by Tobias, for deploying Shiny applications; 2)
> Docker-like solutions, mentioned by Gergely and Iñaki; and 3) Solutions
> based on Rscript or littler, mentioned by Dirk.
> 
> I can't speak to 1) because I don't currently use Shiny. And it seems to me
> that Docker-like solutions will still need some "point of entry" for the R
> application, which will have to be Rscript or littler.
> 
> In my first email, I observed that Rscript expects a single expression or a
> single script, which is probably why (in my experience) many data
> scientists tend to provide their code in a very limited number of files.
> Gergely disagreed, arguing to the contrary that data scientists are
> encouraged to provide their application as an R package called by a short
> script executed by Rscript. But this doesn't happen where I work for
> several reasons:
> 
>     - it implies installing your package on the production machine(s),
>     including its dependencies, which must be done by hand
>     - some machine learning platforms will simply not accept code provided
>     as an R package
>     - we have some "big data" use cases for which we need Spark; Spark can
>     run R or Python code, but only when it is provided as a single file. (On
>     the other hand, Spark can run applications provided as JAR files)
> 
> In summary, I'm convinced R would benefit from something similar to Java's
> `Main-Class` header or Python's `__main__.py` module. A new R CMD command
> would take a package, install its dependencies, and run its "main"
> function. If we have this machinery available, we could even consider
> reaching out to developers of Spark (and other tech stacks) and making it easier
> to develop R applications for those platforms.
> 
> A candid comment from Dirk suggested that I should implement this myself,
> which I would be happy to do, provided this is the normal procedure. Or is
> there a more formal process I should follow?

You can't implement it to run under R CMD, but it should be 
straightforward to put this in an R package, to be run by Rscript using 
something like

   Rscript -e "yourpackage::run_main('somepackage')"

You can use the installation code from the `remotes` package, so 
run_main() could be a pretty simple function.
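
A minimal sketch of what such a run_main() might look like follows. The 
function name, the convention that the target package exports a function 
called `main` taking a character vector of command-line arguments, and the 
rule that `pkg` is either a source directory or the name of a CRAN package 
are all assumptions made for illustration, not an existing R or `remotes` 
API. Only `remotes::install_local()` and `remotes::install_cran()` are real 
calls.

```r
# Hypothetical helper: install a package (with its dependencies) if needed,
# then call its exported entry-point function with the command-line arguments.
run_main <- function(pkg,
                     args = commandArgs(trailingOnly = TRUE),
                     main = "main") {
  if (dir.exists(pkg)) {
    # `pkg` is a source directory: install it together with its dependencies,
    # then read the package name from its DESCRIPTION file.
    remotes::install_local(pkg, upgrade = "never")
    desc <- read.dcf(file.path(pkg, "DESCRIPTION"), fields = "Package")
    pkg <- unname(desc[1, 1])
  } else if (!requireNamespace(pkg, quietly = TRUE)) {
    # `pkg` is a package name that is not installed yet: assume it is on CRAN.
    remotes::install_cran(pkg, upgrade = "never")
  }

  # Look up the entry point among the package's exports (assumed convention).
  entry <- getExportedValue(pkg, main)
  stopifnot(is.function(entry))

  # Pass the arguments as one character vector and return the result invisibly.
  invisible(entry(args))
}
```

With this in place, something like 
`Rscript -e "yourpackage::run_main('somepackage')" arg1 arg2` would hand 
`c("arg1", "arg2")` to `somepackage::main()`, assuming the package follows 
that convention.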

Duncan Murdoch