[Rd] Runnable R packages
lindelof @ending from ieee@org
Thu Jan 3 11:43:31 CET 2019
I’m working as a data scientist in a major tech company. I have been using
R for almost 20 years now and there’s one issue that’s been bugging me of
late. I apologize in advance if this has been discussed before.
R has traditionally been used for running short scripts or data analysis
notebooks, but there’s recently been a growing interest in developing full
applications in the language. Three examples come to mind:
1) The Shiny web application framework, which facilitates the development of
rich, interactive web applications
2) The httpuv package, which provides lower-level facilities than Shiny for
writing web services
3) Batch jobs run by data scientists according to, say, a cron schedule
Compared with other languages, R’s support for such applications is rather
poor. The Rscript program is generally used to run an R script or an
arbitrary R expression, but I feel it suffers from a few problems:
1) It encourages developers of batch jobs to provide their code in a single
R file (bad for code structure and unit-testability)
2) It provides no way to deal with dependencies on other packages
3) It provides no way to "run" an application provided as an R package
For example, let’s say I want to run a Shiny application that I provide as
an R package (to keep the code modular, to benefit from unit tests, and to
declare dependencies properly). I would then need to a) uncompress my R
package, b) somehow, ensure my dependencies are installed, and c) call
runApp(). This can get tedious, fast.
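
To make the manual steps concrete, here is a minimal sketch of what they might look like when scripted today. It assumes the remotes and shiny packages are available and that the package ships its app under inst/app; the helper names find_app_dir() and run_packaged_app() are hypothetical and only for illustration.

```r
# Hypothetical helper: resolve the directory of a Shiny app shipped inside an
# installed package (assumed to live under inst/app in the package sources).
find_app_dir <- function(pkg, app_dir = "app") {
  path <- system.file(app_dir, package = pkg)
  if (!nzchar(path)) {
    stop("no '", app_dir, "' found in package '", pkg, "'")
  }
  path
}

# Hypothetical helper covering steps a) to c): install the package tarball
# together with the dependencies declared in its DESCRIPTION, then launch
# the bundled app.
run_packaged_app <- function(tarball, pkg, app_dir = "app") {
  # a) + b): install_local() accepts a compressed package file and, with
  # dependencies = TRUE, also installs the packages it depends on
  remotes::install_local(tarball, dependencies = TRUE, upgrade = "never")
  # c): start the app from its installed location
  shiny::runApp(find_app_dir(pkg, app_dir))
}
```

Every application packaged this way needs some variant of this glue code, which is the tedium described above.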
Other languages let the developer package their code in "runnable"
artefacts, and let the developer specify the main entry point. The
mechanics depend on the language but are remarkably similar, and suggest a
way to implement this in R. Through declarations in some file, the
developer can often specify dependencies and declare where the program’s
"main" function resides. Consider Java:
Artefact: .jar file
Declarations file: Manifest file
Entry point: declared as 'Main-Class'
Executed as: java -jar <jarfile>
Or Python:
Artefact: Python package, typically as .tar.gz source distribution file
Declarations file: setup.py (which specifies dependencies)
Entry point: special __main__.py module
Executed as: python -m <package>
R already has much of this machinery:
Artefact: R package
Declarations file: DESCRIPTION
Entry point: ?
Executed as: ?
I feel that R could benefit from letting the developer specify, possibly in
DESCRIPTION, how to "run" the package. The package could then be run
through a new R CMD command, for example:
R CMD RUN <package> <args>
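
As a rough sketch of how such a command might work internally: the code below assumes a DESCRIPTION field called "Main" that names an exported function of the package. Neither that field nor R CMD RUN exists in R today; the field name and the functions read_entry_point() and run_package() are hypothetical.

```r
# Read the package name and the (hypothetical) "Main" field from a
# DESCRIPTION file. read.dcf() returns NA for fields that are absent.
read_entry_point <- function(description_file, field = "Main") {
  dcf <- read.dcf(description_file, fields = c("Package", field))
  main <- unname(dcf[1, field])
  if (is.na(main)) {
    stop("no '", field, "' field declared in ", description_file)
  }
  list(package = unname(dcf[1, "Package"]), main = main)
}

# Look up the declared entry point in an installed package and call it with
# the command-line arguments as a character vector.
run_package <- function(pkg, args = commandArgs(trailingOnly = TRUE)) {
  description_file <- system.file("DESCRIPTION", package = pkg)
  if (!nzchar(description_file)) {
    stop("package '", pkg, "' is not installed")
  }
  entry <- read_entry_point(description_file)
  main <- getExportedValue(entry$package, entry$main)
  main(args)
}
```

With a DESCRIPTION containing, say, "Main: main", the proposed R CMD RUN <package> <args> would then amount to calling run_package("<package>") with <args> passed through to the package's main() function.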
I’m sure there are plenty of wrinkles in this idea that need to be ironed
out, but is this something that has ever been considered, or that is on R’s
roadmap?
Thanks for reading so far,
David Lindelöf, Ph.D.
+41 (0)79 415 66 41 or skype:david.lindelof