[R] How to implement "zero-overhead" code re-use (a la Python, Perl, etc.) in R?

Kynn Jones kynnjo at gmail.com
Fri Sep 30 16:36:12 CEST 2016


I'm collaborating on a long-running research project that, over the
years, has accumulated source code (written in-house) in several
languages: Python, Perl, Mathematica, MATLAB.

Recently I have started writing source code in R for this project, and
I am having trouble incorporating it into our established workflow.

Our code falls into two broad categories: "scripts" (invoked directly
by the user) and "libraries" (invoked by "client code", i.e. scripts
or other library code).

The library code lives under the top-level subdirectory ./lib of the
git-controlled project directory.  (We keep all our code, along with
the rest of the project's documents, under git control.)

Users of client programs that rely on the code under ./lib are
expected to supply (usually via some global configuration) the
appropriate library path (e.g. PYTHONPATH=$PROJECTDIR/lib/python,
MATLABPATH=$PROJECTDIR/lib/MATLAB, etc.).

With this arrangement, code re-use is extremely simple.  For example,
by just dropping into the ./lib/python directory the file foo.py, with
content

    # foo.py
    def bar():
        ...  # etc.

...its bar function becomes *immediately* available, in a
namespace-safe way, to any other Python code in the project, like
this:

    # somescript.py

    import foo

    foo.bar()

I describe this form of code re-use as "zero-overhead", since it
requires only the presence of files that actually hold the code.

Under such a code re-use scheme, updates of library code from the
project's git repo are no different from updates of the project's
content in general.  All that is required is running a command like

    git pull origin master

After such a command, the updated library code becomes immediately
available to client code.

Although I used Python for the example above, the picture is very
similar for the other languages we have been using up to now.

For R, however, the situation is different.  The only form of code
re-use I have found for R is through packages.  AFAICT, R packages are
not "zero-overhead": they entail a host of "meta" and derived files
(in addition to the source code files), together with
build/installation steps after each update.

I'm looking for an alternative to packages for code re-use in R, one
that better approximates the "zero-overhead" code re-use model
described earlier.

The only thing that comes to mind is as follows:

  1. a "module" is an *.R file in the directory specified by a suitable
environment variable (e.g. PROJECT_R_LIB) that defines a single
"module object", which is simply a named list.  For example,

    # module foo.R

    foo <- list(
      bar = function (...) ... ,
      baz = function (...) ... ,
      frobozz = function (...) ... ,
      ...
      opts = list(...),
      ...
    )

  2. every *.R file starts with boilerplate in the spirit of the
following (along with adequate error checking/messages, etc.):

    # somescript.R

    import <- function (module_name) {
      path_to_lib <- Sys.getenv("PROJECT_R_LIB")
      path_to_module <- file.path(path_to_lib, paste0(module_name, ".R"))
      source(path_to_module)
    }

    import("foo")
    ...
    import("whatever")

    foo$bar(...)
    if (foo$opts$frobnicate) foo$frobozz(...)


This implementation is very crude (I have very little experience with
R), but I hope it at least conveys clearly what I'm after.
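
To make the idea a bit more concrete, here is a slightly fuller sketch
of the same scheme.  Everything named in it (import, PROJECT_R_LIB, the
contents of foo.R, the helper function) is hypothetical and only there
for illustration.  I am assuming that calling source() with its `local`
argument set to a fresh environment is an acceptable way to keep a
module's private helpers out of the caller's workspace; it may well not
be the idiomatic approach.

```r
# Sketch only: `import`, PROJECT_R_LIB and the module contents below are
# illustrative names, not an existing R facility.
import <- function(module_name, lib = Sys.getenv("PROJECT_R_LIB")) {
  if (!nzchar(lib)) stop("PROJECT_R_LIB is not set")
  path <- file.path(lib, paste0(module_name, ".R"))
  if (!file.exists(path)) stop("no such module: ", path)
  # Evaluate the file in a fresh environment so that anything else the
  # module file defines does not leak into the caller's workspace.
  env <- new.env(parent = globalenv())
  source(path, local = env)
  if (!exists(module_name, envir = env, inherits = FALSE))
    stop("module file does not define '", module_name, "'")
  # Return the module object (the named list) to the caller.
  get(module_name, envir = env, inherits = FALSE)
}

# Demo with a throwaway directory standing in for the project's R lib dir.
lib_dir <- tempfile("rlib")
dir.create(lib_dir)
writeLines(c(
  "helper <- function(x) x + 1   # private to the module file",
  "foo <- list(",
  "  bar  = function(x) helper(x) * 2,",
  "  opts = list(frobnicate = TRUE)",
  ")"
), file.path(lib_dir, "foo.R"))
Sys.setenv(PROJECT_R_LIB = lib_dir)

foo <- import("foo")
result <- foo$bar(1)
# Check whether the module's private helper ended up in the global workspace.
leaked <- exists("helper", envir = globalenv(), inherits = FALSE)
```

With this variant the script itself chooses the name the module is
bound to (foo <- import("foo")), which is the closest I can get to
Python's "import foo".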

I would appreciate any suggestions/comments on how to implement in R
the "zero-overhead" code re-use model I described earlier.


