[R-pkg-devel] Question on best approach to develop R package that wraps old complex Python2 software

Stefan McKinnon Høj-Edwards @me @end|ng |rom |y@|k@com
Tue Jan 25 17:45:02 CET 2022


If the Python 2 package mainly consists of system() calls, I would write an
R package that does the same thing without relaying the calls through the
Python routines, i.e. let R call the commands directly.
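Here is a minimal sketch of letting R call a command directly with base R's system2(), capturing both output and exit status. The wrapper name run_tool() is hypothetical. Rscript stands in for the real executable so that the sketch runs anywhere R is installed.

```r
# Hypothetical wrapper: call an external command directly from R and
# return its output and exit status, with no Python layer in between.
run_tool <- function(command, args = character()) {
  # Accept either a full path or a name on the PATH
  # (Sys.which() returns "" when the name cannot be found).
  if (!file.exists(command) && !nzchar(Sys.which(command))) {
    stop("cannot find executable: ", command)
  }
  # stdout = TRUE captures output as a character vector; stderr = TRUE
  # merges error messages into it. A non-zero exit status triggers a
  # warning, which is silenced here because the status is returned below.
  out <- suppressWarnings(
    system2(command, args = shQuote(args), stdout = TRUE, stderr = TRUE)
  )
  # system2() stores a non-zero exit status in the "status" attribute.
  status <- attr(out, "status")
  list(output = as.character(out),
       status = if (is.null(status)) 0L else as.integer(status))
}

# Rscript is only a stand-in for the real tool in this sketch.
rscript <- file.path(R.home("bin"), "Rscript")
ok  <- run_tool(rscript, c("-e", "writeLines(as.character(6 * 7))"))
bad <- run_tool(rscript, c("-e", 'q(save = "no", status = 3)'))
```

Returning the status instead of stopping lets the package decide how to report a failed run, for example by showing the tool's own error messages from `output`.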

The only downside to this approach is that R doesn't handle concurrent
external processes as well as Python does. Python has the subprocess
module, with which you can start an external command and send it input,
check its output, send new input, or just leave it running until you want
to check in on it. To my knowledge, no similar facility exists in R.
Which brings us to the question: how long do these external commands run?
Whether a command is relayed through a Python module or called directly
from R, the R process will, by default, wait for it to finish. This is
mostly an issue for interactive use.
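Base R's system() and system2() wait by default but also accept wait = FALSE. The following minimal sketch launches a slow command in the background and collects its result later. It assumes that the command signals completion by writing an output file. That convention, the helper wait_for_file(), and the use of Rscript as a stand-in for a slow tool are all assumptions made for illustration.

```r
# Sketch: launch a slow command in the background with base R, keep
# working, and collect the result later. The "result file appears when
# done" convention is an assumption of this sketch, not of any tool.
rscript <- file.path(R.home("bin"), "Rscript")
out_file <- tempfile(fileext = ".txt")

# Stand-in for a slow external tool: sleep, then write its result.
job <- sprintf('Sys.sleep(2); writeLines("done", %s)', deparse(out_file))

# wait = FALSE returns immediately instead of blocking the R session.
system2(rscript, c("-e", shQuote(job)), wait = FALSE)
finished_at_launch <- file.exists(out_file)  # expected FALSE: job still running

# ... other interactive work can happen here ...

# Hypothetical helper: poll for the result file, giving up after
# `timeout` seconds.
wait_for_file <- function(path, timeout = 60, interval = 0.2) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    if (file.exists(path)) {
      lines <- readLines(path, warn = FALSE)
      if (length(lines) > 0) return(lines)
    }
    Sys.sleep(interval)
  }
  stop("timed out waiting for ", path)
}

result <- wait_for_file(out_file)
```

This approach only supports launching a process and checking on it later. For two-way interaction with a running process, similar to Python's subprocess, the CRAN package processx (not part of base R) may be worth a look.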

Another approach is to have your R package handle the data formatting,
settings and so on, and assemble a command line with arguments that users
can run at their leisure, whether on a laptop, in the cloud or on an HPC
cluster. Once the results have been produced, users can return to your
package to analyse them.
I used this approach for my (badly named) R package Siccuracy, which aids
with the imputation software AlphaImpute.
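A minimal sketch of this prepare/run/read-back workflow follows. The tool name sometool, its --input and --out flags, the result file naming, and the helpers prepare_run() and read_results() are all hypothetical placeholders. They do not represent the actual interface of AlphaImpute or LDSC.

```r
# Hypothetical sketch: the package prepares inputs and returns a command
# line for the user to run wherever they like (laptop, cloud, HPC), then
# reads the results back in afterwards.
# "sometool" and its --input/--out flags are placeholders, not a real CLI.
prepare_run <- function(data, dir = tempfile("run_")) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  input <- file.path(dir, "input.tsv")
  out_prefix <- file.path(dir, "result")
  # Data formatting is handled on the R side.
  utils::write.table(data, input, sep = "\t", quote = FALSE, row.names = FALSE)
  cmd <- paste("sometool", "--input", shQuote(input),
               "--out", shQuote(out_prefix))
  # Assumed convention: the tool writes <out_prefix>.tsv when finished.
  list(dir = dir, command = cmd, output = paste0(out_prefix, ".tsv"))
}

read_results <- function(run) {
  if (!file.exists(run$output)) {
    stop("results not found; run this command first:\n  ", run$command)
  }
  utils::read.delim(run$output)
}

run <- prepare_run(data.frame(id = 1:3, value = c(0.5, 1.2, 2.0)))
cat(run$command, "\n")  # the user copies this to their laptop, cloud or HPC job
```

Keeping the run description in a small list (or saving it with saveRDS()) lets users come back to read_results() in a later session.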

Kindly,
Stefan

On Tue, 25 Jan 2022 at 16:27, Andrew Simmons <akwsimmo using gmail.com> wrote:

> I would suggest the reticulate package in R. The most important functions
> for your case are reticulate::use_python_version and reticulate::import.
> For example, your R package could start with:
>
> # change this to the name of the module you need
> numpy <- NULL
>
> .onLoad <- function(libname, pkgname)
> {
>     reticulate::use_python_version("2.7")  # change this as you need to
>
>     # .onLoad happens before the namespace is locked, so this is legitimate
>     numpy <<- reticulate::import("numpy", delay_load = list(
>         on_error = function(c) stop(
>             "unable to import 'numpy', try ",
>             sQuote("reticulate::py_install(\"numpy\")"),
>             " if it is not installed:\n  ",
>             conditionMessage(c)
>         )
>     ))
> }
>
> When your package's namespace is loaded, this selects the version of
> Python you need and lazily imports the module you need into the Python
> session.
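CRAN check machines may not have Python or numpy installed, so a package using this pattern usually needs a guard in its examples and tests. The following is a minimal sketch. The helper python_module_ok() and the function column_means() are hypothetical. reticulate::py_module_available() is an existing reticulate function.

```r
# Hypothetical helper (not part of reticulate): TRUE only when reticulate
# is installed and Python can import `module`. Useful for skipping
# examples and tests on machines without the required Python setup.
python_module_ok <- function(module = "numpy") {
  requireNamespace("reticulate", quietly = TRUE) &&
    isTRUE(reticulate::py_module_available(module))
}

# Hypothetical package function relying on the lazily imported `numpy`
# object created by the .onLoad() hook in the quoted example.
column_means <- function(m) {
  if (!python_module_ok("numpy")) {
    stop("this function needs Python with 'numpy' available")
  }
  numpy$mean(m, axis = 0L)
}
```

In testthat tests, the same check can be passed to testthat::skip_if_not().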
>
> On Tue, Jan 25, 2022 at 8:52 AM Alexandru Voda <alexandru.voda using seh.ox.ac.uk> wrote:
>
> > Hi!
> >
> > How would one best write an R wrapper package around complex Python 2
> > software (such as https://github.com/bulik/ldsc) that is still very
> > widely used in statistical genetics?
> >
> > I'm writing an R package (which currently passes all --as-cran checks)
> > for multiple other C++ software tools on the same topic as the one above,
> > but I'm having difficulties with this Python 2 one: it just looks like a
> > bunch of hackish system() calls... And while it works on Linux and Mac,
> > I have no idea whether it would work on Windows.
> >
> > While it may seem easy to dismiss, LDSC is actually widely used in the
> > statistical genetics field, and lots of people find it difficult to work
> > with because of all the dependency files, the weirdly documented
> > commands, and... well... Python 2...
> >
> > Any tips? Or do you know anyone I should contact or ask?
> >
> > Best wishes,
> > Alexandru
> >
> > ______________________________________________
> > R-package-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >
>
