[Rd] Suggestion: Install packages on non-appendable file systems (e.g. databricks volumes)
Sergio Oller
@erg|o||er @end|ng |rom gm@||@com
Wed Mar 26 17:47:26 CET 2025
Hello,
I would like to submit a patch to R. Following "5 Submitting Feature
Requests" in the R Development Guide
<https://contributor.r-project.org/rdevguide/chapters/submitting_feature_requests.html>,
I would like to ask for feedback before proceeding with a formal
submission on Bugzilla. This is my first attempt at contributing to R, and I
do not currently have a Bugzilla account.
I work at a company where we use R with Databricks. We want to install
some packages on a distributed filesystem that is not fully POSIX
compliant, as it does not support opening files in append mode. In C terms,
`fopen(filename, "a")` fails with an error. I guess other distributed
filesystems beyond the ones in Databricks may have issues with append mode
as well.
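To check whether a given directory is affected, a small probe along the following lines may help. It is only a sketch in Python, not part of the patch; the function name `supports_append` and the probe file name are made up for illustration.

```python
import os


def supports_append(directory):
    """Return True if a file inside `directory` can be opened in append mode."""
    # Hypothetical probe file name; any unused name would do.
    probe = os.path.join(directory, ".append_probe")
    try:
        # Mode "a" corresponds to fopen(path, "a") in C.
        with open(probe, "a") as fh:
            fh.write("x")
        return True
    except OSError:
        # Filesystems without append support are expected to fail here.
        return False
    finally:
        # Best-effort cleanup; the probe may never have been created.
        try:
            os.remove(probe)
        except OSError:
            pass
```

Calling `supports_append()` on the target library path before installing should tell you whether an in-place installation is likely to hit this problem.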
Our current workaround is to install all packages into a local folder, and
then copy/move that folder to the distributed filesystem.
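As a rough illustration of that workaround (a Python sketch under assumptions, not our actual tooling): `install_into` is a hypothetical callable that installs packages into the directory it is given, and `install_via_staging` is a made-up helper name.

```python
import os
import shutil
import tempfile


def install_via_staging(install_into, dest_lib):
    """Install into a local staging directory, then copy the result to `dest_lib`.

    `install_into` is a hypothetical callable taking the staging path; all
    append-mode writes happen there, on the local filesystem.
    """
    copied = []
    with tempfile.TemporaryDirectory() as staging:
        install_into(staging)
        for name in sorted(os.listdir(staging)):
            src = os.path.join(staging, name)
            if os.path.isdir(src):
                # Copy each installed package directory to the final library.
                shutil.copytree(src, os.path.join(dest_lib, name), dirs_exist_ok=True)
                copied.append(name)
    return copied
```

In practice `install_into` could, for example, run `R CMD INSTALL --library=<staging>` through `subprocess.run`. The sketch needs Python 3.8 or later for `dirs_exist_ok`.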
If I understand package installation correctly, when a package is
installed, the installation happens inside a 00LOCK directory, and then the
outcome is moved to the final destination.
The contribution I would like to submit allows users/sysadmins to set an
environment variable named PKG_LOCKDIR_PREFIX, that defines the location
where the "00LOCK-" directories are created. The patch is backwards
compatible and it consists of +28,-10 lines, hopefully easy enough to
review.
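The intended lookup can be pictured roughly as follows. This is a Python sketch of the idea only, not a translation of the patch: the helper name `lock_dir` is hypothetical, and the fallback to `<library>/00LOCK-<package>` reflects my understanding of the current default.

```python
import os


def lock_dir(lib, pkg, environ=os.environ):
    """Return the directory where the "00LOCK-<pkg>" lock would be created.

    If PKG_LOCKDIR_PREFIX is set and non-empty, the lock directory goes there;
    otherwise it stays inside the target library (assumed current behaviour).
    """
    prefix = environ.get("PKG_LOCKDIR_PREFIX", "")
    base = prefix if prefix else lib
    return os.path.join(base, "00LOCK-" + pkg)
```

With the variable unset nothing changes, which is what makes the approach backwards compatible.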
https://github.com/r-devel/r-svn/pull/196.diff
When I use this patch, I can successfully install packages on a distributed
filesystem by setting PKG_LOCKDIR_PREFIX to a directory on my local
filesystem. R performs all append-mode writes on the local filesystem and
only then copies the package files to the distributed filesystem.
This setting makes package installation transparent for all data
scientists, since they may not even know that PKG_LOCKDIR_PREFIX has been
set. Package installation just works as expected.
I feel the patch has some added value over our workaround: even if we
implement the workaround as a simple wrapper around install.packages(), any
third-party package that calls install.packages() itself (such as renv or
others) won't use our workaround. Besides, with this patch merged, any other
R user would be able to install packages on those filesystems.
Any feedback is very much appreciated.
Thanks for your time,
Sergio