[Rd] Two R editions in Unix cluster systems

Steven McKinney smckinney at bccrc.ca
Wed Oct 16 02:33:52 CEST 2013


> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org]
> On Behalf Of Paul Johnson
> Sent: October-15-13 1:15 PM
> To: R Devel List
> Cc: Discussion of Rocks Clusters
> Subject: [Rd] Two R editions in Unix cluster systems
> 
> Dear R Devel
> 
> Some of our R users are still insisting we run R-2.15.3 because of
> difficulties with a package called OpenMx.  It can't cooperate with new R,
> oh well.
> 
> Other users need to run R-3.0.1. I'm looking for the most direct route to
> install both, and allow users to choose at runtime.
> 
> In the cluster, things run faster if I install RPMs to each node, rather
> than putting R itself on the NFS share (but I could do that if you think
> it's really better....)
> 
> In the past, I've used the SRPM packaging from EPEL repository to make a
> few little path changes and build R RPM for our cluster nodes. Now I face
> the problem of building 2 RPMS, one for R-2.15.3 and one for R-newest, and
> somehow keeping them separate.
> 
> If you were me, how would you approach this?

Our bioinformatics group needs multiple versions of R
and other software for a variety of compatibility issues.

We thus gave up on trying to keep multiple versions of
R and related pipeline software on all nodes of our
cluster.

We set up a mount point on each cluster node pointing to
a directory structure on the head node (/share/apps).  

We compile and link all necessary materials in that directory 
structure, so that no executables or shared objects from 
/usr or other local drive locations need be accessed.  
All code is in e.g. /share/apps/R/R-x.yy.z  so all the 
nodes can see all the versions.  We have shared
libraries in e.g. /share/apps/usr/lib
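
A minimal sketch of what such a build might look like, assuming an unpacked source tree named R-x.yy.z in the current directory. The helper names r_configure_args and build_r, the APPS_ROOT variable and the /share/apps/usr/include directory are illustrative assumptions, and the rpath flag is just one common way to make the resulting binaries look in the shared library directory first.

```shell
#!/bin/sh
# Sketch only: print the configure arguments for one R version so that
# everything is installed under the shared tree (default /share/apps).
APPS_ROOT="${APPS_ROOT:-/share/apps}"

r_configure_args() {
    # $1 is the R version, e.g. 3.0.1 (hypothetical helper name)
    printf '%s\n' \
        "--prefix=$APPS_ROOT/R/R-$1" \
        "CPPFLAGS=-I$APPS_ROOT/usr/include" \
        "LDFLAGS=-L$APPS_ROOT/usr/lib -Wl,-rpath,$APPS_ROOT/usr/lib"
}

build_r() {
    # Configure, build and install one version from ./R-<version>.
    # Each line printed above becomes exactly one configure argument.
    version="$1"
    (
        cd "R-$version" || exit 1
        old_ifs="$IFS"
        IFS='
'
        set -- $(r_configure_args "$version")
        IFS="$old_ifs"
        ./configure "$@" && make && make install
    )
}
```

Calling build_r 3.0.1 and then build_r 2.15.3 would leave two independent trees side by side under /share/apps/R.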

All pipeline scripts use full paths to R and other
executables, and since R is self-contained when 
appropriately compiled as you note below, there's 
no path clashing.
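
The full-path convention can be sketched as a small shell helper. The names R_ROOT, r_bin and run_r are hypothetical, and the layout assumed is the /share/apps/R/R-x.yy.z one described above.

```shell
#!/bin/sh
# Sketch only: resolve and run a specific R version by its full path.
R_ROOT="${R_ROOT:-/share/apps/R}"

r_bin() {
    # $1 is a version string such as 2.15.3 or 3.0.1
    printf '%s/R-%s/bin/R\n' "$R_ROOT" "$1"
}

run_r() {
    # Usage: run_r VERSION [arguments passed on to R]
    version="$1"
    shift
    r="$(r_bin "$version")"
    if [ ! -x "$r" ]; then
        echo "R $version not found at $r" >&2
        return 1
    fi
    "$r" "$@"
}
```

A pipeline step could then call, for example, run_r 2.15.3 --vanilla -f analysis.R (analysis.R being a placeholder script name), and the version in use is visible in the script itself rather than depending on PATH.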

(We also abandoned NFS for Lustre so we don't have the
speed issue you might face with such an arrangement.
But generally code just needs to be read once and
is then kept in memory by current OSs, so you might
not notice much of a speed hit as far as getting the
executable into memory.)

Maintaining one set of code accessible to all nodes has
made things much simpler than trying to set up all the
rpms on the head node so that all compute nodes get
it all installed locally.  

Some attention to detail is important at compile time, 
to ensure that all bits that go into the compilation 
really do come from /share/apps  but that's about it.  
This has been easier to accomplish than maintaining 
a library of rpms on the head node and managing the 
distribution scripts to push out to the compute nodes.
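
One way to check where the linked pieces come from is to filter the output of ldd. The sketch below assumes the usual "name => /path (address)" ldd line format; the function name libs_outside is hypothetical. Core system libraries such as libc will normally still be listed, so the output is something to review rather than a pass/fail test.

```shell
#!/bin/sh
# Sketch only: read `ldd` output on stdin and print every library that
# resolves to a path outside the given prefix, or is not found at all.
libs_outside() {
    # $1 is the prefix that libraries are expected to live under
    awk -v prefix="$1" '
        $2 == "=>" && ($3 == "not" ||
                       ($3 ~ /^\// && index($3, prefix "/") != 1)) {
            print $1, "=>", ($3 == "not" ? "not found" : $3)
        }
    '
}

# Intended use (bin/R is a shell script, so inspect the real binary,
# which lives under the directory printed by `R RHOME`):
#   r=/share/apps/R/R-3.0.1/bin/R
#   ldd "$("$r" RHOME)/bin/exec/R" | libs_outside /share/apps
```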




Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

> Here's my guess
> 
> First, the RPM packages need unique names, of course.
> 
> Second, leave the RPM packaging for R-newest exactly the same as it always
> was.  R is in the path, the R script and references among all the bits will
> be fine, no need to fight. It will find what it needs in /usr/lib64/R or
> whatnot.
> 
> For the legacy R, I'm considering 2 ideas.  I could install R with the same
> prefix, /usr, but very carefully, so that the R bits are installed into separate
> places. I just made a fresh build of R and on RedHat 6, it appears to me R
> installs these directories:
> bin
> libdir
> share.
> 
> So what if the configure line has the magic bindir=/usr/bin-R-2.15.3
> libdir=/usr/lib64/R-2.15.3, and whatnot. If I were doing Debian
> packaging, I suppose I'd be obligated (by the file system standard) to do
> that kind of thing. But it looks like a headache.
> 
> The easy road is to set the prefix at some out of the way place, like
> /opt/R-2.15.3, and then use a post-install script to link
> /opt/R-2.15.3/bin/R to /usr/bin/R-2.15.3.  When I tried that, it surprised
> me because R did not complain about lack of access to devel headers. It
> configures and builds fine.
> 
> R is now configured for x86_64-unknown-linux-gnu
> 
>   Source directory:          .
>   Installation directory:    /tmp/R
> 
>   C compiler:                gcc -std=gnu99  -g -O2
>   Fortran 77 compiler:       gfortran  -g -O2
> 
>   C++ compiler:              g++  -g -O2
>   Fortran 90/95 compiler:    gfortran -g -O2
>   Obj-C compiler:            gcc -g -O2 -fobjc-exceptions
> 
>   Interfaces supported:      X11, tcltk
>   External libraries:        readline, ICU, lzma
>   Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
>   Options enabled:           shared BLAS, R profiling, Java
> 
>   Recommended packages:      yes
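
The out-of-the-way prefix plus symlink route described above could be sketched as follows. The function name link_versioned_r is hypothetical; in an RPM the link step would presumably sit in the %post scriptlet.

```shell
#!/bin/sh
# Sketch only: expose an R installed under its own prefix as a
# versioned command name, e.g. /usr/bin/R-2.15.3.
link_versioned_r() {
    prefix="$1"     # e.g. /opt/R-2.15.3
    version="$2"    # e.g. 2.15.3
    bindir="$3"     # e.g. /usr/bin
    if [ ! -x "$prefix/bin/R" ]; then
        echo "no executable R under $prefix/bin" >&2
        return 1
    fi
    ln -s "$prefix/bin/R" "$bindir/R-$version"
}

# Intended use, run as root after building from the R-2.15.3 sources:
#   ./configure --prefix=/opt/R-2.15.3 && make && make install
#   link_versioned_r /opt/R-2.15.3 2.15.3 /usr/bin
```

The newest R keeps the plain name R, and the legacy one is only reached through the versioned name, so neither shadows the other in the PATH.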
> 
> Should I worry about any runtime complications of this older R finding bits
> of the newer R in the PATH ahead of it? I worry I'm making lazy
> assumptions.
> 
> After that, I need to do some dancing with the RPM packaging.
> 
> I suppose there'd be some comfort if I could get the users to define R_HOME
> in their user environment before launching jobs. I think that would
> eliminate the danger of confusion between versions, wouldn't it?
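
Rather than relying on users to set R_HOME, each installed front end can be asked which home directory it will actually use, via the R RHOME command. As far as I know, the bin/R front-end script records its own home directory at install time and overrides a conflicting R_HOME from the environment, so a check like this sketch may be more informative than the environment variable. The function name check_r_home and the example paths in the comment are assumptions.

```shell
#!/bin/sh
# Sketch only: confirm that an R front end reports the home directory
# we expect, using the `R RHOME` command.
check_r_home() {
    r="$1"          # full path to an R front end
    expected="$2"   # home directory we expect it to report
    actual="$("$r" RHOME)" || return 1
    if [ "$actual" != "$expected" ]; then
        echo "$r reports R home $actual, expected $expected" >&2
        return 1
    fi
}

# Intended use (the lib64 location is an assumption for x86_64 RedHat):
#   check_r_home /usr/bin/R-2.15.3 /opt/R-2.15.3/lib64/R
```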
> 
> pj
> --
> Paul E. Johnson
> Professor, Political Science      Assoc. Director
> 1541 Lilac Lane, Room 504      Center for Research Methods
> University of Kansas                 University of Kansas
> http://pj.freefaculty.org               http://quant.ku.edu
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
