[Rd] [Rocks-Discuss] Two R editions in Unix cluster systems

Adam Brenner aebrenne at uci.edu
Wed Oct 16 03:24:50 CEST 2013


For our HPC cluster, we have run into this issue in the past. What we use is
modules[1]. We instruct our users to run a command like

   module load R/2.15.2

This will load up the environment path in which R/2.15.2 lives. If they want
to switch or use R/3.0.1 they simply run

   module unload R/2.15.2
   module load R/3.0.1
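
For anyone who has not used modules before, the effect of those two commands
on PATH can be approximated in plain shell. This is only a rough sketch of the
idea, not how modules is actually implemented; the helper names path_remove
and path_prepend and the /data/apps/R/<version>/bin prefixes are assumptions
made up for illustration.

```shell
#!/bin/sh
# Rough emulation of "module unload R/2.15.2; module load R/3.0.1".
# Helper names and install prefixes are hypothetical.

# Remove every occurrence of directory $1 from PATH.
path_remove() {
    PATH=$(printf '%s' "$PATH" | tr ':' '\n' | grep -Fvx -- "$1" | paste -s -d ':' -)
    export PATH
}

# Put directory $1 at the front of PATH.
path_prepend() {
    PATH="$1${PATH:+:$PATH}"
    export PATH
}

path_prepend /data/apps/R/2.15.2/bin   # roughly: module load R/2.15.2
path_remove  /data/apps/R/2.15.2/bin   # roughly: module unload R/2.15.2
path_prepend /data/apps/R/3.0.1/bin    # roughly: module load R/3.0.1

# Show which directory the shell will now search first.
printf '%s\n' "${PATH%%:*}"
```

The real module command does the same bookkeeping for every variable the
module file touches, which is why it is preferable to editing PATH by hand.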

For all our software installs, we do *not* install software on each node. The
overhead for us to create a compilation script and farm that out to each node
within our cluster (100+) is not worth it. Instead we use modules, as I have
described above. We use a standard NFS server, running lots of NFS processes,
whose export is mounted on each compute node. This has worked extremely well
for us.

The primary reason is that the Linux kernel does a fairly good job of caching
libraries. In our setup, we have found that most, if not all, of the R
libraries stay in memory once loaded from our NFS server. The data/input
files R uses are on our Gluster or FraunhoferFS parallel file system. Of
course, keeping the data local to the compute node would be the fastest.

If you still want to install software locally on each compute node, you can
still take advantage of modules. I do suggest you install R (from source or
RPM, etc) in a non-standard location like /opt/ or make your own /apps, /data
and so on. Then create a module file similar to the following:

        #%Module1.0
        # Load the compiler module first (presumably the one R was built with)
        module load gcc/4.8.1
        set  ROOT  /data/apps/R/3.0.1

        prepend-path PATH            $ROOT/bin
        prepend-path MANPATH         $ROOT/share/man
        prepend-path R_LIBS          $ROOT/lib64/R/library
        prepend-path LD_LIBRARY_PATH $ROOT/lib64/R/lib

Replace the value of the Tcl variable ROOT with the path where R lives and you
are good to go. This method works with other software besides R :-)
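
Before publishing a module file, it can help to confirm that ROOT really has
the layout the prepend-path lines expect. The following is a minimal sketch
under assumptions: check_r_root is a hypothetical helper name, and the demo
runs against a throwaway directory tree rather than a real R install. On
systems where R installs under lib instead of lib64, the list of
subdirectories would need adjusting.

```shell
#!/bin/sh
# Verify that an R install prefix contains the runtime directories
# referenced by the module file. check_r_root is a hypothetical name.
check_r_root() {
    root=$1
    status=0
    for sub in bin lib64/R/library lib64/R/lib; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub" >&2
            status=1
        fi
    done
    return "$status"
}

# Demo: build a fake tree standing in for something like /data/apps/R/3.0.1.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/bin" "$demo_root/lib64/R/library" "$demo_root/lib64/R/lib"
check_r_root "$demo_root" && echo "layout ok: $demo_root"
```

On a real cluster you would call check_r_root with the same path you put in
ROOT, and only then make the module file visible to users.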

[1]: http://modules.sourceforge.net


Adam Brenner
Computer Science, Undergraduate Student
Donald Bren School of Information and Computer Sciences

Research Computing Support
Office of Information Technology

University of California, Irvine
aebrenne at uci.edu

On Tue, Oct 15, 2013 at 1:15 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> Dear R Devel
> Some of our R users are still insisting we run R-2.15.3 because of
> difficulties with a package called OpenMX.  It can't cooperate with new R,
> oh well.
> Other users need to run R-3.0.1. I'm looking for the most direct route to
> install both, and allow users to choose at runtime.
> In the cluster, things run faster if I install RPMs to each node, rather
> than putting R itself on the NFS share (but I could do that if you think
> it's really better....)
> In the past, I've used the SRPM packaging from EPEL repository to make a
> few little path changes and build R RPM for our cluster nodes. Now I face
> the problem of building 2 RPMS, one for R-2.15.3 and one for R-newest, and
> somehow keeping them separate.
> If you were me, how would you approach this?
> Here's my guess
> First, The RPM packages need unique names, of course.
> Second, leave the RPM packaging for R-newest exactly the same as it always
> was.  R is in the path, the R script and references among all the bits will
> be fine, no need to fight. It will find what it needs in /usr/lib64/R or
> whatnot.
> For the legacy R, I'm considering 2 ideas.  I could install R with the same
> prefix, /usr, but very carefully, so the R bits are installed into separate
> places. I just made a fresh build of R and on RedHat 6, it appears to me R
> installs these directories:
> bin
> libdir
> share.
> So what if the configure line has the magic bindir=/usr/bin-R-2.15.3
> libdir = /usr/lib64/R-2.15.3, and whatnot. If I were doing Debian
> packaging, I suppose I'd be obligated (by the file system standard) to do
> that kind of thing. But it looks like a headache.
> The easy road is to set the prefix at some out of the way place, like
> /opt/R-2.15.3, and then use a post-install script to link
> /opt/R-2.15.3/bin/R to /usr/bin/R-2.15.3.  When I tried that, it surprised
> me because R did not complain about lack of access to devel headers. It
> configures and builds fine.
> R is now configured for x86_64-unknown-linux-gnu
>   Source directory:          .
>   Installation directory:    /tmp/R
>   C compiler:                gcc -std=gnu99  -g -O2
>   Fortran 77 compiler:       gfortran  -g -O2
>   C++ compiler:              g++  -g -O2
>   Fortran 90/95 compiler:    gfortran -g -O2
>   Obj-C compiler:            gcc -g -O2 -fobjc-exceptions
>   Interfaces supported:      X11, tcltk
>   External libraries:        readline, ICU, lzma
>   Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
>   Options enabled:           shared BLAS, R profiling, Java
>   Recommended packages:      yes
> Should I worry about any runtime complications of this older R finding bits
> of the newer R in the PATH ahead of it? I worry I'm making lazy
> assumptions.
> After that, I need to do some dancing with the RPM packaging.
> I suppose there'd be some comfort if I could get the users to define R_HOME
> in their user environment before launching jobs, I think that would
> eliminate the danger of confusion between versions, wouldn't it?
> pj
> --
> Paul E. Johnson
> Professor, Political Science      Assoc. Director
> 1541 Lilac Lane, Room 504      Center for Research Methods
> University of Kansas                 University of Kansas
> http://pj.freefaculty.org               http://quant.ku.edu