[BioC] Installation on a cluster

Sean Davis sdavis2 at mail.nih.gov
Fri Apr 18 01:43:07 CEST 2008


On Thu, Apr 17, 2008 at 12:56 PM, Claudio Lottaz
<Claudio.Lottaz at klinik.uni-regensburg.de> wrote:
> Hi folks,
>
>  Sean's suggestion for installing on a cluster is indeed easy to maintain. We did it similarly but encountered network-traffic issues: starting 50 R processes at the same time, each opening plenty of shared libraries and loading data, seemed to bring the network down. Has anybody else observed this kind of problem? Wouldn't it be advisable to copy the installation onto the local disks of all nodes after installing it in the common NFS place?
>

I agree that I/O can be an issue.  There are file systems that are
specifically designed with some of these issues in mind (see AFS, as
an example).  If you are running many short-lived R processes, each
lasting a second or less, on a large cluster, then repeatedly
reading shared libraries and package files over the network might be
an issue.  However, remember that Linux caches files, so these may
not be read from disk more than once if they can remain in the page
cache on the nodes.  This, again, will depend on the use case.
Also, if you start 50 processes and they then run for 24 hours each,
startup I/O is not an issue.
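
As a rough check (a sketch, not a benchmark; Biobase is just an
illustrative package to load), you can time a package load from the
shared tree in a fresh R session and then repeat it in a second
fresh session on the same node.  If the second load is much faster,
the page cache is doing its job:

  system.time(library(Biobase))  # first run pulls files over NFS;
                                 # rerun in a new R session on the
                                 # same node to see the cached speed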

The other issue is a larger one.  If your R processes are all
reading large quantities of data from a shared disk, then there may
very well be problems.  However, this one is harder to solve on the
cluster and may require some work on the file server.  Any temporary
files that R writes should go to local disk on each node as far as
possible (see the sketch below).
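
R fixes its per-session temporary directory from the TMPDIR
environment variable at startup, so one option (a sketch, with an
assumed /scratch/tmp path on the nodes) is to set it before R
launches, e.g. in the batch job script or an Renviron file:

  ## in the job script, before starting R:
  ##   export TMPDIR=/scratch/tmp    # assumed node-local path
  ## then, inside R:
  tempdir()            # should now report the local scratch area
  tf <- tempfile()     # temporary files land on local disk, not NFS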

In short, I think Claudio brings up a good point: a one-size-fits-all
approach to the problem is naive.  It is worthwhile to learn what
bottlenecks your installation and institution might face and to go
from there.  Although I do not use them, I believe there are cluster
solutions that will let you "push" an image of the OS to the nodes in
an automated fashion, but I can't imagine those can be used on a
"live" cluster without some care (bringing down a few nodes at a
time, for example).  Someone else with more experience and knowledge
will need to comment on the more complex solutions.

Sean

>  -----Original Message-----
>  From: "Sean Davis" [mailto:sdavis2 at mail.nih.gov]
>  Sent: Thursday, April 17, 2008 3:40 PM
>  To: "Daniel Davidson" <danield at igb.uiuc.edu>
>  Cc: <bioconductor at stat.math.ethz.ch>
>  Subject: Re: [BioC] Installation on a cluster
>
>  On Thu, Apr 17, 2008 at 9:16 AM, Daniel Davidson <danield at igb.uiuc.edu> wrote:
>  > Hello,
>  >
>  >  I have been tasked with getting Bioconductor installed on our cluster.
>  >  Because the slave nodes cannot access the Internet, the normal
>  >  installation method of running
>  >
>  >  source("http://bioconductor.org/biocLite.R")
>  >  biocinstallPkgGroups("lite")
>  >
>  >  will not work.  Does anyone have a good method of doing this on a cluster?  We have a local Bioconductor mirror on the cluster that is shared over NFS.
>  >
>
>  Hi, Dan.
>
>  The way we do this is to make an NFS-shared /usr/local and install
>  R there.  Then use biocLite to install packages into the shared
>  library directory.  The benefit of this setup is that you update
>  packages, or R itself, in only one place and only once, and the
>  change is automatically seen on all machines.  An added benefit is
>  that supporting software (graphviz, netcdf, etc.) need only be
>  installed into the shared /usr/local tree and all nodes will see
>  it.  Of course, this assumes that your nodes are all one
>  architecture, but since you said "cluster", I assume that is the
>  case.  (A sketch of this recipe appears after the quoted thread
>  below.)
>
>  Sean
>
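For reference, a minimal sketch of the recipe described above, with
all paths assumed: a CRAN-style Bioconductor mirror exported over
NFS at /nfs/bioc-mirror and a shared library tree under /usr/local.
Since the nodes cannot reach bioconductor.org to source biocLite.R,
this uses install.packages() against the local mirror directly; the
file:// URL only works if the mirror keeps the standard repository
layout (src/contrib with a PACKAGES index).  Run once from any node
and every node sees the result:

  ## install from the local NFS mirror into the shared library tree
  install.packages("limma",
                   repos = "file:///nfs/bioc-mirror",
                   lib   = "/usr/local/lib/R/library")

  ## if the shared tree is not already R's default library, users
  ## can prepend it to the search path at the top of their scripts
  .libPaths("/usr/local/lib/R/library")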


