[Bioc-devel] Issue with GenomeInfoDb

Weiss, Kenneth kgweiss at med.umich.edu
Mon Jul 17 17:32:53 CEST 2017


I provide support for an HPC cluster at the University of Michigan. Recently we had a user approach us with the following problem:

I have an issue about using the following package/function in R.

*GenomeInfoDb::Seqinfo(genome = "mm10")*

The function fetches a remote resource. This works on the head node, but
not on the worker nodes. The error on worker nodes is as follows.
*Error in .make_assembly_report_URL(assembly_accession)*

We have our cluster set up to use proxy variables so that compute nodes can access the internet. After some digging, my colleague, Mark Montague, came up with the following workaround:

This is not a security issue. The problem is that our cluster compute nodes are not connected to the Internet.
Because of this, compute nodes need to use a proxy server in order to access things that are off campus. Usually this
is not a problem, but in this particular case there are actually three problems related to GenomeInfoDb not working
correctly with the proxy when running on compute nodes:

1. GenomeInfoDb doesn't recognize the FTP directory listing format our proxy server uses.
2. GenomeInfoDb is not handling the download of assembly reports in the same way it handles other downloads, which is a
problem when the proxy is being used.
3. R is not always using the proxy for FTP connections; this appears to be a quirk of R.
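
Before digging into problem 3, it can help to confirm which proxy variables a compute-node shell actually exports, since R only sees exported ones. The following is a minimal sketch, not part of the original workaround. The variable names (http_proxy, https_proxy, ftp_proxy, no_proxy) are the conventional ones and are an assumption here, because the message does not list the variables the cluster uses; report_proxy_vars is a hypothetical helper name.

```shell
# Hypothetical helper: print each conventional proxy variable, or note that it is unset.
# Only exported variables are reported, because those are what an R process inherits.
report_proxy_vars() {
  for name in http_proxy https_proxy ftp_proxy no_proxy; do
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s is not set\n' "$name"
    fi
  done
}
```

Running report_proxy_vars on both the head node and a compute node makes any difference in proxy configuration visible at a glance.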

To fix problems 1 and 2, you can install a modified version of GenomeInfoDb to replace the one you currently have
installed in your home directory by running the following commands:

cd ~
git clone https://github.com/Bioconductor-mirror/GenomeInfoDb.git
cd GenomeInfoDb
git checkout release-3.4
patch -p 1 < /home/someuser/GenomeInfoDb.patch

To fix problem 3, you need to tell R to use libcurl to download files rather than its default method. This needs to be
done in every R script that uses GenomeInfoDb. For example:

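# Assumed form of the "options" command; download.file.method is a standard R option
options(download.file.method = "libcurl")
GenomeInfoDb::Seqinfo(genome = "mm10")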

Alternatively, you can put the "options" command above into a file named /home/someuser/.Rprofile and it will take effect
for all R scripts you run.
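
The .Rprofile approach can be sketched in shell as follows. This assumes the "options" command is options(download.file.method = "libcurl"), which is a standard R option but may not be the exact line from the original workaround. The helper name add_libcurl_option is hypothetical.

```shell
# Hypothetical helper: add the libcurl download option to an R profile file.
# The option line is an assumption; download.file.method is a documented R option.
add_libcurl_option() {
  rprofile=$1
  line='options(download.file.method = "libcurl")'
  # Create the file if it does not exist yet.
  touch "$rprofile"
  # Append the line only if an identical line is not already present.
  if ! grep -qxF "$line" "$rprofile"; then
    printf '%s\n' "$line" >> "$rprofile"
  fi
}
```

For example, add_libcurl_option "$HOME/.Rprofile" updates the profile in your home directory. Running it a second time leaves the file unchanged.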

Just thought you would like to know.
