[Bioc-devel] biomaRt cannot list marts when going through a mirror web site

Nicolas Delhomme delhomme at embl.de
Wed May 2 10:52:19 CEST 2012

Hi Steffen, hi Wolfgang,

When trying to list the marts available from an Ensembl mirror, I get the following:

Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Error: 1: Space required after the Public Identifier
2: SystemLiteral " or ' expected
3: SYSTEM or PUBLIC, the URI is missing

This is triggered by this line: 

registry = bmRequest(request = request, ssl.verifypeer = ssl.verifypeer, verbose = verbose)

in the listMarts function. 

Looking at the bmRequest function, it uses the getURL function of the RCurl package. This function is the culprit:

## the request as computed by listMarts
request = "http://uswest.ensembl.org:80/biomart/martservice?type=registry&requestid=biomaRt"
getURL(request, ssl.verifypeer = TRUE) 
[1] "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>302 Found</title>\n</head><body>\n<h1>Found</h1>\n<p>The document has moved <a href=\"http://www.ensembl.org/biomart/martservice?type=registry&requestid=biomaRt;redirect=mirror;source=uswest.ensembl.org\">here</a>.</p>\n</body></html>\n"

As you can see, it returns a 302 redirect page, i.e. the mirror redirects to "www.ensembl.org" in my case.
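
The parser messages at the top are what one would expect if this HTML page, whose DOCTYPE has a public identifier but no system identifier, were handed to an XML parser in place of the registry document. As a minimal sketch of how such a response could be recognised before parsing, assuming a hypothetical helper named looks_like_redirect (it is not part of biomaRt or RCurl) and two abbreviated sample bodies modelled on the output shown in this message:

```r
# Hypothetical helper (not part of biomaRt): report whether a response body
# looks like an HTTP 3xx redirect page rather than a MartRegistry document.
looks_like_redirect <- function(body) {
  grepl("<title>30[0-9] ", body) &&
    !grepl("<MartRegistry>", body, fixed = TRUE)
}

# Abbreviated sample bodies, shortened from the responses quoted in this message.
redirect_page <- paste0(
  "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n",
  "<html><head>\n<title>302 Found</title>\n</head><body></body></html>\n"
)
registry_page <- "\n<MartRegistry>\n</MartRegistry>\n"

looks_like_redirect(redirect_page)  # TRUE
looks_like_redirect(registry_page)  # FALSE
```

A check like this would only turn the cryptic parser messages into a clearer error; it does not follow the redirect itself.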

Adding a followlocation=TRUE argument to that command solves the problem:

getURL(request, ssl.verifypeer = TRUE, followlocation=TRUE)
[1] "\n<MartRegistry>\n  <MartURLLocation database=\"ensembl_mart_66\" default=\"1\" displayName=\"Ensembl Genes 66\" host=\"www.ensembl.org\" includeDatasets=\"\" martUser=\"\" name=\"ENSEMBL_MART_ENSEMBL\" path=\"/biomart/martservice\" port=\"80\" serverVirtualSchema=\"default\" visible=\"1\" />\n  <MartURLLocation database=\"sequence_mart_66\" default=\"\" displayName=\"Sequence\" host=\"www.ensembl.org\" includeDatasets=\"\" martUser=\"\" name=\"ENSEMBL_MART_SEQUENCE\" path=\"/biomart/martservice\" port=\"80\" serverVirtualSchema=\"default\" visible=\"\" />\n  <MartURLLocation database=\"ontology_mart_66\"

... truncated

Can you please add that additional parameter to the getURL call?
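
For illustration only, the requested change could look roughly like the sketch below. I have not reproduced the real body of bmRequest here; fetch_registry and its fetch argument are hypothetical names, and the only RCurl call assumed is getURL with the followlocation option used above.

```r
# Hypothetical wrapper (an assumption, not biomaRt's actual bmRequest code).
# `fetch` defaults to RCurl::getURL; any function with the same calling
# convention can be substituted, e.g. a stub for offline testing.
fetch_registry <- function(request, ssl.verifypeer = TRUE,
                           fetch = RCurl::getURL) {
  # followlocation = TRUE asks curl to follow HTTP 3xx redirects, so the
  # caller receives the final document instead of the "302 Found" page.
  fetch(request, ssl.verifypeer = ssl.verifypeer, followlocation = TRUE)
}
```

Called as fetch_registry(request) with RCurl installed, this is equivalent to the getURL call shown above.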

Steffen, if you're in a location that uses "uswest.ensembl.org" as a mirror (i.e. US west coast I guess, Seattle for sure :-)), you can use "www.ensembl.org" as a host to reproduce that error instead.

I have only stumbled onto this one, but there might be more occurrences in the code that need adapting. I'll try to figure that out.

My session info (R 2.15.0 with useDevel(TRUE)):

R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] customCDF_0.99.3      XML_3.9-4             RSQLite_0.11.1       
 [4] RCurl_1.91-1          bitops_1.0-4.1        Rsamtools_1.9.8      
 [7] GEOquery_2.23.1       GenomicFeatures_1.9.6 GenomicRanges_1.9.9  
[10] DBI_0.2-5             Biostrings_2.25.3     IRanges_1.15.7       
[13] AnnotationDbi_1.19.4  vsn_3.25.0            makecdfenv_1.35.0    
[16] gcrma_2.29.0          BiocInstaller_1.5.7   biomaRt_2.13.0       
[19] affy_1.35.1           Biobase_2.17.5        BiocGenerics_0.3.0   

loaded via a namespace (and not attached):
 [1] affyio_1.25.0         BSgenome_1.25.1       grid_2.15.0          
 [4] lattice_0.20-6        limma_3.13.1          preprocessCore_1.19.0
 [7] rtracklayer_1.17.0    splines_2.15.0        stats4_2.15.0        
[10] tools_2.15.0          zlibbioc_1.3.0



Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
