[BioC] biomaRt queries: optimal size?

James W. MacDonald jmacdon at med.umich.edu
Mon Dec 21 15:36:20 CET 2009


Hi Jose,

J.delasHeras at ed.ac.uk wrote:
> 
> I've recently started to use biomaRt seriously. In the past I just did a 
> few tens of searches and all worked fine. Now I have several datasets of 
> several thousand IDs each.
> 
> I imagine that sending a single search with 3000 ids might not be a good 
> idea. I tried, and it broke after a while... and got no results.

A query of 3000 ids is no problem for biomaRt - you should be able to do 
a much larger query than that without any trouble.

It would be helpful if you tried your query again and, if it fails, sent 
the output of traceback() called right after the error.
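
A minimal sketch of such a single query, in case it helps. The helper name
query_all, the attributes and the filter are assumptions chosen for
illustration, not taken from the original question; substitute whatever you
actually use. The query_fun argument exists only so the sketch can be tried
without a network connection, and defaults to biomaRt's getBM().

```r
# Hypothetical helper: send every id to BioMart in ONE request.
# Assumes Ensembl gene ids as the filter; adjust attributes/filters as needed.
query_all <- function(ids, mart, query_fun = biomaRt::getBM) {
  query_fun(attributes = c("ensembl_gene_id", "hgnc_symbol"),
            filters    = "ensembl_gene_id",
            values     = ids,
            mart       = mart)
}

# Interactive use (needs the biomaRt package and a network connection):
#   mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
#   res  <- query_all(ids, mart)   # ids: character vector of all your ids
#   traceback()                    # only if the call above stopped with an error
```

traceback() reports the call stack of the last uncaught error, so it has to
be run at the prompt immediately after the failure, before anything else
errors.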


> 
> So I divided the ids into blocks of 200 and sent my queries that way, 
> 200 ids at a time, saving results as I went along.

This is a bad idea for two reasons. First, as you see below, you can get 
transient connection problems that will break your loop. Second, 
repeatedly querying online database resources in a tight loop is 
commonly considered abuse of the resource, and can get your IP banned 
from further queries.
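
If splitting really cannot be avoided, both problems can at least be
softened by retrying a failed block and pausing between requests. What
follows is only a rough sketch under stated assumptions: query_in_blocks,
its arguments and the default pause are hypothetical, and query_fun stands
in for whatever wrapper around getBM() you use. A single large query is
still the better option.

```r
# Hypothetical helper: query ids in blocks, retrying a failed block after a
# pause instead of letting one transient error abort the whole loop.
# `query_fun` takes one block of ids and returns a data.frame, e.g.
#   function(block) biomaRt::getBM(attributes = ..., filters = ...,
#                                  values = block, mart = mart)
query_in_blocks <- function(ids, query_fun, block_size = 200,
                            max_tries = 3, pause = 5) {
  blocks  <- split(ids, ceiling(seq_along(ids) / block_size))
  results <- vector("list", length(blocks))
  for (i in seq_along(blocks)) {
    for (attempt in seq_len(max_tries)) {
      res <- tryCatch(query_fun(blocks[[i]]), error = function(e) e)
      if (!inherits(res, "error")) break
      if (attempt < max_tries) Sys.sleep(pause)  # wait before retrying
    }
    if (inherits(res, "error"))
      stop("block ", i, " failed after ", max_tries, " tries: ",
           conditionMessage(res))
    results[i] <- list(res)
    if (i < length(blocks)) Sys.sleep(pause)     # do not hammer the server
  }
  do.call(rbind, results)
}
```

The pause value is a guess, not a documented limit of the BioMart service;
err on the side of longer waits.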

Best,

Jim


> 
> This worked very well for my first set of 953 ids. When processing my 
> second dataset of 1545 ids, the connection broke after 1200.
> 
> I obtained this error:
> "Error in value[[3L]](cond) :
> Request to BioMart web service failed. Verify if you are still connected 
> to the internet. Alternatively the BioMart web service is temporarily 
> down."
> 
> I am connected to the internet, and I see no evidence of Biomart being 
> down...
> 
> Can this somehow be related to the size of my queries? I was trying to 
> find what size is ok to send in one block, but I didn't find anything 
> definite, only that sending one id at a time in a loop is not a good idea.
> 
> Any help greatly appreciated.
> 
> Thanks!
> 
> Jose
> 
> PS: sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
> 
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] biomaRt_2.2.0
> 
> loaded via a namespace (and not attached):
> [1] RCurl_1.2-1 XML_2.6-0
> 
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


