[BioC] biomaRt queries: optimal size?
James W. MacDonald
jmacdon at med.umich.edu
Mon Dec 21 15:36:20 CET 2009
Hi Jose,
J.delasHeras at ed.ac.uk wrote:
>
> I've recently started to use biomaRt seriously. In the past I just did a
> few tens of searches and all worked fine. Now I have several datasets of
> several thousand IDs each.
>
> I imagine that sending a single search with 3000 ids might not be a good
> idea. I tried, and it broke after a while... and got no results.
A query of 3000 ids is no problem for biomaRt - you should be able to do
a much larger query than that without any troubles.
It would be helpful if you tried your query again and if it fails, send
the results of a traceback().
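As a rough sketch of what I mean by a single query: the helper below cleans the ID vector and sends it in one request. The function name, the filter and attribute names, and the query_fun argument are placeholders I made up for illustration (query_fun defaults to biomaRt's getBM and only exists so the helper can be tried offline); substitute whatever mart, filter and attributes you are actually using.

```r
# Send all IDs in one request instead of looping over small blocks.
# Assumptions (not from the original post): query_all_ids is a hypothetical
# helper, and the default filter/attribute names are only examples.
query_all_ids <- function(ids, mart,
                          filters = "ensembl_gene_id",
                          attributes = c("ensembl_gene_id", "hgnc_symbol"),
                          query_fun = biomaRt::getBM) {
  # Drop missing, empty and duplicated IDs so each one is sent once.
  ids <- unique(as.character(ids[!is.na(ids)]))
  ids <- ids[nzchar(ids)]
  if (length(ids) == 0L) stop("no valid IDs to query")
  query_fun(attributes = attributes, filters = filters,
            values = ids, mart = mart)
}

# Example use (needs a network connection; dataset name is an assumption):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# res  <- query_all_ids(my_ids, mart)
# If the call fails, run traceback() straight away and post its output.
```

Note that traceback() only reports the most recent uncaught error, so it has to be called before anything else errors in the session.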
>
> So I resorted to dividing the ids into blocks of 200, and proceeded to send my
> queries that way, 200 ids at a time, saving results as I go along.
This is a bad idea for two reasons. First, as you see below, you can get
transient connection problems that will break your loop. Second,
repeatedly querying online database resources in a tight loop is
commonly considered abuse of the resource, and can get your IP banned
from further queries.
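If you ever do need to split a query into blocks (I don't think you do here), the loop should at least address both points: catch a transient failure and retry rather than die, and pause between requests. Here is a minimal sketch, assuming nothing about biomaRt itself. The function name, block size, pause length and number of tries are arbitrary values I picked for illustration, not recommended limits, and query_fun is any function that takes a vector of ids and returns a data frame.

```r
# Query ids in blocks, pausing between requests and retrying on errors.
# Assumptions: query_in_blocks is a hypothetical helper; block_size, pause
# (seconds) and max_tries are illustrative defaults, not documented limits.
query_in_blocks <- function(ids, query_fun, block_size = 500L,
                            pause = 5, max_tries = 3L) {
  # Consecutive blocks of at most block_size ids, in the original order.
  blocks <- split(ids, ceiling(seq_along(ids) / block_size))
  results <- vector("list", length(blocks))
  for (i in seq_along(blocks)) {
    for (attempt in seq_len(max_tries)) {
      # Turn an error into a value so the loop is not aborted by it.
      res <- tryCatch(query_fun(blocks[[i]]), error = function(e) e)
      if (!inherits(res, "error")) break
      Sys.sleep(pause)  # wait before retrying the same block
    }
    if (inherits(res, "error")) {
      stop("block ", i, " failed after ", max_tries, " tries: ",
           conditionMessage(res))
    }
    results[[i]] <- res
    if (i < length(blocks)) Sys.sleep(pause)  # be polite between blocks
  }
  do.call(rbind, results)
}

# Example use with biomaRt (needs a network connection and an existing mart):
# res <- query_in_blocks(my_ids, function(block)
#   biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
#                  filters = "ensembl_gene_id", values = block, mart = mart))
```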
Best,
Jim
>
> This worked very well for my first set of 953 ids. When processing my
> second dataset of 1545 ids, the connection broke after 1200.
>
> I obtained this error:
> "Error in value[[3L]](cond) :
> Request to BioMart web service failed. Verify if you are still connected
> to the internet. Alternatively the BioMart web service is temporarily
> down."
>
> I am connected to the internet, and I see no evidence of Biomart being
> down...
>
> Can this somehow be related to the size of my queries? I was trying to
> find what size is ok to send in one block, but I didn't find anything
> definite, only that sending one id at a time in a loop is not a good idea.
>
> Any help greatly appreciated.
>
> Thanks!
>
> Jose
>
> PS: sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_2.2.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.2-1 XML_2.6-0
>
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues