[BioC] BiomaRt error: ncol(result) == length(attributes) is not TRUE
Steffen
sdurinck at lbl.gov
Thu Mar 27 19:32:34 CET 2008
Hi Quin,
biomaRt can handle a long vector of identifiers; e.g. 30000 IDs should work
in one query and should be fast.
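
As a rough illustration of the batch approach, the sketch below sends identifiers in a few large chunks instead of one request per ID. The helper `query_in_batches`, its `batch_size` and `pause` arguments, and the offline stand-in `fake_fetch` are all hypothetical names, not part of biomaRt. With a live connection, `fetch` would wrap the `getGene()` call quoted further down in this thread.

```r
# Hypothetical helper (not part of biomaRt): query IDs in large batches.
# `fetch` is any function that takes a character vector of IDs and
# returns a data frame with one or more rows per ID.
query_in_batches <- function(ids, fetch, batch_size = 30000L, pause = 0) {
  # Drop missing and duplicated IDs before querying
  ids <- unique(ids[!is.na(ids)])
  if (length(ids) == 0L) return(data.frame())

  # Assign each ID a batch number: 1, 1, ..., 2, 2, ...
  batch_index <- ceiling(seq_along(ids) / batch_size)
  batches <- split(ids, batch_index)

  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fetch(batches[[i]])
    # Optional pause between batches (assumption: only needed if the
    # service struggles with back-to-back requests)
    if (pause > 0 && i < length(batches)) Sys.sleep(pause)
  }

  # Stack the per-batch data frames into one result
  do.call(rbind, results)
}

# Offline stand-in for the real query, so the sketch runs without a server
fake_fetch <- function(x) {
  data.frame(id = x, n_char = nchar(x), stringsAsFactors = FALSE)
}

res <- query_in_batches(c("NM_000546", "NM_007294", "NM_000059"),
                        fake_fetch, batch_size = 2L)
print(res)

# With a live mart, assuming the getGene() call from this thread applies:
# genes <- query_in_batches(
#   ID, function(x) getGene(id = x, type = "refseq_dna", mart = ensembl))
```

With 30000 IDs and the default `batch_size`, this amounts to a single query, which is the case described above.
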
Cheers,
Steffen
Quin Wills wrote:
> Thank you Steffen for the really quick reply.
>
> Out of interest, I tried using Sys.sleep() but I still get the same
> problem. www.ensembl.org is also running quite slowly this side for
> queries, and I wonder if that might be somehow related.
>
> Sorry if this is in the manual - I didn't spot it. How many queries in
> a batch do you think one should avoid going over? I've a fairly long
> list of identifiers.
>
> Quin
>
>
> Steffen wrote:
>> Hi Quin,
>>
>> How long is your list of identifiers? It is not recommended to run a
>> query like this in loops as this causes the web service to go out of
>> sync at some point during the loop.
>> biomaRt is made to perform batch queries.
>> I would recommend running your query as follows:
>>
>> genes <- getGene(id=ID, type="refseq_dna", mart=ensembl)
>>
>> This will give you a dataframe with the info for all the genes. If
>> needed you can then loop over the result.
>> If you feel like you really need to loop you could add Sys.sleep(1)
>> in the loop.
>>
>> Cheers,
>> Steffen
>>
>> Quin Wills wrote:
>>> Hello all
>>>
>>> I'm running the most up to date R and biomaRt.
>>>
>>> I get the following error:
>>> >Error: ncol(result) == length(attributes) is not TRUE
>>>
>>> for the following loop:
>>> # 'ID' is a character vector of refseq IDs
>>> # 'gene', for the purposes of the argument here, is a list storing the output
>>>
>>> > ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>> > for (i in 1:length(ID)) {
>>> >   gene[[i]] <- getGene(id=ID[i], type="refseq_dna", mart=ensembl)
>>> > }
>>>
>>> The problem is not dependent on the get function used or the id type
>>> used. I didn't have this problem yesterday on the same script. The
>>> error also occurs randomly, breaking the loop at any particular
>>> point, sometimes allowing thousands of loops to run.
>>>
>>> Could this be a problem with the server I'm pulling the information
>>> from? It just seems too random to be my coding - especially
>>> considering I didn't have this problem yesterday.
>>>
>>> I've had this before, ages ago, and would like to get to the bottom
>>> of it. Any wisdom? Thanks.
>>>
>>>
>>> Quin Wills
>>> DPhil candidate
>>> Department of Statistics
>>> University of Oxford
>>> 1 South Parks Road
>>> Oxford
>>> OX1 3TG
>>> United Kingdom
>>> 01865 285 394
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>>
>
--
----------------------------------------------------------------
Steffen Durinck, PhD
Division of Biostatistics, University of California, Berkeley &
Life Sciences Department, Lawrence Berkeley National Laboratory
1 Cyclotron Rd, Berkeley
CA, 94720, USA
Tel: +1-510-486-5202