[BioC] BiomaRt error: ncol(result) == length(attributes) is not TRUE

Steffen sdurinck at lbl.gov
Thu Mar 27 19:32:34 CET 2008


Hi Quin,

It can deal with a long vector of identifiers, e.g. 30000 IDs should work 
in one query and should be fast.
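
A rough sketch of the single-query approach, untested against the live server. 
It assumes ID is a character vector of RefSeq identifiers; clean_ids() and 
fetch_genes() are hypothetical helper names, and the getGene() call is the 
one quoted further down.

```r
# Hypothetical helper: drop NA, empty and duplicated identifiers so that
# each ID is sent to the server only once.
clean_ids <- function(ids) {
  ids <- as.character(ids)
  unique(ids[!is.na(ids) & nzchar(ids)])
}

# Hypothetical wrapper: one batch query for the whole vector, no loop.
# 'mart' is the object returned by useMart(); getGene() is called with the
# same arguments as in the quoted message below.
fetch_genes <- function(ids, mart) {
  biomaRt::getGene(id = clean_ids(ids), type = "refseq_dna", mart = mart)
}

# Usage (needs network access, so it is left commented out):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes   <- fetch_genes(ID, ensembl)
# for (i in seq_len(nrow(genes))) {
#   gene_i <- genes[i, ]   # one row of the result per iteration
# }
```

Any looping then happens over the rows of the returned data frame, so 
the web service is contacted only once.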

Cheers,
Steffen


Quin Wills wrote:
> Thank you Steffen for the really quick reply.
>
> Out of interest, I tried using Sys.sleep() but I still get the same 
> problem. www.ensembl.org is also running quite slowly this side for 
> queries, and I wonder if that might be somehow related.
>
> Sorry if this is in the manual - I didn't spot it. How many queries in 
> a batch do you think one should avoid going over? I've a fairly long 
> list of identifiers.
>
> Quin
>
>
> Steffen wrote:
>> Hi Quin,
>>
>> How long is your list of identifiers?  Running a query like this in a 
>> loop is not recommended, as it causes the web service to go out of 
>> sync at some point during the loop.
>> biomaRt is designed to perform batch queries.
>> I would recommend running your query as follows:
>>
>> genes <- getGene(id=ID, type="refseq_dna", mart=ensembl)
>>
>> This will give you a data frame with the info for all the genes. If 
>> needed, you can then loop over the result.
>> If you feel you really need to loop, you could add Sys.sleep(1) 
>> inside the loop.
>>
>> Cheers,
>> Steffen
>>
>> Quin Wills wrote:
>>> Hello all
>>>
>>> I'm running the most up to date R and biomaRt.
>>>
>>> I get the following error:
>>>  >Error: ncol(result) == length(attributes) is not TRUE
>>>
>>> for the following loop:
>>> # 'ID' is a character vector of refseq IDs
>>> # 'gene', for the purposes of the argument here, is a list storing 
>>> # the output
>>>
>>>  > ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>>  > for (i in 1:length(ID)) {
>>>  >       gene[[i]] <- getGene(id=ID[i], type="refseq_dna", mart=ensembl)
>>>  > }
>>>
>>> The problem is not dependent on the get function used or the id type 
>>> used. I didn't have this problem yesterday on the same script. The 
>>> error also occurs randomly, breaking the loop at any particular 
>>> point, sometimes allowing thousands of loops to run.
>>>
>>> Could this be a problem with the server I'm pulling the information 
>>> from? It just seems too random to be my coding - especially 
>>> considering I didn't have this problem yesterday.
>>>
>>> I've had this before, ages ago, and would like to get to the bottom 
>>> of it. Any wisdom? Thanks.
>>
>>
>
> -- 
> Quin Wills
> DPhil candidate
> Department of Statistics
> University of Oxford
> 1 South Parks Road
> Oxford OX1 3TG
> United Kingdom
> 01865 285 394
>


-- 
----------------------------------------------------------------
Steffen Durinck, PhD

Division of Biostatistics, University of California, Berkeley &
Life Sciences Department, Lawrence Berkeley National Laboratory
1 Cyclotron Rd, Berkeley
CA, 94720, USA
Tel: +1-510-486-5202
