[BioC] biomaRt error in getGene

Steffen Durinck durincks at mail.nih.gov
Thu Jul 13 15:08:29 CEST 2006


Hi Georg,

I've repeated your query and unfortunately get the same result.
When using biomaRt with Ensembl, biomaRt should return the same output as
the Ensembl BioMart web application (http://www.ensembl.org/Multi/martview).
I often use the web application to verify that biomaRt is functioning
properly, or to find out whether a reported error is caused by the
database or by a bug in the biomaRt package.

After querying with zebrafish affy ids, I found that the MartView web
application also returns no values for affy ids (e.g. when querying for all
affy ids on zebrafish chromosome 1).
It thus looks like there is an error in the zebrafish dataset of
Ensembl. Problems of this type should be reported to the
corresponding database so they can be fixed in the next database
release. For Ensembl, this is helpdesk at ensembl.org.
I'll document this in the biomaRt vignette.
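A minimal base-R sketch of the name check discussed in this thread, assuming you already have the attribute and filter names as character vectors. The helpers missing_names and array_name_ok are hypothetical and not part of biomaRt. In a live session the vectors would come from listAttributes(mart) and listFilters(mart); biomaRt 1.6.0 printed these as character vectors (as quoted below), while newer releases are assumed to return a data frame whose name column holds the names, so check your installed version.

```r
# Hypothetical helpers (not part of biomaRt) for checking names before a query.

# Return the requested names that are absent from the dataset's name list.
missing_names <- function(wanted, available) {
  setdiff(wanted, available)
}

# According to this thread, getGene() expects the array name to be valid
# both as an attribute and as a filter, so it must appear in both lists.
array_name_ok <- function(array, attributes, filters) {
  stopifnot(is.character(array), length(array) == 1)
  (array %in% attributes) && (array %in% filters)
}

# Excerpt of the names reported in this thread for drerio_gene_ensembl
# (biomaRt 1.6.0): the attribute and the filter were named differently.
attrs   <- c("zebrafish_affy", "zebrafish_affy_primary_db")
filters <- c("affy_zebrafish")

array_name_ok("affy_zebrafish", attrs, filters)             # FALSE
missing_names(c("affy_zebrafish", "hgnc_symbol"), attrs)    # "affy_zebrafish" "hgnc_symbol"
```

A check like this only tells you whether the names line up; if every name is present and the query still returns NULL, comparing against the MartView web application is what separates a database problem from a package bug.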

Best,
Steffen



Georg Otto wrote:
> Hi Steffen,
>
> thanks for your help. I updated to biomaRt 1.7.3 and followed your
> advice. What I got is this:
>
>> mart <- useMart("ensembl")
>> mart <- useDataset("drerio_gene_ensembl", mart)
>
>> getGene(id="Dr.10336.1.S1_at", array="affy_zebrafish", mart=mart)
> NULL
>
> Then I tried getBM:
>
>> getBM(attributes=c("ensembl_transcript_id", "affy_zebrafish", "refseq_dna"), filters="affy_zebrafish", values="Dr.10336.1.S1_at", mart=mart)
> NULL
>
> I tried several different Affymetrix probes that should be
> annotated in Ensembl, but all of them returned NULL.
>
>> sessionInfo()
> Version 2.3.0 (2006-04-24) 
> x86_64-redhat-linux-gnu 
>
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
> [7] "base"     
>
> other attached packages:
>  biomaRt    RCurl      XML 
>  "1.7.3"  "0.6-2" "0.99-7" 
>
> Any idea what could be wrong here?
>
> Best,
>
> Georg
>
>
> Steffen Durinck <durincks at mail.nih.gov> writes:
>
>> Hi Georg,
>>
>> You need the latest version of biomaRt (1.7.3) for getGene to work
>> with zebrafish (see the developmental packages).
>> http://www.bioconductor.org/packages/1.9/bioc/html/biomaRt.html
>>
>> Also have a look at the getBM, listAttributes and listFilters
>> functions. These are robust against Ensembl database changes and allow
>> you to query more than is possible with the simple biomaRt
>> functions such as getGene.
>>
>> Best,
>> Steffen
>>
>> Georg Otto wrote:
>>> Hi,
>>>
>>> I have a problem using biomaRt. I want to retrieve information connected to a probe, and do something like this:
>>>
>>>> mart <- useMart("ensembl", dataset="drerio_gene_ensembl")
>>> Checking attributes and filters ... ok
>>>> getAffyArrays(mart)
>>> [1] "affy_zebrafish"
>>>> getGene(id="Dr.1.1.S1_at", array="affy_zebrafish", mart=mart)
>>> Error in getBM(attributes = c(attrib, "hgnc_symbol", "description", chrname,  : 
>>> 	attribute: hgnc_symbol not found, please use the function 'listAttributes' to get valid attribute names
>>>
>>>> listAttributes(mart=mart)
>>> <snip>
>>>   [8] "adf_swissprot"                                        
>>>   [9] "affy_zebrafish"                                       
>>>  [10] "affy_zebrafish_primary_db"                            
>>>  [11] "agilent_g2518a"                                       
>>> <snip>
>>>
>>> So it seems that the attribute "affy_zebrafish" exists.
>>>
>>> What is wrong here? I had this problem before, and Steffen Durinck
>>> replied (see below) that it has to do with an inconsistency in
>>> attribute and filter naming, i.e. the attribute for the affy ids was
>>> zebrafish_affy and the filter was called affy_zebrafish. I was told
>>> that this would be fixed in the next Ensembl release. It seems that the
>>> difference between the array name and the attribute name has been fixed,
>>> since both are now called affy_zebrafish, but the problem still persists.
>>>
>>>> sessionInfo()
>>> Version 2.3.1 (2006-06-01) 
>>> powerpc-apple-darwin8.6.0 
>>>
>>> attached base packages:
>>> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets" 
>>> [7] "base"     
>>>
>>> other attached packages:
>>>      DBI  biomaRt    RCurl      XML 
>>> "0.1-10"  "1.6.0"  "0.6-2" "0.99-7"
>>>
>>> Cheers,
>>>
>>> Georg
>>>
>>>> Hi,
>>>>
>>>>> My understanding is that the BioMart folks are making changes to the
>>>>> table names for the BioMart database, and this is happening right now
>>>>> (e.g., right after BioC 1.8 is released). Unfortunately this means that
>>>>> some of the convenience functions like getGene() are being broken.
>>>>
>>>> This is correct; however, here there is a problem with the zebrafish dataset
>>>> as well. Part of the error here is produced by an inconsistency in
>>>> attribute and filter naming. The attribute for the affy ids is
>>>> zebrafish_affy and the filter is called affy_zebrafish. The getGene
>>>> function expects these to have the same name and thus generates an error.
>>>>
>>>> To make the BioMart datasets better, we can post these inconsistencies to
>>>> the corresponding BioMart mailing list (in this case the Ensembl helpdesk)
>>>> so they get fixed in the next database release.
>>>>
>>>> Best,
>>>> Steffen
>>>>
>>>>> Hi Georg,
>>>>>
>>>>> Georg Otto wrote:
>>>>>> Hi,
>>>>>>
>>>>>> using biomaRt, I get an error:
>>>>>>
>>>>>>> mart <- useMart("ensembl", dataset="drerio_gene_ensembl")
>>>>>>> getAffyArrays(mart)
>>>>>> [1] "affy_zebrafish"
>>>>>>
>>>>>>> getGene(id=genes.regulated.mas5, array="affy_zebrafish", mart=mart)
>>>>>> Error in getBM(attributes = c(attrib, "hgnc_symbol", "description",
>>>>>> chrname,  :
>>>>>>         attribute: affy_zebrafish not found, please use the function
>>>>>> 'listAttributes' to get valid attribute names
>>>>>>
>>>>>> Then I used listAttributes() as requested:
>>>>>>
>>>>>>> listAttributes(mart=mart)
>>>>>>
>>>>>> and got the following output:
>>>>>>
>>>>>> <snip>
>>>>>> [309] "zebrafish_affy"
>>>>>> [310] "zebrafish_affy_primary_db"
>>>>>> <snip>
>>>>>>
>>>>>> Using "zebrafish_affy" instead of "affy_zebrafish" does not help,
>>>>>> however:
>>>>>>
>>>>>>> getGene(id=genes.regulated.mas5, array="zebrafish_affy", mart=mart)
>>>>>> Error in getBM(attributes = c(attrib, "hgnc_symbol", "description",
>>>>>> chrname,  :
>>>>>>         attribute: hgnc_symbol not found, please use the function
>>>>>> 'listAttributes' to get valid attribute names
>>>>>>
>>>>>> Any hint will be appreciated.
>>>>> My understanding is that the BioMart folks are making changes to the
>>>>> table names for the BioMart database, and this is happening right now
>>>>> (e.g., right after BioC 1.8 is released). Unfortunately this means that
>>>>> some of the convenience functions like getGene() are being broken.
>>>>>
>>>>> I think your best bet is to use getBM() directly, and query for things
>>>>> that you see when you do listAttributes(mart).
>>>>>
>>>>> HTH,
>>>>>
>>>>> Jim
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> -- 
>> Steffen Durinck, Ph.D.
>>
>> Oncogenomics Section
>> Pediatric Oncology Branch
>> National Cancer Institute, National Institutes of Health
>> URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
>>
>> Phone: 301-402-8103
>> Address:
>> Advanced Technology Center,
>> 8717 Grovemont Circle
>> Gaithersburg, MD 20877
>>




