[BioC] biomaRt error in getGene

Georg Otto georg.otto at tuebingen.mpg.de
Thu Jul 13 12:41:44 CEST 2006


Hi Steffen,

thanks for your help. I updated to biomaRt 1.7.3 and followed your
advice. What I got is this:

> mart <- useMart("ensembl")
> mart<-useDataset("drerio_gene_ensembl", mart)

> getGene(id="Dr.10336.1.S1_at", array="affy_zebrafish", mart=mart)
NULL

Then I tried getBM:

> getBM(attributes=c("ensembl_transcript_id", "affy_zebrafish", "refseq_dna"), filters="affy_zebrafish", values="Dr.10336.1.S1_at", mart=mart)
NULL

I tried several different Affymetrix probes that should be
annotated in Ensembl, but all of them returned NULL.

> sessionInfo()
Version 2.3.0 (2006-04-24) 
x86_64-redhat-linux-gnu 

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
[7] "base"     

other attached packages:
 biomaRt    RCurl      XML 
 "1.7.3"  "0.6-2" "0.99-7" 

Any idea what could be wrong here?

Best,

Georg


Steffen Durinck <durincks at mail.nih.gov> writes:

> Hi Georg,
>
> You need the latest version of biomaRt (1.7.3) for getGene to work
> with zebrafish (see the development packages).
> http://www.bioconductor.org/packages/1.9/bioc/html/biomaRt.html
>
> Also have a look at the getBM, listAttributes and listFilters
> functions; these are robust against Ensembl database changes and allow
> you to query more than is possible with the simple biomaRt
> functions such as getGene.
>
> Best,
> Steffen
>
> Georg Otto wrote:
>> Hi,
>>
>> I have a problem using biomaRt. I want to retrieve information associated with a probe, and I do something like this:
>>
>>> mart<-useMart("ensembl", dataset="drerio_gene_ensembl")
>> Checking attributes and filters ... ok
>>> getAffyArrays(mart)
>> [1] "affy_zebrafish"
>>> getGene(id="Dr.1.1.S1_at", array="affy_zebrafish", mart=mart)
>> Error in getBM(attributes = c(attrib, "hgnc_symbol", "description", chrname,  : 
>> 	attribute: hgnc_symbol not found, please use the function 'listAttributes' to get valid attribute names
>>
>>> listAttributes(mart=mart)
>>
>> <snip>
>>   [8] "adf_swissprot"                                        
>>   [9] "affy_zebrafish"                                       
>>  [10] "affy_zebrafish_primary_db"                            
>>  [11] "agilent_g2518a"                                       
>> <snip>
>>
>> So it seems that the attribute "affy_zebrafish" exists.
>>
>> What is wrong here? I had this problem before, and I got a reply from
>> Steffen Durinck (see below) saying that it was caused by an inconsistency in
>> attribute and filter naming, i.e. the attribute for the Affy IDs was
>> zebrafish_affy and the filter was called affy_zebrafish. I was told
>> that this would be fixed in the next Ensembl release. It seems that the
>> difference between the array name and the attribute name has been repaired,
>> since both are now called affy_zebrafish, but the problem still persists.
>>
>>> sessionInfo()
>> Version 2.3.1 (2006-06-01) 
>> powerpc-apple-darwin8.6.0 
>>
>> attached base packages:
>> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets" 
>> [7] "base"     
>>
>> other attached packages:
>>      DBI  biomaRt    RCurl      XML 
>> "0.1-10"  "1.6.0"  "0.6-2" "0.99-7"
>>
>> Cheers,
>>
>> Georg
>>
>>> Hi,
>>>
>>>> My understanding is that the BioMart folks are making changes to the
>>>> table names for the BioMart database, and this is happening right now
>>>> (e.g., right after BioC 1.8 is released). Unfortunately this means that
>>>> some of the convenience functions like getGene() are being broken.
>>>
>>> This is correct; however, there is also a problem with the zebrafish
>>> dataset itself. Part of the error here is caused by an inconsistency in
>>> attribute and filter naming: the attribute for the Affy IDs is
>>> zebrafish_affy and the filter is called affy_zebrafish. The getGene
>>> function expects these to have the same name and therefore generates an error.
>>>
>>> To improve the BioMart datasets, we can report these inconsistencies to
>>> the corresponding BioMart mailing list (in this case the Ensembl helpdesk)
>>> so they get fixed in the next database release.
>>>
>>> Best,
>>> Steffen
>>>
>>>> Hi Georg,
>>>>
>>>> Georg Otto wrote:
>>>>> Hi,
>>>>>
>>>>> using biomaRt, I get an error:
>>>>>
>>>>>> mart<-useMart("ensembl", dataset= "drerio_gene_ensembl")
>>>>>> getAffyArrays(mart)
>>>>> [1] "affy_zebrafish"
>>>>>
>>>>>> getGene(id=genes.regulated.mas5, array="affy_zebrafish" , mart=mart)
>>>>> Error in getBM(attributes = c(attrib, "hgnc_symbol", "description",
>>>>> chrname,  :
>>>>>         attribute: affy_zebrafish not found, please use the function
>>>>> 'listAttributes' to get valid attribute names
>>>>>
>>>>> Then I use listAttributes() as requested:
>>>>>
>>>>>> listAttributes(mart=mart)
>>>>> and get the following output:
>>>>>
>>>>> <snip>
>>>>> [309] "zebrafish_affy"
>>>>> [310] "zebrafish_affy_primary_db"
>>>>> <snip>
>>>>>
>>>>> Using "zebrafish_affy" instead of "affy_zebrafish" does not help,
>>>>> however:
>>>>>
>>>>>> getGene(id=genes.regulated.mas5, array="zebrafish_affy" , mart=mart)
>>>>> Error in getBM(attributes = c(attrib, "hgnc_symbol", "description",
>>>>> chrname,  :
>>>>>         attribute: hgnc_symbol not found, please use the function
>>>>> 'listAttributes' to get valid attribute names
>>>>>
>>>>> Any hint will be appreciated.
>>>> My understanding is that the BioMart folks are making changes to the
>>>> table names for the BioMart database, and this is happening right now
>>>> (e.g., right after BioC 1.8 is released). Unfortunately this means that
>>>> some of the convenience functions like getGene() are being broken.
>>>>
>>>> I think your best bet is to use getBM() directly, and query for things
>>>> that you see when you do listAttributes(mart).
>>>>
>>>> HTH,
>>>>
>>>> Jim
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> -- 
> Steffen Durinck, Ph.D.
>
> Oncogenomics Section
> Pediatric Oncology Branch
> National Cancer Institute, National Institutes of Health
> URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
>
> Phone: 301-402-8103
> Address:
> Advanced Technology Center,
> 8717 Grovemont Circle
> Gaithersburg, MD 20877
>
>
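The naming mismatch described in the quoted replies (getGene expects the array name to exist both as an attribute and as a filter) can be checked before calling getGene. What follows is a minimal sketch in base R under stated assumptions: arrayNameStatus is a hypothetical helper, not a biomaRt function, and it is fed the attribute and filter names quoted in this thread rather than a live Ensembl query. It accepts either character vectors, as listAttributes() and listFilters() returned in the biomaRt versions discussed here, or data frames with a name column, which is assumed to match what later biomaRt releases return.

```r
# Hypothetical helper (not part of biomaRt): reports whether an array name
# is available as an attribute and as a filter. getGene() in the biomaRt
# versions discussed in this thread relied on both being present.
arrayNameStatus <- function(arrayName, attributes, filters) {
  # Accept character vectors or data frames with a 'name' column
  # (the data-frame form is an assumption about newer biomaRt releases).
  asNames <- function(x) {
    if (is.data.frame(x)) as.character(x$name) else as.character(x)
  }
  c(attribute = arrayName %in% asNames(attributes),
    filter    = arrayName %in% asNames(filters))
}

# Names copied from the listAttributes() output and the filter name
# reported earlier in this thread, before the Ensembl fix.
attrs <- c("zebrafish_affy", "zebrafish_affy_primary_db")
filts <- c("affy_zebrafish")

status <- arrayNameStatus("affy_zebrafish", attrs, filts)
print(status)  # attribute is FALSE, filter is TRUE: the names disagree
if (!all(status)) {
  message("affy_zebrafish is not usable as both an attribute and a filter")
}
```

With a live connection the same check would presumably be arrayNameStatus("affy_zebrafish", listAttributes(mart), listFilters(mart)). If either element is FALSE, getGene is likely to fail for that array, and calling getBM with names taken from listAttributes and listFilters is the fallback suggested in the replies above. Note that this check does not cover the separate hgnc_symbol error, which concerns an attribute getGene itself requests.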

-- 

Georg Wilhelm Otto

Max-Planck-Institute for Developmental Biology

georg.otto at tuebingen.mpg.de


