[BioC] biomart error

Hervé Pagès hpages at fhcrc.org
Thu Aug 18 20:45:02 CEST 2011


Natasha,

On 11-08-18 09:54 AM, Natasha Sahgal wrote:
> Dear Herve,
>
> Thanks for this.
>
> Though "with_illumina_humanht_12" and "illumina_humanht_12" are
> definitely in the filters.
>
> listMarts(host="www.ensembl.org")
> ensembl = useMart("ENSEMBL_MART_ENSEMBL", host="www.ensembl.org",
> dataset="hsapiens_gene_ensembl")
> filt = listFilters(ensembl)
>
> filt[grep("illumina", filt[[1]], ignore.case=TRUE), ]
>     name                        description
> 12  with_illumina_humanwg_6_v1  with Illumina HumanWG 6 v1 ID(s))
> 13  with_illumina_humanwg_6_v2  with Illumina HumanWG 6 v2 ID(s)
> 14  with_illumina_humanwg_6_v3  with Illumina HumanWG 6 v3 ID(s)
> 15  with_illumina_humanht_12    with Illumina Human HT 12 ID(s)
> 125 illumina_humanwg_6_v1       Illumina HumanWG 6 V1 ID(s) [e.g. 0000940471]
> 126 illumina_humanwg_6_v2       Illumina HumanWG 6 V2 ID(s) [e.g. ILMN_1748182]
> 127 illumina_humanwg_6_v3       Illumina HumanWG 6 v3 ID(s) [e.g. ILMN_2103362]
> 128 illumina_humanht_12         Illumina Human HT 12 ID(s) [e.g. ILMN_1672925]

Ah, I was looking at the attributes, my bad.

So it looks to me that the 1st filter is a boolean (TRUE/FALSE) that
you can turn on in order to only retrieve genes that are associated
with an Illumina Human HT 12 ID (i.e. any ID):

   > genes1 <- getBM(attributes="ensembl_gene_id",
   +                 filters="with_illumina_humanht_12", values=TRUE,
   +                 mart=ensembl)
   > dim(genes1)
   [1] 21719     1
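
As a rough way to tell the two kinds of filter apart before querying,
one can look at the "with_" prefix in the names you listed. The sketch
below is only an illustration: it rebuilds the filter names by hand from
your listing so that it runs without a server, the helper name
guess_filter_kind is made up, and the "with_" prefix is a naming pattern
observed in this listing, not something the server guarantees.

```r
# Filter names copied by hand from the listFilters() output quoted above,
# so this runs offline.
filt <- data.frame(
  name = c("with_illumina_humanwg_6_v1", "with_illumina_humanwg_6_v2",
           "with_illumina_humanwg_6_v3", "with_illumina_humanht_12",
           "illumina_humanwg_6_v1", "illumina_humanwg_6_v2",
           "illumina_humanwg_6_v3", "illumina_humanht_12"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: guess the filter kind from the "with_" prefix.
# This is a naming heuristic only (an assumption, see above).
guess_filter_kind <- function(filter_names) {
  ifelse(grepl("^with_", filter_names), "boolean", "id_list")
}

filt$kind <- guess_filter_kind(filt$name)
print(filt)

# In this listing, each boolean filter has an ID-list counterpart
# with the same name minus the prefix.
boolean_names <- filt$name[filt$kind == "boolean"]
counterparts  <- sub("^with_", "", boolean_names)
all(counterparts %in% filt$name)
```

A filter guessed as "boolean" would be queried with values=TRUE as
above; one guessed as "id_list" takes a vector of IDs.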

With the 2nd filter you can actually specify the IDs that you are
interested in:

   > genes2 <- getBM(attributes="ensembl_gene_id",
   +                 filters="illumina_humanht_12", values="ILMN_167292",
   +                 mart=ensembl)
   > dim(genes2)
   [1] 0 1
   > genes2
   [1] ensembl_gene_id
   <0 rows> (or 0-length row.names)

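Mmh, no gene comes back for that ID. Note though that the value I
queried, ILMN_167292, is one digit shorter than the example given in
the description of the filter (ILMN_1672925), so the empty result may
just reflect a truncated ID. Trying with another ID works: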

   > genes2 <- getBM(attributes="ensembl_gene_id",
   +                 filters="illumina_humanht_12", values="ILMN_2103362",
   +                 mart=ensembl)
   > dim(genes2)
   [1] 1 1
   > genes2
     ensembl_gene_id
   1 ENSG00000159314
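
If you have many probe IDs, it may be handy to query them in one call,
keep the probe ID next to each gene, and see which IDs returned nothing.
Below is a minimal sketch under several assumptions: the helper
map_probes_to_genes and its query_fun argument are made up for
illustration (query_fun defaults to biomaRt's getBM and exists only so
the function can be tried offline), the format check assumes IDs look
like "ILMN_" plus 7 digits as in the examples from your listing, and the
filter/attribute names are the ones seen in this thread, which may differ
in other Ensembl releases.

```r
# Hypothetical helper, not part of biomaRt: look up several probe IDs at
# once and keep the probe ID next to each gene in the result.
map_probes_to_genes <- function(probe_ids, mart,
                                filter = "illumina_humanht_12",
                                query_fun = biomaRt::getBM) {
  probe_ids <- unique(as.character(probe_ids))

  # Assumption: valid IDs are "ILMN_" followed by 7 digits, as in the
  # examples from the filter listing. Adjust if your IDs differ.
  odd <- probe_ids[!grepl("^ILMN_[0-9]{7}$", probe_ids)]
  if (length(odd) > 0) {
    warning("IDs not matching ILMN_ + 7 digits: ",
            paste(odd, collapse = ", "))
  }

  # Assumption: an attribute with the same name as the filter exists
  # (true for illumina_humanht_12 in the listings in this thread).
  res <- query_fun(attributes = c(filter, "ensembl_gene_id"),
                   filters = filter, values = probe_ids, mart = mart)

  # Report IDs that came back with no gene instead of dropping them silently.
  list(genes = res, unmatched = setdiff(probe_ids, res[[1]]))
}

# Offline stand-in for getBM that only knows the one result shown above.
fake_getBM <- function(attributes, filters, values, mart) {
  known <- data.frame(illumina_humanht_12 = "ILMN_2103362",
                      ensembl_gene_id = "ENSG00000159314",
                      stringsAsFactors = FALSE)
  known[known[[1]] %in% values, , drop = FALSE]
}

out <- suppressWarnings(
  map_probes_to_genes(c("ILMN_2103362", "ILMN_167292"),
                      mart = NULL, query_fun = fake_getBM)
)
out$genes
out$unmatched
```

Against the real server you would drop query_fun and pass mart=ensembl,
with ensembl created by useMart() as earlier in this thread.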

Hope this helps,
H.

>
> Many Thanks,
> Natasha
>
>
> On 18/08/2011 15:04, Hervé Pagès wrote:
>> Natasha,
>>
>> On 11-08-18 06:32 AM, Natasha Sahgal wrote:
>>> Dear All,
>>>
>>> Please ignore this, as it now appears to work. I do wonder why it did
>>> not work earlier, though.
>>>
>>> Also, could someone please tell me the difference between
>>> "with_illumina_humanht_12" and "illumina_humanht_12" in the filters and
>>> attributes?
>>
>> I don't see the "with_illumina_humanht_12" attribute:
>>
>> library(biomaRt)
>> ensembl = useMart("ENSEMBL_MART_ENSEMBL", host="www.ensembl.org",
>> dataset="hsapiens_gene_ensembl")
>> atts = listAttributes(ensembl)
>>
>> Then:
>>
>> > atts[grep("illumina", atts[[1]], ignore.case=TRUE), ]
>> name description
>> 104 illumina_humanwg_6_v1 Illumina HumanWG 6 v1
>> 105 illumina_humanwg_6_v2 Illumina HumanWG 6 v2
>> 106 illumina_humanwg_6_v3 Illumina HumanWG 6 v3
>> 107 illumina_humanht_12 Illumina Human HT 12
>>
>>> What version of the illumina_humanht_12 chip is used, as this is
>>> unclear?
>>
>> According to the Illumina website the latest version of this chip seems
>> to be v4, but it's indeed not clear that the Ensembl folks are referring
>> to that one. You might want to ask them directly (biomaRt is just a tool
>> to query their data).
>>
>> Cheers,
>> H.
>>
>>>
>>>
>>> Many Thanks,
>>> Natasha
>>>
>>> On 18/08/2011 11:02, Natasha Sahgal wrote:
>>>> Dear List,
>>>>
>>>> I am using biomaRt for the first time.
>>>>
>>>> Initially I had problems connecting to hsapiens dataset, though I
>>>> could listMarts. However that issue was resolved (by solutions in some
>>>> earlier posts) by changing the host to www.ensembl.org.
>>>>
>>>> Now though I get the following error message when I try to execute the
>>>> following command:
>>>>
>>>> >mg_eg.ens.t <- getBM(attributes =
>>>> c("entrezgene","ensembl_gene_id","ensembl_transcript_id","hgnc_symbol"),
>>>>
>>>> filters = "entrezgene", values = mg.u.eg$Entrez_Gene_ID, mart =
>>>> ensembl)
>>>>
>>>>
>>>> V1
>>>> 1 <head>
>>>> 2 <link rel=Shortcut Icon href=/errors/ensembl_ico.png
>>>> type=image/png />
>>>> 3 <title>The Ensembl Genome Browser</title>
>>>> 4 <style type=text/css>
>>>> 5
>>>> body{color:#333333;background-color:#eaeeff;font-family:Arial,Helvetica,Sans-serif;font-size:14pt;margin:0;padding:0}
>>>>
>>>>
>>>> 6
>>>> #masthead{color:#ffffff;background-color:#333366;width:100%;height:50px;padding:5px
>>>>
>>>> 0;font-size:1.75em;text-align:center;margin:auto}
>>>> Error in getBM(attributes = c("entrezgene", "ensembl_gene_id",
>>>> "ensembl_transcript_id", :
>>>> The query to the BioMart webservice returned an invalid result: the
>>>> number of columns in the result table does not equal the number of
>>>> attributes in the query. Please report this to the mailing list.
>>>>
>>>>
>>>> This I found strange as I did the same thing yesterday with another
>>>> object and it worked!!
>>>>
>>>> Full Code and SessionInfo:
>>>>
>>>> head(mg.u.eg$Entrez_Gene_ID) #[1] 27122 3488 72 1293 100134134 390
>>>>
>>>> length(mg.u.eg$Entrez_Gene_ID) #[1] 4288
>>>>
>>>>
>>>> listMarts(host="www.ensembl.org")
>>>> ensembl = useMart("ENSEMBL_MART_ENSEMBL", host="www.ensembl.org")
>>>> listDatasets(ensembl)
>>>> ensembl = useMart("ENSEMBL_MART_ENSEMBL", host="www.ensembl.org",
>>>> dataset="hsapiens_gene_ensembl")
>>>> filt = listFilters(ensembl)
>>>> atts = listAttributes(ensembl)
>>>>
>>>> mg_eg.ens <- getBM(attributes =
>>>> c("entrezgene","ensembl_gene_id","hgnc_symbol"), filters =
>>>> "entrezgene", values = mg.u.eg$Entrez_Gene_ID, mart = ensembl) # 4016 3
>>>>
>>>> mg_eg.ens.t <- getBM(attributes =
>>>> c("entrezgene","ensembl_gene_id","ensembl_transcript_id","hgnc_symbol"),
>>>>
>>>> filters = "entrezgene", values = mg.u.eg$Entrez_Gene_ID, mart =
>>>> ensembl)
>>>>
>>>> sessionInfo()
>>>> R version 2.13.0 (2011-04-13)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
>>>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] biomaRt_2.8.0 DESeq_1.4.1 locfit_1.5-6 lattice_0.19-23
>>>> [5] akima_0.5-4 Biobase_2.12.1 WriteXLS_2.1.0 gdata_2.8.2
>>>> [9] limma_3.8.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] annotate_1.30.0 AnnotationDbi_1.14.1 DBI_0.2-5
>>>> [4] genefilter_1.34.0 geneplotter_1.30.0 grid_2.13.0
>>>> [7] gtools_2.6.2 RColorBrewer_1.0-2 RCurl_1.6-4
>>>> [10] RSQLite_0.9-4 splines_2.13.0 survival_2.36-5
>>>> [13] tools_2.13.0 XML_3.4-0 xtable_1.5-6
>>>>
>>>>
>>>> Many Thanks,
>>>> Natasha
>>>>
>>>> --
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


