[BioC] Search queries with biomaRt does not align with online queries via ensembl
Hotz, Hans-Rudolf
hrh at fmi.ch
Mon Mar 1 09:31:24 CET 2010
On 2/28/10 7:16 PM, "Tony Chiang" <tchiang at fhcrc.org> wrote:
> Hi Steffen et al,
>
> Quick question about a search query via biomaRt. Here is the code that I am
> using:
>
> *****
> library(biomaRt)
> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> filters = listFilters(ensembl)
> attributes = listAttributes(ensembl)
> getBM(attributes=c("ensembl_peptide_id", "entrezgene",
> "ensembl_gene_id", "hgnc_automatic_gene_name"),
> filters="hgnc_automatic_gene_name", values="ATF4",
> mart=ensembl)
> *****
Try filters="hgnc_symbol", e.g.:

> getBM(attributes=c("ensembl_peptide_id", "entrezgene", "ensembl_gene_id",
+       "hgnc_automatic_gene_name"),
+       filters="hgnc_symbol", values="ATF4", mart=ensembl)
  ensembl_peptide_id entrezgene ensembl_gene_id hgnc_automatic_gene_name
1    ENSP00000384587        468 ENSG00000128272                       NA
2    ENSP00000336790        468 ENSG00000128272                       NA
3    ENSP00000379912        468 ENSG00000128272                       NA

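Note that hgnc_automatic_gene_name comes back as NA for ATF4 in the output above, which would explain why filtering on that field returns nothing. For the full list of about 150 genes, the same "hgnc_symbol" filter should accept a vector of symbols. A minimal sketch, assuming "hgnc_symbol" is also available as an attribute in this dataset (listAttributes(ensembl) will confirm); the helper names map_symbols and unmapped_symbols and the example symbols are made up for illustration and are not part of biomaRt:

```r
# Sketch only: helper names and example symbols are illustrative,
# not part of biomaRt. Needs the biomaRt package and network access.

# Map a vector of HGNC symbols to Ensembl gene and peptide IDs.
map_symbols <- function(symbols, mart) {
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "ensembl_gene_id", "ensembl_peptide_id"),
    filters    = "hgnc_symbol",
    values     = symbols,
    mart       = mart
  )
}

# Symbols from the input that have no row in the result.
unmapped_symbols <- function(symbols, result) {
  setdiff(symbols, unique(result$hgnc_symbol))
}

# Usage (commented out because it contacts the BioMart server):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes   <- c("ATF4", "TP53", "BRCA1")
# res     <- map_symbols(genes, ensembl)
# unmapped_symbols(genes, res)
```

Returning hgnc_symbol as an attribute keeps each row tied to the symbol that was asked for, so anything that still fails to map can be listed and checked by hand.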
Hans
> For me, this returns an empty data frame. But when I query ATF4 online at
> ensembl, I find what I need. I also looked up ATF4 at genenames.org (HUGO)
> and it seems that ATF4 is a valid hgnc gene name, so the filter should be fine.
> I guess the only other reason that I can see is which dataset I use in the
> useMart function. I am guessing that the online API will search through all
> datasets while I am only specifying a single one? If this is true, do you
> know of a sensible workaround? I have about 150 genes that I would like
> mapped to the Ensembl IDs, but using the code above with a vector of gene
> names, I can only map around 25...but if I manually query for some of the
> non-mapped gene names, I get what I am after. If I am wrong about my guess
> in the dataset, can you let me know what you think might be going on?
>
> Tony
>
>> sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-01-16 r50993)
> i386-apple-darwin10.2.0
>
> locale:
> [1] en_US.utf-8/en_US.utf-8/C/C/en_US.utf-8/en_US.utf-8
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] hgu133plus2.db_2.3.5 org.Hs.eg.db_2.3.6 Rgraphviz_1.25.1
> [4] biomaRt_2.3.0 GOstats_2.13.0 RSQLite_0.8-1
> [7] DBI_0.2-5 Category_2.13.0 AnnotationDbi_1.9.4
> [10] Biobase_2.7.3 RBGL_1.23.0 graph_1.25.5
>
> loaded via a namespace (and not attached):
> [1] annotate_1.25.1 genefilter_1.29.5 GO.db_2.3.5 GSEABase_1.9.0
> [5] RCurl_1.3-1 splines_2.11.0 survival_2.35-8 tools_2.11.0
> [9] XML_2.6-0 xtable_1.5-6