[BioC] using BioMart to query UniProt identifiers
Steffen Durinck
durinck.steffen at gene.com
Thu Apr 7 19:32:35 CEST 2011
Hi Wolfgang,
There are a few issues:
1) Your getBM query has no filter. Without one, it requests the GO ids
of everything that is in uniprot, and that is probably why it is
taking so long.
With a filter, the following commands should be fast:
uniProt <- useMart("unimart", dataset="uniprot")
IDs <- c("MTMR1_HUMAN","MTMR2_HUMAN","MTMR3_HUMAN","MTMR4_HUMAN")
GO_IDs <- getBM(attributes=c("name","go_id"), filters="accession",
                values=IDs, mart=uniProt)
2) You'll notice that you don't get anything back, because the values
are entry names rather than accession numbers. You'll need to either
give it an accession number (for MTMR1 this is Q13613) and use the
accession filter, or give it a gene name, e.g. MTMR1, and use the
gene_name filter.
e.g.:
getBM(attributes=c("name","go_id"), filters="gene_name",
      values="MTMR1", mart=uniProt)
or
getBM(attributes=c("name","go_id"), filters="accession",
      values="Q13613", mart=uniProt)
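If you want to query the whole list by gene name, a rough sketch could look like the one below. It assumes that the part of each entry name before the underscore is the gene name, which is true for these four entries but is not guaranteed for every UniProt entry name. The helper fetchGO is a hypothetical name used only for illustration; it is defined but not called here, since it needs network access and a working unimart service.

```r
# Entry names from the original post
IDs <- c("MTMR1_HUMAN", "MTMR2_HUMAN", "MTMR3_HUMAN", "MTMR4_HUMAN")

# ASSUMPTION: the text before the last underscore equals the gene name.
# This holds for these four entries but not for UniProt entry names in general.
geneNames <- sub("_[^_]+$", "", IDs)

# Hypothetical helper: query GO ids by gene name against a mart object
# created with useMart("unimart", dataset="uniprot").
fetchGO <- function(mart, genes) {
  biomaRt::getBM(attributes = c("name", "go_id"),
                 filters = "gene_name",
                 values = genes,
                 mart = mart)
}

print(geneNames)
```

A call would then be fetchGO(uniProt, geneNames). Check the "name" column of the result against your original IDs, since a gene name filter may also match entries you did not ask for.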
Cheers,
Steffen
On Wed, Apr 6, 2011 at 8:50 AM, Wolfgang RAFFELSBERGER <wraff at igbmc.fr> wrote:
> Dear list,
>
> Context : I'd like to calculate GO enrichments for a list of UniProt identifiers (note that they are "ID" or "Entry name" and NOT "AC" or "Accession").
> So I tried to use BioMart to extract the GO-IDs for my list of UniProt identifiers, see code below.
> Basically, after calling getBM(), R doesn't return the command line for more than 5 minutes. I tested this on Linux and Windows -> same problem on both, so I suppose either I am doing something wrong or something isn't working right.
>
> Any hints ?
>
> Thanks in advance,
> Wolfgang Raffelsberger
>
>
> ## the code ..
> require(annotate)
> require(biomaRt)
>
> IDs <- c("MTMR1_HUMAN","MTMR2_HUMAN","MTMR3_HUMAN","MTMR4_HUMAN") ## existing UniProt IDs
>
> uniProt <- useMart("unimart")
> listAttributes(useDataset("uniprot",mart=uniProt)) ## contains "name" and "go_id"
> GO_IDs <- getBM(attributes =c("name","go_id"),values=IDs, mart=useDataset("uniprot",mart=uniProt))
> ## after >5 minutes the command-line is still not returned ...
>
>
> ## for completeness :
> sessionInfo()
>
> R version 2.12.2 (2011-02-25)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
> [5] LC_TIME=French_France.1252
>
> attached base packages:
> [1] grDevices datasets splines graphics stats tcltk utils
> [8] methods base
>
> other attached packages:
> [1] biomaRt_2.6.0 annotate_1.28.0 AnnotationDbi_1.12.0
> [4] Biobase_2.10.0 svSocket_0.9-51 TinnR_1.0.3
> [7] R2HTML_2.2 Hmisc_3.8-3 survival_2.36-5
>
> loaded via a namespace (and not attached):
> [1] cluster_1.13.3 DBI_0.2-5 grid_2.12.2 lattice_0.19-17
> [5] RCurl_1.4-2.1 RSQLite_0.9-4 svMisc_0.9-61 tools_2.12.2
> [9] XML_3.1-0.1 xtable_1.5-6
>
>
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Wolfgang Raffelsberger, PhD
> IGBMC,
> 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
> Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
> wolfgang.raffelsberger (at) igbmc.fr
>