[BioC] GO term as "keytype" in GO.db
Robert Castelo
robert.castelo at upf.edu
Tue Apr 30 17:50:21 CEST 2013
hi,
i was about to fetch GO identifiers (IDs) matching certain GO terms
using the GO.db package, but i've found out that GO.db only considers GO
IDs as possible keys:
suppressPackageStartupMessages(library(GO.db))
keytypes(GO.db)
[1] "GOID"
in section 0.4 of the AnnotationDbi vignette, "Using select with
GO.db", an example is given that uses GO IDs as keys, but i think it
would also be handy to query which GO IDs match or contain a
particular term such as "RNA binding", for example by doing either:
* for matching
select(GO.db, keys="RNA binding", cols="GOID", keytype="TERM")
* for containing
allTerms <- keys(GO.db, keytype="TERM")
rnabindingterms <- allTerms[grep("RNA binding", allTerms)]
select(GO.db, keys=rnabindingterms, cols="GOID", keytype="TERM")
once you have the GO IDs you can query which genes have such a GO
term annotated to them.
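to make the two cases concrete, here is a minimal base R sketch on a toy named vector. the vector `toyTerms` and its three entries are made up for illustration and only stand in for the real term table; this is not how GO.db works internally:

```r
# toy stand-in for the GO term table: names are GO IDs, values are terms
toyTerms <- c("GO:0003723" = "RNA binding",
              "GO:0003729" = "mRNA binding",
              "GO:0003677" = "DNA binding")

# matching: the term must equal the query exactly
exactIds <- names(toyTerms)[toyTerms == "RNA binding"]

# containing: the query may appear anywhere in the term
# (fixed = TRUE treats the query as a literal string, not a regex)
containingIds <- names(toyTerms)[grepl("RNA binding", toyTerms, fixed = TRUE)]

print(exactIds)       # "GO:0003723"
print(containingIds)  # "GO:0003723" "GO:0003729"
```

note that the "containing" case also picks up "mRNA binding", which is the behaviour one would want from a substring query.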
currently this is not possible because the only key type allowed is GOID; asking for any other keytype still returns GO IDs:
head(keys(GO.db, keytype="TERM"))
[1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
[6] "GO:0000009"
head(keys(GO.db, keytype="DEFINITION"))
[1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
[6] "GO:0000009"
head(keys(GO.db, keytype="ONTOLOGY"))
[1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
[6] "GO:0000009"
while in other packages, such as org.Hs.eg.db, basically all columns of
information can be used as keys:
library(org.Hs.eg.db)
keytypes(org.Hs.eg.db)
 [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"      "ACCNUM"
 [6] "ALIAS"        "CHR"          "CHRLOC"       "CHRLOCEND"    "ENZYME"
[11] "MAP"          "PATH"         "PMID"         "REFSEQ"       "SYMBOL"
[16] "UNIGENE"      "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"
[21] "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"     "GOALL"
[26] "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"         "UCSCKG"
i'm also aware that GO.db defines several hash tables, among them
GOTERM, which can be used in the following way for my purpose:
goterms <- unlist(eapply(GOTERM, function(x) x@Term))
which(goterms == "RNA binding")
GO:0003723
2714
but the first step is much slower than using the 'select' method and i
would prefer to use a more homogeneous way to pull all data from GO.db.
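a possibly faster stop-gap is sketched below, under the assumption that the Term() accessor from AnnotationDbi can be applied to the whole GOTERM map at once rather than to one GOTerms object at a time. the names `rnaExactIds` and `rnaContainingIds` are only illustrative, and this still does not provide the uniform select() interface requested here:

```r
# sketch only: runs when GO.db is installed, otherwise does nothing
if (requireNamespace("GO.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GO.db))

  # assumption: Term() is vectorised over the GOTERM map and returns a
  # character vector of terms named by GO ID
  goterms <- Term(GOTERM)

  # matching: GO IDs whose term equals the query exactly
  rnaExactIds <- names(goterms)[goterms == "RNA binding"]

  # containing: GO IDs whose term contains the query as a literal substring
  rnaContainingIds <- names(goterms)[grepl("RNA binding", goterms, fixed = TRUE)]

  print(rnaExactIds)
  print(head(rnaContainingIds))
}
```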
i look forward to your comments on this.
best regards,
robert.
ps: sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
[5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] org.Hs.eg.db_2.9.0 GO.db_2.9.0 RSQLite_0.11.3
[4] DBI_0.2-6 AnnotationDbi_1.22.3 Biobase_2.20.0
[7] BiocGenerics_0.6.0 vimcom_0.9-8 setwidth_1.0-3
[10] colorout_1.0-0
loaded via a namespace (and not attached):
[1] IRanges_1.18.0 stats4_3.0.0