[BioC] mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6
Sean Davis
seandavi at gmail.com
Wed Mar 3 02:28:43 CET 2010
On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu> wrote:
> Hello,
>
> I've been running topGO (using mouse Entrez Gene IDs) and found that some GO terms that turn up in the topGO analysis are not in the GO terms from org.Mm.eg.db.
>
> I'd like to give some example code to show how to generate the problem, but my topGO code is a lot of lines. The output looks like:
>
> allResults[[1]][[1]][1:2,]
>     GO.ID      Term                                Annotated Significant Expected classic    elim weight
> 714 GO:0019222 regulation of metabolic process          2498         143   107.08 0.00010 0.17956 0.9057
> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>
> So, the topGO output gives a column of GOIDs and such.
>
> Some of the problem GOIDs from topGO are GO:0030522, GO:0051094, GO:0031497, GO:0046700.
>
> I can't find these in names(Mm.egGO2EG).
>
> library("org.Mm.eg.db")
> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
> grep("GO:0030522",names(Mm.egGO2EG))
> integer(0)
>
> Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db? When I check for GO:0030522 for Mus musculus at geneontology.org, GO:0030522 is valid.
>
> I'm puzzled by the mismatch. I want to get the genes for a given GOID, so there is probably a work around. If anyone has a suggestion or idea, I'd be very grateful to know what to try.
>
Hi, Dick.
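The Gene Ontology is hierarchical: a gene annotated to a term is
implicitly annotated to that term's ancestors as well. The
org.Mm.egGO2EG map lists, for each GO term, ONLY the genes annotated
directly to that term (the most specific term for each gene), so a
term with no direct annotations, such as GO:0030522, does not appear
in it. The org.Mm.egGO2ALLEGS map lists, for each term, the genes
annotated to the term itself AND to any of its child terms. Most of
the gene ontology analysis algorithms use the latter definition; it
looks like topGO does also.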
In short, try this:
get('GO:0030522',org.Mm.egGO2ALLEGS)
IDA IMP IDA IGI IMP IGI IMP IMP
"11835" "11835" "11848" "12034" "12034" "13082" "13123" "13983"
IMP ISO IMP IDA IMP IMP IMP ISO
"14228" "14599" "14602" "14815" "14815" "15502" "16000" "16000"
IDA IDA IMP IDA IGI IMP IMP IDA
"16601" "18667" "18854" "19213" "19378" "19378" "19411" "20181"
IDA IDA IMP IMP IMP IPI IDA IGI
"20182" "20183" "20779" "21815" "21848" "22215" "24074" "27401"
IMP ISA IDA IDA IMP IDA
"56351" "56847" "59035" "67488" "224903" "232174"
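To make the difference between the two maps concrete, here is a minimal sketch in base R using a toy four-term hierarchy. The term IDs, the gene IDs, and the helper descendants() are all made up for illustration; nothing below comes from org.Mm.eg.db, and the real package reads its maps from a database rather than computing them this way.

```r
# Toy hierarchy: each term maps to its direct children (hypothetical IDs).
children <- list(
  "GO:A" = c("GO:B", "GO:C"),
  "GO:B" = "GO:D",
  "GO:C" = character(0),
  "GO:D" = character(0)
)

# Direct annotations only, analogous to a GO2EG-style map.
# "GO:A" has no directly annotated genes, so it is absent.
go2eg <- list(
  "GO:B" = c("101", "102"),
  "GO:C" = "103",
  "GO:D" = c("102", "104")
)

# Hypothetical helper: a term plus all of its descendants.
descendants <- function(term) {
  kids <- children[[term]]
  unique(c(term, unlist(lapply(kids, descendants))))
}

# Propagated annotations, analogous to a GO2ALLEGS-style map:
# genes annotated to the term itself or to any of its descendants.
go2alleg <- lapply(
  setNames(names(children), names(children)),
  function(term) sort(unique(unlist(go2eg[descendants(term)])))
)

"GO:A" %in% names(go2eg)   # FALSE: no direct annotations
go2alleg[["GO:A"]]         # "101" "102" "103" "104"
```

Note that in the real output above some Entrez IDs appear more than once (11835 is listed with both IDA and IMP, for example), because each evidence code gets its own entry. Wrapping the call in unique() should give the distinct gene IDs if that is all you need.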
Hope that helps clear things up.
Sean