[BioC] mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6
Dick Beyer
dbeyer at u.washington.edu
Wed Mar 3 07:32:20 CET 2010
Hi Sean,
Thanks very much for looking into this. I guess I need to think about it more. What confuses me is that topGO takes a gene2GO list as input (a list of GO terms for each gene), which I generated from org.Mm.egGO2EG (which has no GO:0030522, for example). Getting GO IDs out of topGO that are in org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG makes me think I should build my gene2GO input list from org.Mm.egGO2ALLEGS instead.
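For reference, the inversion step described above (GO ID to genes, turned into gene to GO IDs) can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the toy list only mimics the shape of as.list(org.Mm.egGO2EG) (GO IDs as names, Entrez IDs as values, evidence codes as vector names), and the term names, gene names and the helper invert_go2eg are hypothetical. Whether the direct mapping is sufficient input depends on topGO propagating annotations up the GO graph itself, which the appearance of ancestor terms such as GO:0030522 in its output suggests but which should be checked against the topGO documentation.

```r
# Toy stand-in for as.list(org.Mm.egGO2EG): term -> gene IDs, with evidence
# codes as vector names (hypothetical data, not real GO annotations).
go2eg <- list(
  termA = c(IDA = "gene1", IMP = "gene1", IMP = "gene2"),
  termB = c(IMP = "gene2", ISO = "gene3")
)

# invert_go2eg: hypothetical helper that turns a term -> genes list into a
# gene -> terms list, the shape used as gene2GO input.
invert_go2eg <- function(go2eg) {
  genes <- unlist(go2eg, use.names = FALSE)           # every gene entry
  terms <- rep(names(go2eg), lengths(go2eg))          # matching term for each entry
  gene2go <- split(terms, genes)                      # group terms by gene
  lapply(gene2go, unique)  # a gene can appear once per evidence code, so deduplicate
}

gene2GO <- invert_go2eg(go2eg)
str(gene2GO)
```

Here gene2 ends up with both termA and termB, while gene1 keeps a single termA even though it was listed under two evidence codes.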
I also didn't dig far enough when I checked GO:0030522 at geneontology.org, which showed 34 gene products for Mus musculus. Had I looked further, however, I would have seen that GO:0030522 has no genes annotated directly to it.
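To make the distinction concrete, here is a toy base-R sketch of how a term with no genes of its own can still collect genes from its descendant terms, which is the GO2ALLEGS-style view Sean describes in his reply. The ontology, term names and gene names are hypothetical, and the helper all_genes is an illustration, not an AnnotationDbi or topGO function; it assumes the term graph is acyclic, as GO is.

```r
# Hypothetical toy ontology: term -> child terms (not real GO structure).
children <- list(
  parentTerm  = c("childA", "childB"),
  childA      = "grandchildA",
  childB      = character(0),
  grandchildA = character(0)
)

# Direct annotations, analogous to a GO2EG-style map; parentTerm has none.
direct <- list(
  parentTerm  = character(0),
  childA      = c("gene1", "gene2"),
  childB      = "gene3",
  grandchildA = c("gene2", "gene4")
)

# all_genes: genes annotated to a term or to any of its descendants
# (the GO2ALLEGS-style view), found by recursing down the child links.
all_genes <- function(term) {
  kids <- children[[term]]
  from_kids <- unlist(lapply(kids, all_genes), use.names = FALSE)
  sort(unique(c(direct[[term]], from_kids)))
}

length(direct[["parentTerm"]])  # 0 direct genes
all_genes("parentTerm")         # "gene1" "gene2" "gene3" "gene4"
```

A lookup in the direct map for parentTerm comes back empty, much like searching names(Mm.egGO2EG) for GO:0030522, while the propagated view returns every gene below it.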
Until recently, I used ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz to get Entrez Gene ID to GO ID mappings, but I switched to the Bioconductor org.Mm.eg.db approach because it is much simpler.
Thanks for the good education!
Cheers,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D. University of Washington
Tel.:(206) 616 7378 Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696 4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************
On Tue, 2 Mar 2010, Sean Davis wrote:
> On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>> Hello,
>>
>> I've been running topGO (using mouse Entrez Gene IDs) and found that some GO terms that turn up in the topGO analysis are not in the GO terms from org.Mm.eg.db.
>>
>> I'd like to give some example code that reproduces the problem, but my topGO code is quite long. The output looks like:
>>
>> allResults[[1]][[1]][1:2,]
>>          GO.ID                                Term Annotated Significant Expected classic    elim weight
>> 714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
>> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>>
>> So the topGO output includes a column of GO IDs, among other things.
>>
>> Some of the problematic GO IDs from topGO are GO:0030522, GO:0051094, GO:0031497, and GO:0046700.
>>
>> I can't find these in names(Mm.egGO2EG).
>>
>> library("org.Mm.eg.db")
>> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
>> grep("GO:0030522",names(Mm.egGO2EG))
>> integer(0)
>>
>> Is it possible that topGO relies on GO.db while I'm using org.Mm.eg.db? When I look up GO:0030522 for Mus musculus at geneontology.org, it is a valid term.
>>
>> I'm puzzled by the mismatch. I want to get the genes for a given GO ID, so there is probably a workaround. If anyone has a suggestion or idea, I'd be very grateful to hear what to try.
>>
>
> Hi, Dick.
>
> The Gene Ontology, as I'm sure everyone knows, is hierarchical. The
> org.Mm.egGO2EG map stores ONLY the terms each gene is directly
> annotated to, which are usually the most specific ones. By contrast,
> org.Mm.egGO2ALLEGS maps each term to the genes annotated to itself AND
> to all of its children (its descendant terms). Most Gene Ontology
> analysis algorithms use the latter definition, and it looks like topGO
> does too. In short, try this:
>
> get('GO:0030522',org.Mm.egGO2ALLEGS)
> IDA IMP IDA IGI IMP IGI IMP IMP
> "11835" "11835" "11848" "12034" "12034" "13082" "13123" "13983"
> IMP ISO IMP IDA IMP IMP IMP ISO
> "14228" "14599" "14602" "14815" "14815" "15502" "16000" "16000"
> IDA IDA IMP IDA IGI IMP IMP IDA
> "16601" "18667" "18854" "19213" "19378" "19378" "19411" "20181"
> IDA IDA IMP IMP IMP IPI IDA IGI
> "20182" "20183" "20779" "21815" "21848" "22215" "24074" "27401"
> IMP ISA IDA IDA IMP IDA
> "56351" "56847" "59035" "67488" "224903" "232174"
>
> Hope that helps clear things up.
>
> Sean
>
More information about the Bioconductor mailing list