[BioC] mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6
Dick Beyer
dbeyer at u.washington.edu
Wed Mar 3 17:18:31 CET 2010
Hi Adrian,
Thanks very much for your reply. Your example for building the topGO object was very helpful.
Another question: Do you have a favorite way to summarize the topGO output? What I am trying to do is something like CateGOrizer: http://www.animalgenome.org/bioinfo/tools/catego/
that uses higher level GO terms to give a summary overview of the enriched GO terms.
Thanks very much,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D. University of Washington
Tel.:(206) 616 7378 Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696 4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************
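One way to get a CateGOrizer-style overview is to map each enriched term onto a chosen set of higher-level ("slim") terms through its ancestors and count the hits. What follows is only a minimal sketch in base R on a made-up hierarchy: the `parents` list, the term names, and the helpers `ancestors_of()` and `summarize_by_slim()` are all hypothetical. With real data, the ancestor lookup could presumably come from GO.db (for example its `GOBPANCESTOR` map) instead of the toy list.

```r
# Toy child -> parent map (hypothetical terms, not real GO IDs).
parents <- list(
  T_root  = character(0),
  T_metab = "T_root",
  T_devel = "T_root",
  T_nmet  = "T_metab",
  T_regm  = "T_metab",
  T_organ = "T_devel"
)

# Collect all ancestors of a term by walking up the parent map.
ancestors_of <- function(term, parents) {
  found <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# Count, for each slim term, how many enriched terms sit at or below it.
summarize_by_slim <- function(enriched, slim, parents) {
  counts <- setNames(integer(length(slim)), slim)
  for (term in enriched) {
    lineage <- c(term, ancestors_of(term, parents))
    hits <- intersect(slim, lineage)
    counts[hits] <- counts[hits] + 1L
  }
  counts
}

enriched <- c("T_nmet", "T_regm", "T_organ")  # stand-in for significant terms
slim     <- c("T_metab", "T_devel")           # chosen high-level categories
slim_counts <- summarize_by_slim(enriched, slim, parents)
print(slim_counts)
```

In this toy case two of the enriched terms fall under `T_metab` and one under `T_devel`. A term with several slim ancestors is counted once under each of them, which may or may not be the summary you want.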
On Wed, 3 Mar 2010, Adrian Alexa wrote:
> Hi Dick,
>
> as Sean already mentioned, org.Mm.egGO2EG contains only the most
> specific GO annotations. topGO doesn't care whether you supply the most
> specific gene-to-GO mappings or the complete mappings. You will obtain
> the same result if you use either org.Mm.egGO2EG or
> org.Mm.egGO2ALLEGS. However, due to the redundancies in the
> org.Mm.egGO2ALLEGS mappings, I advise using the most specific
> mappings.
>
> Also, since you are using a Bioconductor annotation package, you don't
> need to construct the gene2GO list to provide the annotations. There
> is a function, namely "annFUN.org" which is more convenient to use
> when building the "topGOdata" object. In this case the instantiation
> of a topGOdata object will look like:
>
> GOdata <- new("topGOdata",
>               ontology = "BP",
>               allGenes = geneList,
>               nodeSize = 5,
>               annot = annFUN.org,
>               mapping = "org.Mm.eg.db",
>               ID = "entrez")
>
> The "mapping" argument tells which annotation package to use, and the
> "ID" argument selects which type of gene identifier to use.
>
>
> You can also use functions from topGO to access the genes annotated to
> a GO term of interest.
>
> # all the genes annotated to GO:0030522 -- NOT only the most specific ones!
> myGenes <- genesInTerm(GOdata, "GO:0030522")
>
> # the number of annotated genes
> no.myGenes <- countGenesInTerm(GOdata, "GO:0030522")
>
>
> Hope this helps. Let me know if you have additional questions.
>
>
> Regards,
> Adrian
>
> On Wed, Mar 3, 2010 at 7:32 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>> Hi Sean,
>>
>> Thanks very much for looking into this. I guess I need to think about this.
>> What is confusing to me is topGO takes a gene2GO list as input (a list of
>> GO terms for each gene), which I generated from org.Mm.egGO2EG (no
>> GO:0030522, for example). Getting GOIDs out of topGO that are in
>> org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG makes me think I should build
>> my gene2GO input list from org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG.
>>
>> I also didn't dig far enough when I checked GO:0030522 at geneontology.org,
>> which showed 34 gene products for Mus musculus. However, had I looked
>> further I would have seen GO:0030522 has no genes of its own.
>>
>> Until recently, I used ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz for
>> getting Entrez Gene ID/GOIDs mappings, but switched to the Bioconductor
>> org.Mm.eg.db way as it is much simpler.
>>
>> Thanks for the good education!
>>
>> Cheers,
>> Dick
>>
>> On Tue, 2 Mar 2010, Sean Davis wrote:
>>
>>> On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been running topGO (using mouse Entrez Gene IDs) and found that some
>>>> GO terms that turn up in the topGO analysis are not in the GO terms from
>>>> org.Mm.eg.db.
>>>>
>>>> I'd like to give some example code to show how to generate the problem,
>>>> but my topGO code is a lot of lines. The output looks like:
>>>>
>>>> allResults[[1]][[1]][1:2,]
>>>>          GO.ID                                Term Annotated Significant Expected classic    elim weight
>>>> 714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
>>>> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>>>>
>>>> So, the topGO output gives a column of GOIDs and such.
>>>>
>>>> Some of the problem GOIDs from topGO are GO:0030522, GO:0051094,
>>>> GO:0031497, GO:0046700.
>>>>
>>>> I can't find these in names(Mm.egGO2EG).
>>>>
>>>> library("org.Mm.eg.db")
>>>> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
>>>> grep("GO:0030522",names(Mm.egGO2EG))
>>>> integer(0)
>>>>
>>>> Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db?
>>>> When I check for GO:0030522 for Mus musculus at geneontology.org,
>>>> GO:0030522 is valid.
>>>>
>>>> I'm puzzled by the mismatch. I want to get the genes for a given GOID,
>>>> so there is probably a work around. If anyone has a suggestion or idea, I'd
>>>> be very grateful to know what to try.
>>>>
>>>
>>> Hi, Dick.
>>>
>>> The Gene Ontology, as I'm sure everyone knows, is hierarchical. The
>>> org.Mm.egGO2EG table stores ONLY the most specific term for each gene.
>>> However, the org.Mm.egGO2ALLEGS stores the term and all the genes for
>>> itself AND its children. Most of the gene ontology analysis
>>> algorithms use the latter definition; it looks like topGO does also.
>>> In short, try this:
>>>
>>> get('GO:0030522',org.Mm.egGO2ALLEGS)
>>> IDA IMP IDA IGI IMP IGI IMP IMP
>>> "11835" "11835" "11848" "12034" "12034" "13082" "13123" "13983"
>>> IMP ISO IMP IDA IMP IMP IMP ISO
>>> "14228" "14599" "14602" "14815" "14815" "15502" "16000" "16000"
>>> IDA IDA IMP IDA IGI IMP IMP IDA
>>> "16601" "18667" "18854" "19213" "19378" "19378" "19411" "20181"
>>> IDA IDA IMP IMP IMP IPI IDA IGI
>>> "20182" "20183" "20779" "21815" "21848" "22215" "24074" "27401"
>>> IMP ISA IDA IDA IMP IDA
>>> "56351" "56847" "59035" "67488" "224903" "232174"
>>>
>>> Hope that helps clear things up.
>>>
>>> Sean
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
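The difference discussed in this thread between org.Mm.egGO2EG (direct, most specific annotations only) and org.Mm.egGO2ALLEGS (direct annotations plus those inherited from child terms) can be illustrated on a toy hierarchy. This is only a sketch in base R: the term names, the gene IDs, and the helper `propagate_up()` are made up for illustration and are not meant to reproduce how the annotation packages are actually built.

```r
# Toy child -> parent map and direct gene annotations (all hypothetical).
parents <- list(
  T_parent = character(0),
  T_childA = "T_parent",
  T_childB = "T_parent"
)
direct <- list(            # GO2EG-style map: most specific terms only
  T_childA = c("g1", "g2"),
  T_childB = c("g2", "g3")
)

# Push every gene from a term up to all of its ancestors, giving a
# GO2ALLEGS-style map (term -> genes of the term itself and its descendants).
propagate_up <- function(direct, parents) {
  all_genes <- direct
  for (term in names(direct)) {
    queue <- parents[[term]]
    seen <- character(0)
    while (length(queue) > 0) {
      anc <- queue[1]
      queue <- queue[-1]
      if (anc %in% seen) next
      seen <- c(seen, anc)
      all_genes[[anc]] <- union(all_genes[[anc]], direct[[term]])
      queue <- c(queue, parents[[anc]])
    }
  }
  all_genes
}

propagated <- propagate_up(direct, parents)

"T_parent" %in% names(direct)      # FALSE: no gene is annotated to it directly
"T_parent" %in% names(propagated)  # TRUE: genes are inherited from the children
sort(propagated[["T_parent"]])     # "g1" "g2" "g3"
```

Here `T_parent` plays the role of a term like GO:0030522 in the thread: it is absent from the direct map because it has no genes of its own, yet it can still show up in an enrichment result once annotations are propagated up the hierarchy.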