[BioC] GOstats, query between the GO set from geneIdsByCategory and the GO set from org.Ce.egGO

Dan Du tooyoung at gmail.com
Tue Jan 21 11:28:39 CET 2014


Hi Miao,

The discrepancy you are seeing arises because the org.Ce.egGO table
lists only the gene IDs that are directly associated with a GO term, not
those associated with its children, as described in the man page:

"org.Ce.egGO is an R object that provides mappings between entrez gene
identifiers and the GO identifiers that they are directly associated
with. This mapping and its reverse mapping do NOT associate the child
terms from the GO ontology with the gene. Only the directly evidenced
terms are represented here.
org.Ce.egGO2ALLEGS is an R object that provides mappings between a given
GO identifier and all of the Entrez Gene identifiers annotated at that
GO term OR TO ONE OF IT'S CHILD NODES in the GO ontology. Thus, this
mapping is much larger and more inclusive than org.Ce.egGO2EG."

So a more inclusive table you may want to look at is
org.Ce.egGO2ALLEGS. Other differences could come from your hyperGTest
settings (like universeGeneIds), since genes outside the universe are
not counted.
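
As a rough check, the two gene lists pasted in your message can be
compared directly in base R. This is only a sketch: the variable names
are mine, and the IDs are copied by hand from the output quoted below.
It shows that 7 of the 8 directly annotated genes are in the universe
for GO:0030117, that 11 universe genes have no direct annotation
(presumably they are annotated to child terms), and that one directly
annotated gene is missing from the universe (possibly because of the
universeGeneIds setting).

```r
# Entrez Gene IDs copied from the quoted message below.
# geneIdUniverse(r)[["GO:0030117"]]: the 18 genes behind 'Size'
universe_ids <- c("171860", "171952", "172180", "172553", "173121", "173304",
                  "173701", "174675", "175376", "175940", "177891", "178078",
                  "178183", "179387", "180317", "180713", "181163", "186194")

# ceGO[ceGO$go_id == "GO:0030117", "gene_id"]: the 8 directly annotated genes
direct_ids <- c("171860", "172553", "173304", "173701", "175376", "175750",
                "177891", "181163")

# Genes present in both sets
shared <- intersect(direct_ids, universe_ids)

# Directly annotated genes that are absent from the test universe
only_direct <- setdiff(direct_ids, universe_ids)

# Universe genes with no direct annotation to GO:0030117
only_universe <- setdiff(universe_ids, direct_ids)

length(shared)         # 7
only_direct            # "175750"
length(only_universe)  # 11
```

With org.Ce.eg.db loaded, unique(org.Ce.egGO2ALLEGS[["GO:0030117"]])
should return the inclusive gene set to compare against (not run here).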

HTH,
Dan

On Mon, 2014-01-20 at 19:45 +0800, 余淼 wrote:
> Dear Bioconductor developers and users,
> 
>   I'm using GOstats to run an enrichment analysis, and the result is as
> follows:
> 
> > summary(r)
>       GOCCID       Pvalue OddsRatio   ExpCount Count Size Term
> 
> 1 GO:0030131 2.533063e-08 123.95833 0.11423841     5    9 ...
> 2 GO:0030117 3.487024e-08  52.26471 0.22847682     6   18 ...
> 3 GO:0048475 3.487024e-08  52.26471 0.22847682     6   18 ...
> 4 GO:0030118 5.024033e-08  99.11111 0.12693157     5   10 ...
> 
> In the result we can see the corresponding 'Count' and 'Size' values.
> I used geneIdsByCategory(r)[["GO:0030117"]] and
> geneIdUniverse(r)[["GO:0030117"]] to get the genes behind 'Count' and
> 'Size'.
> 
> > geneIdsByCategory(r)[["GO:0030117"]]
> [1] "172180" "173121" "173701" "175940" "180713" "186194"
> 
> > geneIdUniverse(r)[["GO:0030117"]]
>  [1] "171860" "171952" "172180" "172553" "173121" "173304" "173701"
>  [8] "174675" "175376" "175940" "177891" "178078" "178183" "179387"
> [15] "180317" "180713" "181163" "186194"
> 
> At the same time, I used org.Ce.egGO to look up 'GO:0030117' and get
> the genes in this set; there are only 8 genes in it.
> 
> > ceGO <- toTable(org.Ce.egGO)
> > ceGO[ceGO$go_id == "GO:0030117",]
>       gene_id      go_id Evidence Ontology
> 31601  171860 GO:0030117      IEA       CC
> 32214  172553 GO:0030117      IEA       CC
> 32743  173304 GO:0030117      IEA       CC
> 32927  173701 GO:0030117      IEA       CC
> 34046  175376 GO:0030117      IEA       CC
> 34333  175750 GO:0030117      IEA       CC
> 35977  177891 GO:0030117      IEA       CC
> 38309  181163 GO:0030117      IEA       CC
> 
> What confuses me is the difference in the number of genes between
> geneIdUniverse(r)[["GO:0030117"]] and ceGO[ceGO$go_id == "GO:0030117",].
> Why are they different from each other? Or have I made a mistake
> somewhere in the process?
> 
> I hope you can help!
> 
> Best,
> MiaoYu