[BioC] GOstats, query between the GO set from geneIdsByCategory and the GO set from org.Ce.egGO
Dan Du
tooyoung at gmail.com
Tue Jan 21 11:28:39 CET 2014
Hi Miao,
The discrepancy you are seeing arises because org.Ce.egGO lists only the
gene IDs that are directly associated with a GO term, not those annotated
to its child terms, as described in the man page:
"org.Ce.egGO is an R object that provides mappings between entrez gene
identifiers and the GO identifiers that they are directly associated
with. This mapping and its reverse mapping do NOT associate the child
terms from the GO ontology with the gene. Only the directly evidenced
terms are represented here.
org.Ce.egGO2ALLEGS is an R object that provides mappings between a given
GO identifier and all of the Entrez Gene identifiers annotated at that
GO term OR TO ONE OF IT'S CHILD NODES in the GO ontology. Thus, this
mapping is much larger and more inclusive than org.Ce.egGO2EG."
So the more inclusive mapping you may want to look at is
org.Ce.egGO2ALLEGS. Any remaining differences could come from your
hyperGTest settings (such as universeGeneIds), since only genes in the
universe are counted under each term.
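
Here is a minimal sketch of the comparison, assuming org.Ce.eg.db is installed. The variable names (go_id, direct_genes, all_genes) are only illustrative, and the exact counts depend on your annotation package version:

```r
library(org.Ce.eg.db)

go_id <- "GO:0030117"

# Genes annotated directly to the term (what toTable(org.Ce.egGO) shows).
direct_tab <- toTable(org.Ce.egGO)
direct_genes <- unique(direct_tab$gene_id[direct_tab$go_id == go_id])

# Genes annotated to the term or to any of its child terms.
# A gene can appear once per evidence code, hence unique().
all_genes <- unique(get(go_id, org.Ce.egGO2ALLEGS))

length(direct_genes)
length(all_genes)

# Genes that reach the term only through a child term.
setdiff(all_genes, direct_genes)

# With the hyperGTest result `r` from your message (not created here),
# the per-term universe should be a subset of all_genes:
# all(geneIdUniverse(r)[[go_id]] %in% all_genes)
```

If the commented check at the end returns TRUE, the extra genes in geneIdUniverse(r) come from child terms rather than from a mistake in your analysis.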
HTH,
Dan
On Mon, 2014-01-20 at 19:45 +0800, 余淼 wrote:
> Dear Bioconductor developers and users,
>
> I'm using GOstats for an enrichment analysis, and the result is as
> follows:
>
> > summary(r)
>       GOCCID       Pvalue OddsRatio   ExpCount Count Size Term
> 1 GO:0030131 2.533063e-08 123.95833 0.11423841     5    9 ...
> 2 GO:0030117 3.487024e-08  52.26471 0.22847682     6   18 ...
> 3 GO:0048475 3.487024e-08  52.26471 0.22847682     6   18 ...
> 4 GO:0030118 5.024033e-08  99.11111 0.12693157     5   10 ...
>
> The result shows the 'Count' and 'Size' values for each term. I used
> geneIdsByCategory(r)[["GO:0030117"]] and
> geneIdUniverse(r)[["GO:0030117"]] to get the genes behind 'Count'
> and 'Size'.
>
> > geneIdsByCategory(r)[["GO:0030117"]]
> [1] "172180" "173121" "173701" "175940" "180713" "186194"
>
> > geneIdUniverse(r)[["GO:0030117"]]
> [1] "171860" "171952" "172180" "172553" "173121" "173304" "173701"
> [8] "174675" "175376" "175940" "177891" "178078" "178183" "179387"
> [15] "180317" "180713" "181163" "186194"
>
> At the same time, I used org.Ce.egGO to look up 'GO:0030117' and get
> the genes in this set; there are only 8 genes in it.
>
> > ceGO <- toTable(org.Ce.egGO)
> > ceGO[ceGO$go_id == "GO:0030117",]
>       gene_id      go_id Evidence Ontology
> 31601  171860 GO:0030117      IEA       CC
> 32214  172553 GO:0030117      IEA       CC
> 32743  173304 GO:0030117      IEA       CC
> 32927  173701 GO:0030117      IEA       CC
> 34046  175376 GO:0030117      IEA       CC
> 34333  175750 GO:0030117      IEA       CC
> 35977  177891 GO:0030117      IEA       CC
> 38309  181163 GO:0030117      IEA       CC
>
> What confuses me is the difference in the number of genes between
> geneIdUniverse(r)[["GO:0030117"]] and ceGO[ceGO$go_id == "GO:0030117",].
> Why are they different? Or did I make a mistake somewhere in the
> process?
>
> I hope you can help me!
>
> Best,
> MiaoYu