[BioC] GOstat: listing genes from hyperGTest

Wed Oct 22 15:10:39 CEST 2008

Hi Tim,

Does probeSetSummary() do what you want?

Best,

Jim

Tim Smith wrote:
>  
> Hi,
> 
> I was performing a hyperGTest for genes in Homo sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What I
> wanted to do next was to correlate each significant GO term (returned by
> this function) with the genes (from my set of input genes) associated
> with that GO term. However, I think that I may be using the wrong
> package/function to get the relevant set of genes.
> 
> Currently, what I'm doing is finding the significant GO terms by using the following code:
> 
> -----------------------
> library(GOstats)
> library(org.Hs.eg.db)
> 
> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs
> 
> paramsGO <- new("GOHyperGParams", geneIds = genes1,
>                 universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
>                 ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
>                 testDirection = "over")
> 
> GO <- hyperGTest(paramsGO)
> --------------------------
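Note that with pvalueCutoff = 1, as far as I understand, the result reports every tested term rather than only the significant ones, so a threshold has to be applied afterwards. summary(GO) returns a data frame with one row per term (for ontology = "BP" the ID column is GOBPID, alongside Pvalue, Count, Size and Term). The sketch below uses a made-up data frame of that shape so it runs without GOstats; the IDs ("GO:A" etc.), p-values and the 0.01 threshold are all hypothetical.

```r
# Made-up stand-in for summary(GO); the real output has further columns
# (e.g. OddsRatio, ExpCount) and real GO IDs and p-values.
goSummary <- data.frame(
  GOBPID = c("GO:A", "GO:B", "GO:C"),
  Pvalue = c(0.0004, 0.2000, 0.0030),
  Count  = c(5L, 2L, 8L),
  Size   = c(40L, 35L, 120L),
  Term   = c("term A", "term B", "term C"),
  stringsAsFactors = FALSE
)

# With pvalueCutoff = 1 every tested term is listed, so keep only the
# rows below a (hypothetical) significance threshold.
alpha <- 0.01
sigTerms <- goSummary$GOBPID[goSummary$Pvalue < alpha]
print(sigTerms)
```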
> This gives me a set of significant GO terms. Now, I would like to find
> which subset of the genes in 'genes1' is associated with each of the
> significant GO terms. To do this, I map all GO terms to their Entrez IDs
> with the 'org.Hs.eg.db' package, as follows:
> 
> xx <- as.list(org.Hs.egGO2EG)
> 
> This gives me a mapping of GO terms to Entrez IDs. I get 6,756 GO terms
> (isn't this number small?) that map to at least one Entrez ID. From
> there, I look up which Entrez IDs are associated with my GO term of
> interest.
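The lookup described above (restricting each term's Entrez IDs to those in 'genes1') can be sketched in base R as follows. The named list is a hypothetical stand-in for as.list(org.Hs.egGO2EG), and its GO and Entrez IDs are made up. A term missing from the map returns an empty vector rather than NULL, which makes the "term not found" case easy to spot.

```r
# Made-up stand-in for as.list(org.Hs.egGO2EG): GO ID -> Entrez IDs.
go2eg <- list(
  "GO:A" = c("10", "20", "30"),
  "GO:B" = c("20", "40")
)
genes1   <- c("20", "30", "50")        # hypothetical genes of interest
sigTerms <- c("GO:A", "GO:B", "GO:C")  # "GO:C" has no entry in go2eg

# For each term, keep the input genes annotated to it; a term that is
# absent from the map gives character(0).
genesPerTerm <- lapply(sigTerms, function(term) {
  annotated <- go2eg[[term]]
  if (is.null(annotated)) character(0) else intersect(genes1, annotated)
})
names(genesPerTerm) <- sigTerms
str(genesPerTerm)
```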
> 
> My problem is that the GO term from hyperGTest is often not associated
> with any Entrez ID in the mapping above (xx <- as.list(org.Hs.egGO2EG)),
> i.e. the GO ID is not in the list obtained from 'org.Hs.egGO2EG'. For
> example, the term 'GO:0043284' is reported by hyperGTest, but does not
> appear to be associated with any Entrez IDs in the org.Hs.eg.db package.
> Where could I be going wrong?
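A likely explanation, offered with some hedging: org.Hs.egGO2EG holds only direct annotations, whereas GOstats counts a gene toward a term if it is annotated to that term or to any of its more specific descendants. The org.Hs.egGO2ALLEGS map in org.Hs.eg.db holds these inherited annotations. A general term such as 'GO:0043284' can therefore be tested and reported even though no gene is annotated to it directly, which may also be why only 6,756 terms appear in GO2EG. The toy sketch below uses a made-up three-term ontology ("GO:P" with children "GO:C1" and "GO:C2") to show the propagation; none of these IDs are real GO terms.

```r
# Made-up toy ontology: each term maps to its parent term(s).
parents <- list(
  "GO:C1" = "GO:P",
  "GO:C2" = "GO:P",
  "GO:P"  = character(0)
)

# Direct annotations only, analogous to org.Hs.egGO2EG:
# nothing is annotated directly to the general term "GO:P".
direct <- list(
  "GO:C1" = c("10", "20"),
  "GO:C2" = "30"
)

# Return a term together with all of its ancestors.
ancestors <- function(term) {
  seen <- character(0)
  todo <- term
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      todo <- c(todo, parents[[current]])
    }
  }
  seen
}

# Propagate every direct annotation to the term and its ancestors,
# analogous to org.Hs.egGO2ALLEGS.
inherited <- list()
for (term in names(direct)) {
  for (anc in ancestors(term)) {
    inherited[[anc]] <- union(inherited[[anc]], direct[[term]])
  }
}

genes1 <- c("20", "30", "50")  # hypothetical genes of interest
print(intersect(genes1, inherited[["GO:P"]]))
```

With the real package, something like intersect(genes1, org.Hs.egGO2ALLEGS[["GO:0043284"]]) should then return the input genes behind that term.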
> 
> I would include a set of genes so that the example is reproducible, but with hundreds of genes the email would get too long!
> 
> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
> 
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
>  [1] grid      splines   tools     stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
>  [1] gplots_2.6.0         gmodels_2.14.1       gtools_2.4.0         gdata_2.4.1          Rgraphviz_1.18.1     GOstats_2.6.0        Category_2.6.0
>  [8] RBGL_1.16.0          annotate_1.18.0      xtable_1.5-2         graph_1.18.0         PFAM.db_2.2.0        GO.db_2.2.0          KEGG.db_2.2.0
> [15] org.Hs.eg.db_2.2.0   AnnotationDbi_1.2.0  RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0    survival_2.34-1      affy_1.18.0
> [22] preprocessCore_1.2.0 affyio_1.8.0         Biobase_2.0.0
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44    
> 
> 
> ---------------------------------
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662