[BioC] Hypergeometric test with Disease Ontology

Gilbert Feng g-feng at northwestern.edu
Thu Jan 27 17:19:49 CET 2011


Hi, Ted

If I correctly understand your question, you have your own gene pool, 
don't you? In that case, GeneAnswers still can handle that. What you 
need is to build a customized DOLite list to run analysis. For example,

if your genes are mygenes(an unique Entrez GENE ID vector)

myDOLite <- lapply(DOLite, intersect, mygenes)
names(myDOLite) <- DOLiteTerm[names(myDOLite)]
#Please note that the parameter 'known' is TRUE in geneAnswersBuilder, 
which means the test will only use shared genes in mygenes and DOLite 
genes as the gene pool to run hypergeometric test. You can change the 
number of gene pool by set parameter 'totalGeneNumber' and set 'known' 
to FALSE at the same time .
myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...)

since DO is designed for human, you can use setAnnLib to set the species 
as 'org.Hs.eg.db' for further visualization or analysis.

Let me know if you have any question.

Gilbert

BTW, the current one has a new function to automatically generate a 
report for a or several groups of genes. Please check "groupReport" 
page, but before you run it, make sure you have java support and flash 
installation.


On 1/27/11 9:45 AM, Ted Morrow wrote:
> Thanks for your replies Gilbert and Peter.....I forgot to say in my
> original question that I did look into using both the GeneAnswers
> package and the FunDO web-based system but the issue I have not been
> able to solve with them is that I cannot specify "my gene Universe"
> within which my selected genes are tested for enrichment. I want to see
> which diseases are enriched out of the possible diseases in "my gene
> universe" which is a subset of all possible human genes.
>
> Is that possible in GeneAnswers (or even FunDO)?
>
> Cheers
> Ted
>
> On 1/27/2011 4:39 PM, Peter Robinson wrote:
>> On 01/27/2011 04:17 PM, Gilbert Feng wrote:
>>> Hi, Ted
>>>
>>> DO.db is a standard sqlite file and you can use standard RSQLite
>>> procedure to retrieve the information. Actually we do have a lite
>>> version of Disease Ontology, DOLite that removes some redundant nodes,
>>> integrated in GeneAnswers package, which also uses hypergeometric test
>>> to run enrichment analysis as well as automatically generated
>>> interactive(cytoscape web support) html summary for one or more groups
>>> of genes.
>>>
>>> Best
>>>
>>> Gilbert
>>
>>
>>
>>
>> You might also want to take a look at this website:
>> http://django.nubic.northwestern.edu/fundo/faq
>> which implements enrichment analysis using genes annotated to DO terms.
>> -peter
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>>
>>> On 1/27/11 5:00 AM, Ted Morrow wrote:
>>>> Dear all,
>>>>
>>>> I would like to conduct a hypergeometric test on a list of genes but
>>>> using Disease Ontology instead of GO or KEGG terms. The package "DO.db"
>>>> contains this information but I have been unable to find a way of using
>>>> this database in conjunction with the "GOstats" package that I have
>>>> been
>>>> using.
>>>>
>>>> Has anyone attempted to run a hypergeometric test for Disease Ontology
>>>> terms? Is there another package I could use? Or is there a way of
>>>> modifying the argument (if that's the right word) GOHyperGParams in
>>>> GOstats so that it can make use of the information in DO.db?
>>>>
>>>> Both the GO and KEGG analyses work fine:
>>>> ### GO chunk
>>>> params<- new("GOHyperGParams",
>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>> annotation = "hgu95av2.db", ontology="BP",
>>>> pvalueCutoff=0.01, conditional=FALSE,
>>>> testDirection="over")
>>>> hgOver<- hyperGTest(params)
>>>>
>>>> hgOver
>>>> Gene to GO BP test for over-representation
>>>> 2136 GO BP ids tested (15 have p< 0.01)
>>>> Selected gene set size: 112
>>>> Gene universe size: 951
>>>> Annotation package: hgu95av2
>>>>
>>>> ### KEGG chunk
>>>> paramsKEGG<- new("KEGGHyperGParams",
>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>> annotation = "hgu95av2.db",
>>>> pvalueCutoff=0.01,
>>>> testDirection="over")
>>>>
>>>> hgOverKEGG<- hyperGTest(paramsKEGG)
>>>> hgOverKEGG
>>>> summary(hgOverKEGG)
>>>>
>>>> My data looks like this:
>>>> str(selectedEntrezIds)
>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414" "7546"
>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235"
>>>> "28962" "10269" "4069" "30835" "7018" ...
>>>>
>>>> > str(entrezUniverse)
>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165" "6521"
>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842" "5048"
>>>> "4843" "6329" "5080" "6401" "6853" ...
>>>>
>>>> My naive attempts to use DO have included:
>>>> paramsDO<- new("DOHyperGParams",
>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>> annotation = "DO.db",
>>>> pvalueCutoff=0.01,
>>>> testDirection="over")
>>>>
>>>> Which of course doesn't work and gives the following error:
>>>> Error in getClass(Class, where = topenv(parent.frame())) :
>>>> "DOHyperGParams" is not a defined class
>>>>
>>>> > traceback()
>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain = NA)
>>>> 2: getClass(Class, where = topenv(parent.frame()))
>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds, universeGeneIds =
>>>> entrezUniverse,
>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over")
>>>>
>>>>
>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the
>>>> following error:..
>>>>
>>>> hgOverDO<- hyperGTest(paramsDO)
>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) :
>>>> 'arg' should be one of “BP”, “CC”, “MF”
>>>>
>>>> traceback()
>>>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices),
>>>> collapse = ", ")), domain = NA)
>>>> 9: match.arg(ontology, c("BP", "CC", "MF"))
>>>> 8: getUniverseViaGo(p)
>>>> 7: universeBuilder(p)
>>>> 6: universeBuilder(p)
>>>> 5: .hyperGTestInternal(p)
>>>> 4: is(object, Cl)
>>>> 3: is(object, Cl)
>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase",
>>>> "hyperGTest")
>>>> 1: hyperGTest(paramsDO)
>>>>
>>>>
>>>> Any help would be greatly appreciated.
>>>> /Ted
>>>>
>>>> > sessionInfo()
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>> LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5
>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0
>>>> Category_2.16.0 AnnotationDbi_1.12.0
>>>> [12] Biobase_2.10.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0
>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> .
>>>
>>
>>
>

-- 
-----------------------------------------------
Gang (Gilbert) Feng, PhD
Biomedical Informatics Center
Robert H. Lurie Comprehensive Cancer Center
Northwestern University
750 N. Lake Shore Drive, 11th Floor(11-175e)
Chicago, IL  60611
Phone:312-503-2358
Email g-feng (at) northwestern.edu



More information about the Bioconductor mailing list