[BioC] Hypergeometric test with Disease Ontology
Ted Morrow
ted.morrow at ebc.uu.se
Fri Jan 28 14:25:09 CET 2011
Hi Gilbert,
Thanks for the response....it seems like GeneAnswers may do the job but
I am still having trouble getting it to work for me.
I have two gene pools - one is a universe of genes (called say
"mygenepool") which is only a subset on all human genes, the other is a
subset of mygenepool (called say "mygenes"). I am interested in knowing
which DO terms show over-representation in mygenes given the DO terms
present in mygenepool. So, in your example I am a bit confused as to
which pool "mygenes" refers to. Is it "mygenepool" or "mygenes"?
So I have......
str(mygenepool) # I used as.matrix then as.vector to change the
dataframe into a vector...is that OK?
int [1:1310] 8813 3075 2729 8379 204 170302 10165 6521 799 3052 ...
and
str(mygenes)
int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ...
and mygenes is a subset of mygenepool:
overlap <- intersect(mygenepool, mygenes)
str(overlap)
int [1:1202] 6833 10083 60528 8842 5048 4843 6329 5080 6401 6853 ...
Running the first bit of your example code seems to work fine:
myDOLite <- lapply(DOLite, intersect, mygenepool)
names(myDOLite) <- DOLiteTerm[names(myDOLite)]
and so does creating a GeneAnswers instance (myGA) whether I specify the
totalGeneNumber or not:
myGA <- geneAnswersBuilder(mygenes, myDOLite, categoryType='DOLITE',
testType='hyperG',
#known=FALSE, totalGeneNumber=1202,
pvalueT=0.1, geneExpressionProfile=NULL, verbose=TRUE)
But then I get the following error:
myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE)
[1] "Mapping geneInput ..."
Error in switch(sub("org.*[:.:]", "", libname), eg = "EG", tair =
"TAIR", :
EXPR must be a length 1 vector
traceback()
2: getSymbols(as.vector(x at geneInput[, 1]), x at annLib, strict = strict,
missing = missing)
1: geneAnswersReadable(myGA, verbose = TRUE)
If I instead create the GeneAnswers instance with 'org.Hs.eg.db' and not
myDOLite the step.....
myGAreadable <- geneAnswersReadable(myGA, verbose=TRUE)
.....works fine as do later steps (for example geneAnsersChartPlots and
topDOLiteGenes)
But then presumably I am not working with the restricted pool of genes
that I want to (i.e. mygenepool)
Any ideas?
Cheers
Ted
On 1/27/2011 5:19 PM, Gilbert Feng wrote:
> Hi, Ted
>
> If I correctly understand your question, you have your own gene pool,
> don't you? In that case, GeneAnswers still can handle that. What you
> need is to build a customized DOLite list to run analysis. For example,
>
> if your genes are mygenes(an unique Entrez GENE ID vector)
>
> myDOLite <- lapply(DOLite, intersect, mygenes)
> names(myDOLite) <- DOLiteTerm[names(myDOLite)]
> #Please note that the parameter 'known' is TRUE in geneAnswersBuilder,
> which means the test will only use shared genes in mygenes and DOLite
> genes as the gene pool to run hypergeometric test. You can change the
> number of gene pool by set parameter 'totalGeneNumber' and set 'known'
> to FALSE at the same time .
> myGA <- geneAnswersBuilder(mygenes, myDOLite, testType='hyperG', ...)
>
> since DO is designed for human, you can use setAnnLib to set the
> species as 'org.Hs.eg.db' for further visualization or analysis.
>
> Let me know if you have any question.
>
> Gilbert
>
> BTW, the current one has a new function to automatically generate a
> report for a or several groups of genes. Please check "groupReport"
> page, but before you run it, make sure you have java support and flash
> installation.
>
>
> On 1/27/11 9:45 AM, Ted Morrow wrote:
>> Thanks for your replies Gilbert and Peter.....I forgot to say in my
>> original question that I did look into using both the GeneAnswers
>> package and the FunDO web-based system but the issue I have not been
>> able to solve with them is that I cannot specify "my gene Universe"
>> within which my selected genes are tested for enrichment. I want to see
>> which diseases are enriched out of the possible diseases in "my gene
>> universe" which is a subset of all possible human genes.
>>
>> Is that possible in GeneAnswers (or even FunDO)?
>>
>> Cheers
>> Ted
>>
>> On 1/27/2011 4:39 PM, Peter Robinson wrote:
>>> On 01/27/2011 04:17 PM, Gilbert Feng wrote:
>>>> Hi, Ted
>>>>
>>>> DO.db is a standard sqlite file and you can use standard RSQLite
>>>> procedure to retrieve the information. Actually we do have a lite
>>>> version of Disease Ontology, DOLite that removes some redundant nodes,
>>>> integrated in GeneAnswers package, which also uses hypergeometric test
>>>> to run enrichment analysis as well as automatically generated
>>>> interactive(cytoscape web support) html summary for one or more groups
>>>> of genes.
>>>>
>>>> Best
>>>>
>>>> Gilbert
>>>
>>>
>>>
>>>
>>> You might also want to take a look at this website:
>>> http://django.nubic.northwestern.edu/fundo/faq
>>> which implements enrichment analysis using genes annotated to DO terms.
>>> -peter
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> On 1/27/11 5:00 AM, Ted Morrow wrote:
>>>>> Dear all,
>>>>>
>>>>> I would like to conduct a hypergeometric test on a list of genes but
>>>>> using Disease Ontology instead of GO or KEGG terms. The package
>>>>> "DO.db"
>>>>> contains this information but I have been unable to find a way of
>>>>> using
>>>>> this database in conjunction with the "GOstats" package that I have
>>>>> been
>>>>> using.
>>>>>
>>>>> Has anyone attempted to run a hypergeometric test for Disease
>>>>> Ontology
>>>>> terms? Is there another package I could use? Or is there a way of
>>>>> modifying the argument (if that's the right word) GOHyperGParams in
>>>>> GOstats so that it can make use of the information in DO.db?
>>>>>
>>>>> Both the GO and KEGG analyses work fine:
>>>>> ### GO chunk
>>>>> params<- new("GOHyperGParams",
>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>>> annotation = "hgu95av2.db", ontology="BP",
>>>>> pvalueCutoff=0.01, conditional=FALSE,
>>>>> testDirection="over")
>>>>> hgOver<- hyperGTest(params)
>>>>>
>>>>> hgOver
>>>>> Gene to GO BP test for over-representation
>>>>> 2136 GO BP ids tested (15 have p< 0.01)
>>>>> Selected gene set size: 112
>>>>> Gene universe size: 951
>>>>> Annotation package: hgu95av2
>>>>>
>>>>> ### KEGG chunk
>>>>> paramsKEGG<- new("KEGGHyperGParams",
>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>>> annotation = "hgu95av2.db",
>>>>> pvalueCutoff=0.01,
>>>>> testDirection="over")
>>>>>
>>>>> hgOverKEGG<- hyperGTest(paramsKEGG)
>>>>> hgOverKEGG
>>>>> summary(hgOverKEGG)
>>>>>
>>>>> My data looks like this:
>>>>> str(selectedEntrezIds)
>>>>> chr [1:157] "60528" "6853" "10157" "5081" "389434" "6591" "7414"
>>>>> "7546"
>>>>> "3074" "6916" "6559" "23503" "8626" "6557" "38" "60" "9733" "113235"
>>>>> "28962" "10269" "4069" "30835" "7018" ...
>>>>>
>>>>> > str(entrezUniverse)
>>>>> chr [1:1310] "8813" "3075" "2729" "8379" "204" "170302" "10165"
>>>>> "6521"
>>>>> "799" "3052" "1387" "5244" "3674" "6833" "10083" "60528" "8842"
>>>>> "5048"
>>>>> "4843" "6329" "5080" "6401" "6853" ...
>>>>>
>>>>> My naive attempts to use DO have included:
>>>>> paramsDO<- new("DOHyperGParams",
>>>>> geneIds = selectedEntrezIds, universeGeneIds = entrezUniverse,
>>>>> annotation = "DO.db",
>>>>> pvalueCutoff=0.01,
>>>>> testDirection="over")
>>>>>
>>>>> Which of course doesn't work and gives the following error:
>>>>> Error in getClass(Class, where = topenv(parent.frame())) :
>>>>> "DOHyperGParams" is not a defined class
>>>>>
>>>>> > traceback()
>>>>> 3: stop(gettextf("\"%s\" is not a defined class", Class), domain =
>>>>> NA)
>>>>> 2: getClass(Class, where = topenv(parent.frame()))
>>>>> 1: new("DOHyperGParams", geneIds = selectedEntrezIds,
>>>>> universeGeneIds =
>>>>> entrezUniverse,
>>>>> annotation = "DO.db", pvalueCutoff = 0.01, testDirection = "over")
>>>>>
>>>>>
>>>>> Replacing "GOHyperGParams" with "DOHyperGParams" also gives the
>>>>> following error:..
>>>>>
>>>>> hgOverDO<- hyperGTest(paramsDO)
>>>>> Error in match.arg(ontology, c("BP", "CC", "MF")) :
>>>>> 'arg' should be one of “BP”, “CC”, “MF”
>>>>>
>>>>> traceback()
>>>>> 10: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices),
>>>>> collapse = ", ")), domain = NA)
>>>>> 9: match.arg(ontology, c("BP", "CC", "MF"))
>>>>> 8: getUniverseViaGo(p)
>>>>> 7: universeBuilder(p)
>>>>> 6: universeBuilder(p)
>>>>> 5: .hyperGTestInternal(p)
>>>>> 4: is(object, Cl)
>>>>> 3: is(object, Cl)
>>>>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase",
>>>>> "hyperGTest")
>>>>> 1: hyperGTest(paramsDO)
>>>>>
>>>>>
>>>>> Any help would be greatly appreciated.
>>>>> /Ted
>>>>>
>>>>> > sessionInfo()
>>>>> R version 2.12.1 (2010-12-16)
>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>>> LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] KEGG.db_2.4.5 DO.db_2.1.0 GO.db_2.4.5 hgu95av2.db_2.4.5
>>>>> org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5
>>>>> graph_1.28.0
>>>>> Category_2.16.0 AnnotationDbi_1.12.0
>>>>> [12] Biobase_2.10.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] annotate_1.28.0 genefilter_1.32.0 GSEABase_1.12.2 RBGL_1.26.0
>>>>> splines_2.12.1 survival_2.36-2 tools_2.12.1 XML_3.2-0.2 xtable_1.5-6
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> .
>>>>
>>>
>>>
>>
>
--
__________________________________________________________________________________
Edward H. Morrow
Department of Ecology and Evolution
Animal Ecology (zooekologi)
Evolutionary Biology Centre
Uppsala University
Norbyvägen 18-D
SE-752 36 Uppsala
SWEDEN
Email: ted.morrow at ebc.uu.se
Tel: +46 18 471 2676
Fax +46 18 471 6484
Webpage: http://www.iee.uu.se/zooekol/default.php?type=personalpage&id=119&lang=en
More information about the Bioconductor
mailing list