[BioC] Bug in hyperGTest for KEGGHyperGParams?
Jenny Drnevich
drnevich at illinois.edu
Tue Jul 5 18:13:07 CEST 2011
Hi all,
I'm doing both GO and KEGG over-representation testing on several
different lists of genes, using the same background set for each
list. What's got me puzzled is the difference in the "Gene universe
size" reported from the hyperGTest results for each list from the
KEGG test, even though they have the same background set. When I make
a GOHyperGParams object for each list and test them, the results
report the same "Gene universe size" for each list, which I assume to
be the number of genes in the background that have any GO MF terms.
However, for the KEGG test, each list reports a different "Gene
universe size", so I'm unsure how selecting a different list from the
same background can change the mapping of the background to KEGG
terms. I haven't been able to trace the code that runs when
hyperGTest is called on a KEGGHyperGParams object, so I don't know
what is going on. Is it a bug, or is this supposed to happen for
KEGG terms? A reproducible example and sessionInfo() are below.
Thanks,
Jenny
> library(annaffy)
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation("pkgname")'.
Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: DBI
Loading required package: KEGG.db
> library(porcine.db)
Loading required package: org.Ss.eg.db
> library(GOstats)
Loading required package: Category
Loading required package: graph
>
>
> all.ids <- Rkeys(porcineENTREZID)
> length(all.ids)
[1] 30160
>
>
> set.seed(1234)
> list1 <- sample(all.ids,5000)
> list2 <- list1[1:1000]
> list3 <- list1[4501:5000]
>
> par.MF.list <- list(list1 = new("GOHyperGParams", geneIds = list1,
+     universeGeneIds = all.ids, ontology = "MF",
+     annotation = "porcine.db",
+     testDirection = "over", pvalueCutoff = 0.01, conditional = FALSE),
+   list2 = new("GOHyperGParams", geneIds = list2,
+     universeGeneIds = all.ids, ontology = "MF",
+     annotation = "porcine.db",
+     testDirection = "over", pvalueCutoff = 0.01, conditional = FALSE),
+   list3 = new("GOHyperGParams", geneIds = list3,
+     universeGeneIds = all.ids, ontology = "MF",
+     annotation = "porcine.db",
+     testDirection = "over", pvalueCutoff = 0.01, conditional = FALSE))
>
> hg.MF.list <- lapply(par.MF.list,hyperGTest)
> hg.MF.list
$list1
Gene to GO MF test for over-representation
1007 GO MF ids tested (1 have p < 0.01)
Selected gene set size: 569
Gene universe size: 3198
Annotation package: porcine
$list2
Gene to GO MF test for over-representation
419 GO MF ids tested (6 have p < 0.01)
Selected gene set size: 106
Gene universe size: 3198
Annotation package: porcine
$list3
Gene to GO MF test for over-representation
266 GO MF ids tested (2 have p < 0.01)
Selected gene set size: 63
Gene universe size: 3198
Annotation package: porcine
#Note the Gene universe size is 3198 for all 3 lists
>
>
> par.KEGG <- list(list1 = new("KEGGHyperGParams", geneIds = list1,
+     universeGeneIds = all.ids,
+     annotation = "porcine.db", testDirection = "over",
+     pvalueCutoff = 0.01),
+   list2 = new("KEGGHyperGParams", geneIds = list2,
+     universeGeneIds = all.ids,
+     annotation = "porcine.db", testDirection = "over",
+     pvalueCutoff = 0.01),
+   list3 = new("KEGGHyperGParams", geneIds = list3,
+     universeGeneIds = all.ids,
+     annotation = "porcine.db", testDirection = "over",
+     pvalueCutoff = 0.01))
>
> hg.KEGG <- lapply(par.KEGG,hyperGTest)
> hg.KEGG
$list1
Gene to KEGG test for over-representation
190 KEGG ids tested (3 have p < 0.01)
Selected gene set size: 280
Gene universe size: 1629
Annotation package: porcine
$list2
Gene to KEGG test for over-representation
105 KEGG ids tested (1 have p < 0.01)
Selected gene set size: 54
Gene universe size: 1363
Annotation package: porcine
$list3
Gene to KEGG test for over-representation
87 KEGG ids tested (1 have p < 0.01)
Selected gene set size: 30
Gene universe size: 1204
Annotation package: porcine
# Now there are 3 different Gene universe sizes: 1629, 1363 and 1204. WHY?
>
>
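To show why the differing universe sizes matter, here is a minimal base-R sketch of the over-representation p-value as an upper-tail hypergeometric probability. The universe sizes 1363 and 1629 and the selected set size 54 come from the KEGG output above; the per-pathway counts (8 and 60) and the helper name over_rep_p are hypothetical, chosen only for illustration. This is not the code that hyperGTest runs internally.

```r
# Hypothetical counts for a single KEGG pathway (NOT taken from the output above)
in_category_selected <- 8    # selected genes annotated to the pathway
in_category_universe <- 60   # universe genes annotated to the pathway
selected_size <- 54          # "Selected gene set size" reported for list2

# Upper-tail hypergeometric probability P(X >= observed) for a given universe size.
# over_rep_p is an illustrative helper, not part of GOstats or Category.
over_rep_p <- function(universe_size) {
  phyper(in_category_selected - 1,              # q: P(X > q) == P(X >= observed)
         in_category_universe,                  # m: genes in the pathway
         universe_size - in_category_universe,  # n: genes not in the pathway
         selected_size,                         # k: number of genes drawn
         lower.tail = FALSE)
}

p_small <- over_rep_p(1363)  # universe size reported for list2
p_large <- over_rep_p(1629)  # universe size reported for list1
print(c(universe_1363 = p_small, universe_1629 = p_large))
```

With every other count held fixed, the larger universe gives the smaller p-value, so two lists tested against different effective universes are not directly comparable.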
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GOstats_2.18.0  graph_1.30.0  Category_2.18.0  porcine.db_2.4.7  org.Ss.eg.db_2.5.0  annaffy_1.24.0
[7] KEGG.db_2.5.0  GO.db_2.5.0  RSQLite_0.9-4  DBI_0.2-5  AnnotationDbi_1.14.1  Biobase_2.12.1
loaded via a namespace (and not attached):
[1] annotate_1.30.0  genefilter_1.34.0  GSEABase_1.14.0  RBGL_1.28.0  splines_2.13.0  survival_2.36-5  tools_2.13.0
[8] XML_3.4-0.2  xtable_1.5-6