[BioC] Bug in hyperGTest for KEGGHyperGParams?

Jenny Drnevich drnevich at illinois.edu
Tue Jul 5 18:13:07 CEST 2011


Hi all,

I'm doing both GO and KEGG over-representation testing on several 
different lists of genes, using the same background set for each 
list. What's got me puzzled is the difference in the "Gene universe 
size" reported from the hyperGTest results for each list from the 
KEGG test, even though they have the same background set. When I make 
a GOHyperGParams object for each list and test them, the results 
report the same "Gene universe size" for each list, which I assume to 
be the number of genes in the background that have any GO MF terms. 
However, for the KEGG test, each list reports a different "Gene 
universe size", so I'm unsure how selecting a different list from the 
same background can change the mapping of the background to KEGG 
terms. I haven't been able to trace the code that runs when 
hyperGTest is called on a KEGGHyperGParams object, so I don't know 
what is going on. Is it a bug, or is this expected behavior for KEGG 
terms? A reproducible example and sessionInfo() are below.

Thanks,
Jenny

 > library(annaffy)
Loading required package: Biobase

Welcome to Bioconductor

   Vignettes contain introductory material. To view, type
   'browseVignettes()'. To cite Bioconductor, see
   'citation("Biobase")' and for packages 'citation("pkgname")'.

Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: DBI

Loading required package: KEGG.db

 > library(porcine.db)
Loading required package: org.Ss.eg.db


 > library(GOstats)
Loading required package: Category
Loading required package: graph
 >
 >
 > all.ids <- Rkeys(porcineENTREZID)
 > length(all.ids)
[1] 30160
 >
 >
 > set.seed(1234)
 > list1 <- sample(all.ids,5000)
 > list2 <- list1[1:1000]
 > list3 <- list1[4501:5000]
 >
 > par.MF.list <- list(
+     list1 = new("GOHyperGParams", geneIds = list1,
+                 universeGeneIds = all.ids, ontology = "MF",
+                 annotation = "porcine.db", testDirection = "over",
+                 pvalueCutoff = 0.01, conditional = FALSE),
+     list2 = new("GOHyperGParams", geneIds = list2,
+                 universeGeneIds = all.ids, ontology = "MF",
+                 annotation = "porcine.db", testDirection = "over",
+                 pvalueCutoff = 0.01, conditional = FALSE),
+     list3 = new("GOHyperGParams", geneIds = list3,
+                 universeGeneIds = all.ids, ontology = "MF",
+                 annotation = "porcine.db", testDirection = "over",
+                 pvalueCutoff = 0.01, conditional = FALSE))
 >
 > hg.MF.list <- lapply(par.MF.list,hyperGTest)
 > hg.MF.list
$list1
Gene to GO MF  test for over-representation
1007 GO MF ids tested (1 have p < 0.01)
Selected gene set size: 569
     Gene universe size: 3198
     Annotation package: porcine

$list2
Gene to GO MF  test for over-representation
419 GO MF ids tested (6 have p < 0.01)
Selected gene set size: 106
     Gene universe size: 3198
     Annotation package: porcine

$list3
Gene to GO MF  test for over-representation
266 GO MF ids tested (2 have p < 0.01)
Selected gene set size: 63
     Gene universe size: 3198
     Annotation package: porcine

#Note the Gene universe size is 3198 for all 3 lists

 >
 >
 > par.KEGG <- list(
+     list1 = new("KEGGHyperGParams", geneIds = list1,
+                 universeGeneIds = all.ids,
+                 annotation = "porcine.db", testDirection = "over",
+                 pvalueCutoff = 0.01),
+     list2 = new("KEGGHyperGParams", geneIds = list2,
+                 universeGeneIds = all.ids,
+                 annotation = "porcine.db", testDirection = "over",
+                 pvalueCutoff = 0.01),
+     list3 = new("KEGGHyperGParams", geneIds = list3,
+                 universeGeneIds = all.ids,
+                 annotation = "porcine.db", testDirection = "over",
+                 pvalueCutoff = 0.01))
 >
 > hg.KEGG <- lapply(par.KEGG,hyperGTest)
 > hg.KEGG
$list1
Gene to KEGG  test for over-representation
190 KEGG ids tested (3 have p < 0.01)
Selected gene set size: 280
     Gene universe size: 1629
     Annotation package: porcine

$list2
Gene to KEGG  test for over-representation
105 KEGG ids tested (1 have p < 0.01)
Selected gene set size: 54
     Gene universe size: 1363
     Annotation package: porcine

$list3
Gene to KEGG  test for over-representation
87 KEGG ids tested (1 have p < 0.01)
Selected gene set size: 30
     Gene universe size: 1204
     Annotation package: porcine

# Now there are 3 different Gene universe sizes: 1629, 1363 and 1204. WHY?
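
The reason this matters: the universe size is the population size in the hypergeometric test, so identical pathway counts give different p-values under each reported universe. Below is a minimal base-R sketch of that effect. It is not the code hyperGTest itself runs. The pathway counts (10 selected genes out of 50 annotated to one pathway) are hypothetical and chosen only for illustration; 280, 1629, 1363 and 1204 are the sizes printed above.

```r
# Hypothetical pathway counts, for illustration only (not from the output above).
in_pathway_selected <- 10   # selected genes annotated to one pathway
in_pathway_universe <- 50   # universe genes annotated to that pathway

# Sizes copied from the KEGG results above.
selected_size  <- 280
universe_sizes <- c(list1 = 1629, list2 = 1363, list3 = 1204)

# Over-representation p-value: P(X >= in_pathway_selected) when drawing
# selected_size genes from a universe of the given size.
over_rep_p <- function(universe_size) {
  phyper(in_pathway_selected - 1,
         in_pathway_universe,
         universe_size - in_pathway_universe,
         selected_size,
         lower.tail = FALSE)
}

p_values <- sapply(universe_sizes, over_rep_p)
print(p_values)
```

With everything else held fixed, a smaller universe raises the expected number of pathway genes in the selection, so the over-representation p-value grows as the universe shrinks.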

 >
 >
 > sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GOstats_2.18.0       graph_1.30.0         Category_2.18.0
 [4] porcine.db_2.4.7     org.Ss.eg.db_2.5.0   annaffy_1.24.0
 [7] KEGG.db_2.5.0        GO.db_2.5.0          RSQLite_0.9-4
[10] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.1

loaded via a namespace (and not attached):
[1] annotate_1.30.0   genefilter_1.34.0 GSEABase_1.14.0   RBGL_1.28.0
[5] splines_2.13.0    survival_2.36-5   tools_2.13.0      XML_3.4-0.2
[9] xtable_1.5-6
