[BioC] bug? universeMappedCount for KEGGHyperG tests in GOstats

Cei Abreu-Goodger cei at ebi.ac.uk
Tue Oct 6 11:50:59 CEST 2009

Hi all,

There seems to be a problem with the universeMappedCount function (and 
maybe with the underlying statistics that depend on it?) for a hyperGTest 
on a KEGGHyperGParams object. It appears to report the total number of 
mapped genes in the _tested_ categories instead of the total number of 
mapped genes in the initial universe. This may be intentional, but 
it's inconsistent with what happens when using a GOHyperGParams.

Example code follows with its output and sessionInfo at the end:


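# Load the packages used below (both are listed in the sessionInfo)
library(GOstats)
library(org.Mm.eg.db)

# Define a fixed universe for KEGG and GO tests
universeKEGG <- sample(mappedkeys(org.Mm.egPATH),1000)
universeGO   <- sample(mappedkeys(org.Mm.egGO),1000)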

# Perform GO/KEGG hyperG tests with different sample sizes
for (size in c(5,10,20)) {
    genesKEGG  <- sample(universeKEGG,size)
    genesGO    <- sample(universeGO,size)

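    # Pass the fixed universes defined above via universeGeneIds
    paramsKEGG <- new("KEGGHyperGParams", geneIds=genesKEGG, 
               annotation="org.Mm.eg", pvalueCutoff=0.05,
               universeGeneIds=universeKEGG)
    paramsGO <- new("GOHyperGParams", geneIds=genesGO, 
               annotation="org.Mm.eg", pvalueCutoff=0.05, ontology="MF",
               universeGeneIds=universeGO)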

    resultsKEGG <- hyperGTest(paramsKEGG)
    resultsGO   <- hyperGTest(paramsGO)

    uniSizeKEGG <- universeMappedCount(resultsKEGG)
    uniSizeGO   <- universeMappedCount(resultsGO)

    print(paste("Sample size:",size,", GO mapped universe:",uniSizeGO,
                ", KEGG mapped universe:",uniSizeKEGG))
}

## Code output:

[1] "Sample size: 5 , GO mapped universe: 884 , KEGG mapped universe: 286"
[1] "Sample size: 10 , GO mapped universe: 884 , KEGG mapped universe: 402"
[1] "Sample size: 20 , GO mapped universe: 884 , KEGG mapped universe: 569"

## The GO mapped universe stays constant, but the KEGG count increases
## with sample size.
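
To illustrate why the reported universe size could matter for the statistics, here is a minimal sketch in base R. It assumes the over-representation p-value is a plain hypergeometric tail probability computed with phyper; I have not checked whether GOstats actually feeds the universeMappedCount value into its test. The counts hits, selected and inCat are made up, the helper hyperP is hypothetical, and 286 and 884 are borrowed from the output above only as two example universe sizes.

```r
# Hypothetical counts, for illustration only (not taken from the runs above)
hits     <- 4    # selected genes annotated to the category
selected <- 20   # size of the selected gene list
inCat    <- 30   # universe genes annotated to the category

# Over-representation p-value P(X >= hits) under the hypergeometric model.
# hyperP is a hypothetical helper, not part of GOstats or Category.
hyperP <- function(universeSize) {
    phyper(hits - 1, inCat, universeSize - inCat, selected,
           lower.tail=FALSE)
}

pSmall <- hyperP(286)   # example of a small mapped universe
pLarge <- hyperP(884)   # example of a large mapped universe
print(c(small=pSmall, large=pLarge))
```

With the same overlap, the smaller universe gives the larger p-value, so if the smaller count were used in the test, the KEGG results would be more conservative than intended.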

R version 2.9.2 (2009-08-24)


attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] GO.db_2.2.11        org.Mm.eg.db_2.2.11 GOstats_2.10.0
[4] RSQLite_0.7-1       DBI_0.2-4           graph_1.22.2
[7] Category_2.10.1     AnnotationDbi_1.6.1 Biobase_2.4.1

loaded via a namespace (and not attached):
[1] annotate_1.22.0   genefilter_1.24.2 GSEABase_1.6.1    RBGL_1.20.0
[5] splines_2.9.2     survival_2.35-4   tools_2.9.2       XML_2.5-3
[9] xtable_1.5-5
