[BioC] new topGO results using GO.db very different from old ones using GO
Joern Toedling
toedling at ebi.ac.uk
Fri May 2 16:48:20 CEST 2008
Dear all,
I would appreciate any suggestions on the following issue. I have noticed
a major inconsistency between new and older topGO results. For the older
ones, topGO used the "GO" package, while it uses "GO.db" for the new
results. I can't figure out whether it is a problem with topGO only or
whether there are some serious inconsistencies between GO and GO.db.
Here is the source code I used:
library("topGO")
## load list of genes of interest
load("brainOnlyGenes.RData")
## load general gene-to-GO mapping and universe of genes to use in analysis:
load("mm9gene2GO.RData")
load("arrayGenesWithGO.RData")
## then the function to call topGO and to return a nice result table:
sigGOTable <- function(selGenes, GOgenes=arrayGenesWithGO,
                       gene2GO=mm9.gene2GO[arrayGenesWithGO], ontology="BP", maxP=0.001)
{
  ## 0/1 factor over the gene universe: 1 = gene of interest
  inGenes <- factor(as.integer(GOgenes %in% selGenes))
  names(inGenes) <- GOgenes
  GOdata <- new("topGOdata", ontology=ontology, allGenes=inGenes,
                annot=annFUN.gene2GO, gene2GO=gene2GO)
  myTestStat <- new("elimCount", testStatistic=GOFisherTest,
                    name="Fisher test", cutOff=maxP)
  mySigGroups <- getSigGroups(GOdata, myTestStat)
  sTab <- GenTable(GOdata, mySigGroups, topNodes=length(usedGO(GOdata)))
  ## the last column holds the p-values; give it a fixed name
  names(sTab)[length(sTab)] <- "p.value"
  return(subset(sTab, as.numeric(p.value) < maxP))
}
## call it:
(brainRes <- sigGOTable(brainOnlyGenes))
# with topGO_1.4.0 using GO_2.0.1
# this is:
#   GO.ID       Term                   Annotated  Significant  Expected  p.value
# 1 GO:0007268  synaptic transmission        136           44     24.46  3.0e-05
# 2 GO:0007610  behavior                     180           54     32.38  4.4e-05
# 3 GO:0007409  axonogenesis                 119           38     21.41  0.00014
# 4 GO:0006887  exocytosis                    40           17      7.20  0.00026
# 5 GO:0007420  brain development            136           40     24.46  0.00066
# which kind of makes sense as an annotation of a list of interesting genes when investigating brain cells
## Now, unfortunately, using exactly the same gene list, universe and gene-to-GO mapping, and the same function as above,
## with topGO_1.9.0 using GO.db_2.2.0, the result is:
#   GO.ID       Term                                 Annotated  Significant  Expected  p.value
# 1 GO:0007268  mitochondrial genome maintenance           137           44     24.65  3.7e-05
# 2 GO:0007610  reproduction                               180           54     32.39  4.4e-05
# 3 GO:0007409  single strand break repair                 119           38     21.41  0.00014
# 4 GO:0006887  regulation of DNA recombination             40           17      7.20  0.00026
# 5 GO:0007420  regulation of mitotic recombination        136           40     24.47  0.00066
# which is obviously very, very different
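A side-by-side comparison of the two tables may help narrow this down. The sketch below uses base R only. The two data frames are typed in by hand from the tables above, and the names old, new and cmp are purely illustrative (they are not part of topGO). It simply checks which columns differ for each GO ID.

```r
## Values copied by hand from the two result tables above.
ids <- c("GO:0007268", "GO:0007610", "GO:0007409", "GO:0006887", "GO:0007420")
old <- data.frame(
  GO.ID       = ids,
  Term        = c("synaptic transmission", "behavior", "axonogenesis",
                  "exocytosis", "brain development"),
  Annotated   = c(136, 180, 119, 40, 136),
  Significant = c(44, 54, 38, 17, 40),
  stringsAsFactors = FALSE
)
new <- data.frame(
  GO.ID       = ids,
  Term        = c("mitochondrial genome maintenance", "reproduction",
                  "single strand break repair",
                  "regulation of DNA recombination",
                  "regulation of mitotic recombination"),
  Annotated   = c(137, 180, 119, 40, 136),
  Significant = c(44, 54, 38, 17, 40),
  stringsAsFactors = FALSE
)

## Join on the GO identifier and flag which columns agree.
cmp <- merge(old, new, by = "GO.ID", suffixes = c(".old", ".new"))
cmp$sameTerm        <- cmp$Term.old == cmp$Term.new
cmp$sameSignificant <- cmp$Significant.old == cmp$Significant.new
cmp$diffAnnotated   <- cmp$Annotated.new - cmp$Annotated.old
print(cmp[, c("GO.ID", "Term.old", "Term.new",
              "sameTerm", "sameSignificant", "diffAnnotated")])
```

For the tables as posted, the GO IDs and the Significant counts agree row by row, Annotated differs by one gene for GO:0007268 only, and every Term label differs. That would suggest, though it does not prove, that the discrepancy sits in the ID-to-term lookup rather than in the test itself.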
Does anyone have an educated guess as to what is going on? Could it be a bug
in topGO? Or is the information in GO.db really different from that
in GO, and if so, which one is right?
Best regards,
Joern