Dear all,

I'm analyzing a set of genes for GO enrichment, given a customized 
annotation. In that annotation, a specific GO term occurs only once; in 
my set of genes, the gene attached to it is also represented.

I would have expected to find the numbers -- genes having a specific GO 
term in my list being "1" and genes having the term in total, in the 
annotation, being "1" as well -- in "count" and "size", however, 13 and 
62 genes are denoted. Please have a look at my script and the session info:



SessionInfo()

R version 3.0.2 (2013-09-25)

Platform: x86_64-redhat-linux-gnu (64-bit)

locale:

[1] LC_CTYPE=de_DE.utf8 LC_NUMERIC=C

[3] LC_TIME=de_DE.utf8 LC_COLLATE=de_DE.utf8

[5] LC_MONETARY=de_DE.utf8 LC_MESSAGES=de_DE.utf8

[7] LC_PAPER=de_DE.utf8 LC_NAME=C

[9] LC_ADDRESS=C LC_TELEPHONE=C

[11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C

attached base packages:

[1] parallel stats graphics grDevices utils datasets methods

[8] base

other attached packages:

[1] GO.db_2.9.0 GSEABase_1.22.0 annotate_1.38.0

[4] GOstats_2.26.0 RSQLite_0.11.4 DBI_0.2-7

[7] graph_1.38.3 Category_2.26.0 AnnotationDbi_1.22.6

[10] Biobase_2.20.1 BiocGenerics_0.6.0

loaded via a namespace (and not attached):

[1] AnnotationForge_1.2.2 genefilter_1.42.0 IRanges_1.18.4

[4] RBGL_1.36.2 splines_3.0.2 stats4_3.0.2

[7] survival_2.37-7 tcltk_3.0.2 tools_3.0.2

[10] XML_3.98-1.1 xtable_1.7-1


## Script

library(GOstats)

library(GSEABase)

library(GO.db)


ANNO_FILE <- "genes_anno_forgostats.txt"

UNIVERSE_FILE <-"universe.txt"

ORG <- "MY_Organism"

GENE_FILE<-"targets.txt"

OUT_FILE<-"results.txt"


goframeData <- read.table(ANNO_FILE,header=T)

universe <- scan(file=UNIVERSE_FILE,what="character")

goFrame = GOFrame(goframeData, organism = ORG)

goAllFrame = GOAllFrame(goFrame)

gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())


genes <- scan(file=GENE_FILE,what="character")


NAMESPACE="MF"

x = length(NAMESPACE)

params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, 
geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, 
pvalueCutoff = 0.05, conditional=TRUE, testDirection="over")

OverMF <- hyperGTest(params)


NAMESPACE="BP"

x = length(NAMESPACE)

params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, 
geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, 
pvalueCutoff = 0.05, conditional=TRUE, testDirection="over")

OverBP <- hyperGTest(params)


NAMESPACE="CC"

x = length(NAMESPACE)

params <- GSEAGOHyperGParams(name="custom",geneSetCollection = gsc, 
geneIds = genes, universeGeneIds = universe, ontology = NAMESPACE, 
pvalueCutoff = 0.05, conditional=TRUE, testDirection="over")

OverCC <- hyperGTest(params)


write.table(summary(OverMF),file=OUTFILE,quote=F,sep='\t', append=T)

write.table(summary(OverBP),file=OUTFILE,quote=F,sep='\t', append=T)

write.table(summary(OverCC),file=OUTFILE,quote=F,sep='\t', append=T)




If I run GOStats unconditionally, other GO terms being parent terms to 
the one I'm looking at also pop up. The numbers (count, size) are the 
same for all terms; my assumption is that genes with child terms 
belonging to a specific parent term are summed up in count/size, but I'm 
absolutely not sure if this is the case... Could you please help me 
figuring out what is denoted by the numbers in "counts" and "size", if 
not the absolute numbers like I would have expected?


Kind regards, and many thanks in advance,

Christine Gläßer


	[[alternative HTML version deleted]]

