[BioC] GOStats: can't find probesets, which are in overrepresented GO term

Mike Walter michael_walter at email.de
Fri Oct 1 14:07:19 CEST 2010


Dear BioC List,

I have difficulties in understanding and retracing the results from GO overrepresentation analysis. 
I used GOTerms to see if there are any GO Terms overrepresented in a list of DE genes with following code:


#define gene universe, select only probes with GO annotation and  
#transform probesetIDs in unique Entrez Ids
>
> input <- featureNames(d.norm.filt) 
> univ <- mget(input, mouse4302GO, ifnotfound=NA) 
> univ <- input[!is.na(univ)]
> univ <- mget(univ, mouse4302ENTREZID, ifnotfound=NA)
> univ <- unique(as.character(univ[!is.na(univ)]))

#do the same for my list of genes (DE)
>
> ID <- mget(DE, mouse4302GO, ifnotfound=NA)
> ID <- DE[!is.na(ID)]
> ID <- mget(as.character(ID), mouse4302ENTREZID, ifnotfound=NA)
> ID <- unique(as.character(ID[!is.na(ID)]))

#run HyperGTest
>
> BP = new("GOHyperGParams", 
+    geneIds=ID, universeGeneIds=univ,
+    annotation="mouse4302", ontology="BP",
+    pvalueCutoff=0.01, conditional=TRUE,
+    testDirection="over")
> hGt.BP = hyperGTest(BP)
> head(summary(hGt.BP))

 GOBPID       Pvalue OddsRatio     ExpCount Count Size
1 GO:0006944 0.0001406440  146.7895 0.0180309101     2   21
2 GO:0014049 0.0008586148       Inf 0.0008586148     1    1
3 GO:0032303 0.0008586148       Inf 0.0008586148     1    1
4 GO:0032308 0.0008586148       Inf 0.0008586148     1    1
5 GO:0034405 0.0008586148       Inf 0.0008586148     1    1
6 GO:0043132 0.0008586148       Inf 0.0008586148     1    1
 Term
1                       cellular membrane fusion
2     positive regulation of glutamate secretion
3              regulation of icosanoid secretion
4 positive regulation of prostaglandin secretion
5                 response to fluid shear stress
6                                  NAD transport

When I now want to know, which probesets of my DE list are in the first GO, I tried following chunk. 


> go = as.matrix(unlist(mget("GO:0006944", mouse4302GO2PROBE, ifnotfound=NA)))
> DEinGO = DE[is.element(DE,go[,1])]
> DEinGO
character(0)


However, this results in an empty vector. I then picked all probesets, which fall into this category, 
which results in only 5 probesets/2 genes. 

> revgo = mouse4302GO2PROBE
> revgo = as.list(revgo[mappedkeys(revgo)])
> revgo = lapply(revgo,as.vector)
> probesInGO = as.vector(unlist(revgo["GO:0006944"))
> mget(probesInGO, mouse4302ENTREZID)
$`1417349_at`
[1] "18457"

$`1417350_at`
[1] "18457"

$`1457088_at`
[1] "18457"

$`1420833_at`
[1] "22318"

$`1420834_at`
[1] "22318"

None of the 5 probesets are in my DE list. So I have basically 2 questions: 

1) Why is this GO overrepresented in my list? Or, where is the large error in my script, that results in this mistake?

2) Why is the size of the GO category in the summary 21, when there are only 5 probesets with this GO?

Thank you very much for any ideas,

Mike



More information about the Bioconductor mailing list