[BioC] HyperGTest Gene Universe problem

Seth Falcon sfalcon at fhcrc.org
Mon May 7 20:11:06 CEST 2007


Hi Vivek,

"Vivek Kaimal" <Vivek.Kaimal at cchmc.org> writes:
> Hi Seth.
>
> I am using the hyperGTest function for some genesets I have and I'm
> having some problems with the Gene Universe & Gene set being used in the
> analysis. My original Gene Universe contains 18382 genes and one of my
> gene sets contains 597 genes. 
> > length(GeneUniverse)
> [1] 18382
> > length(GeneList)
> [1] 597
>
> Then I run the following to test for over-representation:
> > hgCutoff <- 0.05
> > params <- new("GOHyperGParams", geneIds = GeneList,
>       universeGeneIds = GeneUniverse, annotation = "hgu133plus2",
>       ontology = "BP", pvalueCutoff = hgCutoff,
>       conditional = FALSE, testDirection = "over")
> > hgOver <- hyperGTest(params)
>
> But when I check the details for "hgOver", the number of genes used for
> Gene Universe and Gene set seem to be much lower than my original sets.
> The summary is as given below:
>
> > hgOver
> Gene to GO BP  test for over-representation 
> 1101 GO BP ids tested (160 have p < 0.05)
> Selected gene set size: 427 
>     Gene universe size: 11292 
>     Annotation package: hgu133plus2 
>
> Is it because some of my Entrez IDs are not being found in the
> annotation package? Do I need to use another annotation package?

Unfortunately, the documentation is too spread out to be as useful as
I would like.  The man page for hyperGTest is in the Category package
(sorry, not in GOstats), and it says:

     Both the selected genes and the universe are reduced by removing
     identifiers that do not have any annotations in the specified
     category.

So in your case, some gene IDs in both the selected list and the
universe have no GO BP annotation, and they have been removed.  We
made this choice because inflating the gene universe with IDs that
cannot appear in any of the categories will, in general, result in
more impressive, but less meaningful, p-values for the over-represented
terms.
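
To make the effect concrete, here is a rough sketch using only base R's
phyper.  The set sizes are the ones from your output; the GO term itself
is hypothetical (200 annotated genes, 20 of them in the selected list),
and the helper name over_rep_pvalue is made up for this example.  It is
not the code hyperGTest runs internally, just the same kind of
hypergeometric tail probability computed by hand.

```r
# Hypothetical GO term: 200 annotated genes in the universe, 20 of them
# in the selected list.  These two counts are invented for illustration.
# IDs without GO BP annotation cannot belong to the term, so both counts
# are the same whether or not those IDs are kept.
category_size <- 200
selected_in_category <- 20

# P(X >= observed) under the hypergeometric null.
over_rep_pvalue <- function(observed, category, universe, selected) {
  phyper(observed - 1, category, universe - category, selected,
         lower.tail = FALSE)
}

# Reduced sets (annotated IDs only) versus the original, inflated sets.
p_reduced  <- over_rep_pvalue(selected_in_category, category_size, 11292, 427)
p_inflated <- over_rep_pvalue(selected_in_category, category_size, 18382, 597)

print(c(reduced = p_reduced, inflated = p_inflated))
```

With the unannotated IDs left in, the overlap expected by chance drops
from about 7.6 genes (427 * 200 / 11292) to about 6.5 (597 * 200 / 18382),
so the same 20 hits look more surprising than they should.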

+ seth


-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


