[BioC] topGO enrichment using ensembl gene list
James W. MacDonald
jmacdon at med.umich.edu
Mon Mar 31 19:12:34 CEST 2008
Hi Julien,
Julien Roux wrote:
> Hello list,
>
> I am using the package "topGO" to analyse GO enrichment of gene sets.
>
> My genes are Ensembl IDs and are not taken from a microarray, so I had
> to feed "topGOdata" with a gene2GO list.
> (see
> http://thread.gmane.org/gmane.science.biology.informatics.conductor/14627)
> I construct that list by mapping all Ensembl IDs to GO IDs using the
> package "biomaRt".
> Then I proceed with my analysis:
>
> > GOdata <- new("topGOdata", ontology = "MF", allGenes = selectedList,
> +               description = "Ensembl GO enrichment",
> +               annot = annFUN.gene2GO, gene2GO = gene2GO)
>
> Can you confirm that this approach is correct?
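It should be correct. You simply need a named list where the names are
the gene IDs (the same IDs used in allGenes: Ensembl IDs in your case
rather than Entrez Gene IDs) and each element is a character vector of
the GO IDs for that gene.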
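A minimal base-R sketch of building both inputs for the new("topGOdata", ...) call above, assuming the biomaRt query returned a two-column data frame with one row per gene/GO pair. The column names ensembl_gene_id and go_id, the shortened IDs, and the objects ensMap, allIDs and selectedIDs are hypothetical placeholders; only the shapes matter: gene2GO is a named list of character vectors, and selectedList is a named 0/1 factor covering the whole gene universe, with the genes of interest coded as 1.

```r
# Hypothetical gene-to-GO mapping, as a biomaRt query might return it
# (one row per gene/GO pair; an empty string for a gene without GO terms)
ensMap <- data.frame(
  ensembl_gene_id = c("ENSG01", "ENSG01", "ENSG02", "ENSG03", "ENSG04"),
  go_id           = c("GO:0003674", "GO:0005515", "GO:0003824", "", "GO:0005515"),
  stringsAsFactors = FALSE
)

# Drop rows without a GO annotation
ensMap <- ensMap[ensMap$go_id != "", ]

# Named list: one element per gene, holding the character vector of its GO IDs
gene2GO <- split(ensMap$go_id, ensMap$ensembl_gene_id)

# Gene universe and genes of interest (hypothetical IDs)
allIDs      <- names(gene2GO)
selectedIDs <- c("ENSG01", "ENSG04")

# Named factor over the universe: 1 = selected, 0 = not selected
selectedList <- factor(as.integer(allIDs %in% selectedIDs), levels = c(0, 1))
names(selectedList) <- allIDs

str(gene2GO)
print(selectedList)
```

The split() call groups the GO IDs by gene in one step, which avoids looping over the biomaRt result.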
>
> I also had several questions concerning topGO:
> - Are the p-values in topGO corrected for multiple testing (FDR...)? My
> guess is that they are not, because of a problem of independence...
I don't think they are corrected, and I'm not even sure you could (or
should) correct them: overlapping GO categories share genes, so the
tests are not independent. As with a lot of microarray analyses, the
p-values should not be taken at face value; they are better used as a
ranking tool.
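To illustrate using the p-values as a ranking tool, here is a base-R sketch on a made-up vector of raw p-values (the GO IDs and values are invented, not topGO output). The Benjamini-Hochberg adjustment from p.adjust() is shown only for comparison; given the dependence between overlapping GO categories, the adjusted values deserve the same caution as the raw ones.

```r
# Invented raw p-values for a handful of GO terms
pvals <- c("GO:0005515" = 0.0004, "GO:0003824" = 0.03,
           "GO:0003674" = 0.2,    "GO:0016787" = 0.001)

# Rank the terms from most to least significant
ranked <- sort(pvals)
print(ranked)

# Benjamini-Hochberg adjustment, for comparison only
adjusted <- p.adjust(ranked, method = "BH")
print(round(adjusted, 4))
```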
> - Are there any differences between the Fisher exact test (topGO) and the
> hypergeometric test (GOstats)? If so, why did the two packages make
> different choices?
Both packages are essentially using the same test. The Fisher exact test
assesses the association between two variables in a 2x2 contingency
table. Under the null hypothesis of independence, and conditional on the
table margins, the count in any one cell follows a hypergeometric
distribution, so the p-value for a 2x2 table is computed from this
distribution. A one-sided Fisher exact test is therefore the same as the
hypergeometric test. See, e.g., ?fisher.test
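A small base-R check of that equivalence for one GO category, using made-up counts (N, K, n and k are illustrative, not real data): the one-sided Fisher exact test on the 2x2 table gives the same p-value as the upper tail of the hypergeometric distribution computed directly with phyper().

```r
# Made-up counts for a single GO category
N <- 1000  # genes in the universe
K <- 50    # universe genes annotated to the category
n <- 100   # selected genes
k <- 12    # selected genes annotated to the category

# 2x2 table: rows = selected / not selected, columns = in / not in category
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2,
              dimnames = list(c("selected", "notSelected"),
                              c("inCategory", "notInCategory")))

# One-sided Fisher exact test for over-representation
pFisher <- fisher.test(tab, alternative = "greater")$p.value

# Upper tail of the hypergeometric distribution: P(X >= k)
pHyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

all.equal(pFisher, pHyper)  # TRUE: the same test
```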
> - It is not clear to me what the Kolmogorov-Smirnov test is testing,
> especially in my case, where I don't provide scores associated with my genes.
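A rough base-R illustration of the idea behind a Kolmogorov-Smirnov comparison, not topGO's own implementation: a two-sample KS test compares the distribution of a per-gene score (for example a statistic from a differential expression analysis) for the genes in a category against the scores of all other genes. The scores and category membership below are simulated; the point is that the test needs a score for every gene, not just a selected/not-selected label.

```r
set.seed(1)

# Simulated scores for 200 genes; the first 20 belong to a hypothetical category
scores <- c(rnorm(20, mean = 1), rnorm(180, mean = 0))
inCategory <- rep(c(TRUE, FALSE), c(20, 180))

# Two-sample KS test: do category genes have a different score distribution?
res <- ks.test(scores[inCategory], scores[!inCategory])
res$p.value
```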
> - Is there a way to test over- and under-representation of GO
> categories separately?
In GOstats there is. I don't know about topGO.
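A base-R sketch, with made-up counts, of how the two directions differ: over-representation uses the upper tail of the hypergeometric distribution and under-representation the lower tail. In GOstats I believe the direction is set with the testDirection argument ("over" or "under") of GOHyperGParams; the code below uses neither package.

```r
# Made-up counts for one GO category: 5 hits expected, 1 observed
N <- 1000  # genes in the universe
K <- 50    # universe genes annotated to the category
n <- 100   # selected genes
k <- 1     # selected genes annotated to the category

# Over-representation: P(X >= k)
pOver <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
# Under-representation: P(X <= k)
pUnder <- phyper(k, K, N - K, n)

# The same one-sided tests via fisher.test on the 2x2 table
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
pGreater <- fisher.test(tab, alternative = "greater")$p.value
pLess    <- fisher.test(tab, alternative = "less")$p.value

c(over = pOver, under = pUnder)
```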
Best,
Jim
>
> Thanks a lot in advance for your help or tips
> Julien
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623