[BioC] Gene enrichment question
Aliaksei Holik
salvador at bio.bsu.by
Wed Aug 15 15:51:51 CEST 2012
Dear listers,
Apologies if my question is not strictly related to Bioconductor, though
one never knows, maybe there's a package that does what I need.
I am analysing a list of differentially expressed genes from an Illumina
microarray. In particular I'm trying to compare the list of
differentially expressed genes to an existing list of genes
preferentially expressed in the stem cell population (stem cell
signature). When I do so, 10% of DE genes belong to the stem cell
signature. What I'd like to do now is find out how likely that is to
happen by chance, i.e. put a p value on it.
At the moment I know:
There are 17119 unique genes in my dataset.
Of them 86 are differentially expressed.
The stem cell signature contains 510 genes.
The signature is combined from several platforms, which makes it hard to
establish the total number of unique genes those platforms cover, but it's
at least 20819 (the size of the largest platform).
There are 9 overlapping genes between DE genes and the stem cell signature.
So I wonder:
1) If there's an accepted way to calculate a p value using these data.
For instance, could I run something like a chi-squared test? E.g. stem cell
specific genes represent 510/20819=2.4% of the total dataset. So the expected
number of stem cell genes among my DE genes would be 86x2.4%=2. So my
chi-squared test would be based on 9 observed vs 2 expected.
2) Or do I have to generate a geneset based on the stem cell signature
and go through GSEA algorithms to calculate enrichment and significance?
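
One common way to frame option 1 is as an over-representation test: treat the
DE genes as a draw without replacement from a gene universe and ask for the
upper-tail hypergeometric probability of seeing 9 or more signature genes (the
same quantity a one-sided Fisher's exact test gives). Below is a minimal sketch
in Python using only the standard library. The function name and the choice of
universe are assumptions for illustration: in particular it assumes that all
510 signature genes are present in whichever universe is used, which I cannot
confirm for my data.

```python
from math import comb


def overlap_p_value(universe, signature, de_genes, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe  : total number of genes that could have been drawn
    signature : number of signature genes in that universe (assumed: all of them)
    de_genes  : number of differentially expressed genes (the draw)
    overlap   : observed number of DE genes that are in the signature
    """
    total = comb(universe, de_genes)
    max_overlap = min(signature, de_genes)
    tail = sum(
        comb(signature, i) * comb(universe - signature, de_genes - i)
        for i in range(overlap, max_overlap + 1)
    )
    # Exact integer arithmetic up to the final division.
    return tail / total


# Numbers from the post.
DE_GENES = 86
SIGNATURE = 510
OVERLAP = 9

# Expected overlap under the null, using the 20819-gene universe.
expected = DE_GENES * SIGNATURE / 20819

# Assumption: universe is the largest signature platform (20819 genes).
p_large = overlap_p_value(20819, SIGNATURE, DE_GENES, OVERLAP)

# Assumption: universe is my own dataset (17119 genes) and all 510
# signature genes are measured on it.
p_small = overlap_p_value(17119, SIGNATURE, DE_GENES, OVERLAP)

print(f"expected overlap: {expected:.2f}")
print(f"p (universe 20819): {p_large:.2e}")
print(f"p (universe 17119): {p_small:.2e}")
```

The choice of universe matters: a smaller universe raises the expected overlap
and therefore the p value, so restricting both lists to the genes actually
measured in the dataset is the more conservative setup.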
Any pointers in the right direction would be much appreciated.
Many thanks for your time and help!
Aliaksei.