[BioC] Gene enrichment question
Alex Gutteridge
alexg at ruggedtextile.com
Thu Aug 16 10:49:34 CEST 2012
On 16.08.2012 00:57, Aliaksei Holik wrote:
> Dear all,
>
> Thank you for your answers and suggestions. Rather than replying to
> each of you, I'm just going to summarise.
>
> I've looked into hypergeometric test suggested by Alex, but
> admittedly couldn't get my head round the jargon. So I focused on
> other methods suggested. However, Alex's suggestion to remove 'stem
> cell genes' not present in my dataset was most helpful, as indeed
> some 56 genes were missing from my array.
>
> I haven't tried going into GSEA as suggested by Steve, but I do have
> expression values for the genes and will try to have a go at it later.
>
> I have then looked into Fisher's exact test and bootstrapping
> suggested by Michael and Steve. I couldn't figure out, how to use
> 'boot' function to get it to sample a limited size sample (86) from a
> given population (17119), so I ended up doing it "manually". In fact,
> I tried both permutation and bootstrapping, by using sampling without
> and with replacement, but couldn't see much difference in
> distribution. Any comments on bootstrapping vs permutation and the
> ideal number of replications are much appreciated.
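
On the "manual" resampling: here is a minimal sketch of drawing 86 genes
from 17119 both without replacement (permutation-style) and with
replacement (bootstrap-style) and counting the overlap each time. It is
written in Python purely for illustration. The stem cell set size
(STEM_SET_SIZE = 1000) is a made-up placeholder, as are the function
names, the seed and the replicate count; only 17119, 86 and 9 come from
this thread. The add-one correction in the empirical P value is one
common convention, not the only option.

```python
import random

N_GENES = 17119        # genes on the array (from the thread)
LIST_SIZE = 86         # size of the gene list being tested (from the thread)
OBSERVED = 9           # observed overlap (from the thread)
STEM_SET_SIZE = 1000   # HYPOTHETICAL placeholder; substitute the real set size


def overlap_distribution(n_reps, replace, rng):
    """Overlap counts for n_reps random gene lists of LIST_SIZE genes.

    Genes are labelled 0..N_GENES-1 and the first STEM_SET_SIZE labels
    are treated as the stem cell genes.
    """
    population = range(N_GENES)
    counts = []
    for _ in range(n_reps):
        if replace:
            draw = rng.choices(population, k=LIST_SIZE)   # with replacement
        else:
            draw = rng.sample(population, LIST_SIZE)      # without replacement
        counts.append(sum(g < STEM_SET_SIZE for g in draw))
    return counts


def empirical_p(counts, observed):
    """Fraction of replicates at least as extreme as observed (add-one corrected)."""
    return (1 + sum(c >= observed for c in counts)) / (1 + len(counts))


rng = random.Random(0)  # fixed seed so the run is repeatable
N_REPS = 2000           # arbitrary choice for the sketch
perm = overlap_distribution(N_REPS, replace=False, rng=rng)
boot = overlap_distribution(N_REPS, replace=True, rng=rng)
p_perm = empirical_p(perm, OBSERVED)
p_boot = empirical_p(boot, OBSERVED)
print("without replacement:", p_perm)
print("with replacement:   ", p_boot)
```

With only 86 draws from 17119 genes, sampling with replacement rarely
picks the same gene twice (about 0.2 duplicate pairs expected per draw),
which is why the two distributions look so alike. As for the number of
replicates, the smallest P value this approach can report is
1 / (N_REPS + 1), so that sets the resolution you get.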
Just FYI, the hypergeometric test is equivalent to a one-sided Fisher's
exact test (in the case of a 2x2 contingency table), so the P value
(before bootstrapping) should be identical
(http://en.wikipedia.org/wiki/Hypergeometric_distribution%23Relationship_to_Fisher.27s_exact_test).
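
To make the equivalence concrete, here is a small sketch (in Python,
standard library only) that computes the hypergeometric upper tail and
the one-sided Fisher P value for the same 2x2 table by two different
formulas. The stem cell set size K = 1000 is a made-up placeholder, and
the function names are mine; only 17119, 86 and 9 come from this thread.

```python
from math import comb, exp, isclose, lgamma

N = 17119   # genes on the array (from the thread)
n = 86      # genes in the list (from the thread)
k = 9       # observed overlap (from the thread)
K = 1000    # HYPOTHETICAL stem cell set size; substitute the real value


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N, K of them marked."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def log_factorial(x):
    return lgamma(x + 1)


def fisher_table_prob(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    total = a + b + c + d
    log_p = (log_factorial(a + b) + log_factorial(c + d)
             + log_factorial(a + c) + log_factorial(b + d)
             - log_factorial(a) - log_factorial(b)
             - log_factorial(c) - log_factorial(d)
             - log_factorial(total))
    return exp(log_p)


def fisher_one_sided(a, b, c, d):
    """One-sided ('greater') P value: sum over tables with top-left cell >= a."""
    p = 0.0
    while b >= 0 and c >= 0:
        p += fisher_table_prob(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1   # keeps every margin fixed
    return p


# Rows: in list / not in list. Columns: stem cell gene / other gene.
a, b, c, d = k, n - k, K - k, N - n - K + k
p_hyper = hypergeom_upper_tail(k, N, K, n)
p_fisher = fisher_one_sided(a, b, c, d)
print(p_hyper, p_fisher, isclose(p_hyper, p_fisher, rel_tol=1e-6))
```

Note that this only holds for the one-sided test; a two-sided Fisher P
value also counts equally unlikely tables in the other tail, so it will
generally be larger.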
I don't know your exact research question, but I would probably argue
that the statistics have taken you as far as you need in this case - the
Fisher test shows that, yes, you do indeed have (slightly) more
overlapping genes than you would expect by chance. Other methods are
unlikely to change that fundamental conclusion. The real question is
what those overlapping genes are and whether they have any real
relevance to the biology you are studying (omics-derived gene lists are
often filled with stuff of only tangential interest to the question in
hand, in my experience!). If I understand correctly, there are only 9 of
these genes in your set, so this can probably be done fairly quickly by
hand via PubMed and/or a friendly local expert.
--
Alex Gutteridge