[BioC] Gene enrichment question

Alex Gutteridge alexg at ruggedtextile.com
Thu Aug 16 10:49:34 CEST 2012


On 16.08.2012 00:57, Aliaksei Holik wrote:
> Dear all,
>
> Thank you for your answers and suggestions. Rather than replying to
> each of you, I'm just going to summarise.
>
> I've looked into the hypergeometric test suggested by Alex, but
> admittedly couldn't get my head round the jargon, so I focused on the
> other methods suggested. However, Alex's suggestion to remove 'stem
> cell genes' not present in my dataset was most helpful, as some 56
> genes were indeed missing from my array.
>
> I haven't tried going into GSEA as suggested by Steve, but I do have
> expression values for the genes and will try to have a go at it later.
>
> I then looked into Fisher's exact test and bootstrapping, as
> suggested by Michael and Steve. I couldn't figure out how to get the
> 'boot' function to draw a sample of limited size (86) from a given
> population (17119), so I ended up doing it "manually". In fact, I
> tried both permutation and bootstrapping, by sampling without and
> with replacement respectively, but couldn't see much difference in
> the resulting distributions. Any comments on bootstrapping vs
> permutation and the ideal number of replications are much
> appreciated.
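
For what it's worth, the "manual" resampling described above can be
sketched in a few lines. This is only an illustration in Python using
the standard library, not the code that was actually run. The
population size (17119), sample size (86) and observed overlap (9) come
from this thread; N_STEM (the number of stem cell genes left on the
array) is not given anywhere in the thread, so the value below is a
purely hypothetical placeholder, as are the function names.

```python
import random

def resample_overlaps(population_size, marked, sample_size, n_reps,
                      replace, seed=0):
    """Overlap with the marked set for each of n_reps random draws.

    Genes are represented by indices 0..population_size-1, and the
    first `marked` indices stand in for the gene set of interest.
    """
    rng = random.Random(seed)
    overlaps = []
    for _ in range(n_reps):
        if replace:
            # Sampling with replacement (bootstrap-style draw)
            draw = [rng.randrange(population_size)
                    for _ in range(sample_size)]
        else:
            # Sampling without replacement (permutation-style draw)
            draw = rng.sample(range(population_size), sample_size)
        overlaps.append(sum(1 for g in draw if g < marked))
    return overlaps

def empirical_p(overlaps, observed):
    """Empirical P(overlap >= observed), with the usual +1 correction."""
    hits = sum(1 for o in overlaps if o >= observed)
    return (1 + hits) / (1 + len(overlaps))

N_GENES = 17119   # genes on the array (from the thread)
N_SAMPLE = 86     # size of the gene list (from the thread)
OBSERVED = 9      # observed overlap (from the thread)
N_STEM = 1000     # HYPOTHETICAL: stem cell genes present on the array

perm = resample_overlaps(N_GENES, N_STEM, N_SAMPLE, 2000, replace=False)
boot = resample_overlaps(N_GENES, N_STEM, N_SAMPLE, 2000, replace=True)
print("without replacement:", empirical_p(perm, OBSERVED))
print("with replacement:   ", empirical_p(boot, OBSERVED))
```

The two schemes are expected to look almost the same here, because 86
draws are only about 0.5% of 17119 genes, so a draw with replacement
very rarely picks the same gene twice. On the number of replications:
the Monte Carlo error of the estimated P value shrinks roughly as
sqrt(p(1-p)/B) for B replications, so a few thousand is usually plenty
unless the P value is very small.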

Just FYI, the hypergeometric test is equivalent to the one-sided 
Fisher's exact test on a 2x2 contingency table, so the P value (before 
bootstrapping) should be identical 
(http://en.wikipedia.org/wiki/Hypergeometric_distribution%23Relationship_to_Fisher.27s_exact_test).
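
To make the equivalence concrete, here is a minimal sketch in Python
(standard library only) that computes the upper tail of the
hypergeometric distribution and the one-sided Fisher P value for the
corresponding 2x2 table, so the two can be compared. The function names
are my own. As before, 17119, 86 and 9 come from this thread, while
N_STEM is a hypothetical placeholder for the number of stem cell genes
on the array, so the printed value is not your actual result.

```python
from math import comb, exp, lgamma

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K are marked."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / total

def log_fact(x):
    """log(x!) via the log-gamma function."""
    return lgamma(x + 1)

def fisher_one_sided(a, b, c, d):
    """One-sided ('greater') Fisher exact P for the table [[a, b], [c, d]]."""
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    const = (log_fact(r1) + log_fact(r2) + log_fact(c1) + log_fact(c2)
             - log_fact(r1 + r2))
    p = 0.0
    # Sum over tables with the same margins and a top-left cell >= a
    for x in range(a, min(r1, c1) + 1):
        y, z = r1 - x, c1 - x
        w = r2 - z
        p += exp(const - (log_fact(x) + log_fact(y)
                          + log_fact(z) + log_fact(w)))
    return p

N_GENES = 17119   # genes on the array (from the thread)
N_SAMPLE = 86     # size of the gene list (from the thread)
OBSERVED = 9      # observed overlap (from the thread)
N_STEM = 1000     # HYPOTHETICAL: stem cell genes present on the array

p_hyper = hypergeom_upper_tail(N_GENES, N_STEM, N_SAMPLE, OBSERVED)
# Rows: in list / not in list; columns: stem cell gene / other gene
p_fisher = fisher_one_sided(OBSERVED,
                            N_SAMPLE - OBSERVED,
                            N_STEM - OBSERVED,
                            N_GENES - N_STEM - N_SAMPLE + OBSERVED)
print(p_hyper, p_fisher)
```

The two numbers should agree up to floating point error, since each
term in the Fisher sum is the hypergeometric probability of one table
written in terms of the row and column totals.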

I don't know your exact research question, but I would argue that the 
statistics have taken you as far as you need in this case: the Fisher 
test shows that you do indeed have (slightly) more overlapping genes 
than you would expect by chance. Other methods are unlikely to change 
that fundamental conclusion. The real question is what those 
overlapping genes are and whether they have any real relevance to the 
biology you are studying (in my experience, omics-derived gene lists 
are often filled with things of only tangential interest to the 
question in hand!). If I understand correctly, there are only 9 of 
these genes in your set, so this can probably be done fairly quickly 
by hand via PubMed and/or a friendly local expert.

-- 
Alex Gutteridge


