[BioC] testing GO categories with Fisher's exact test.

Wed Feb 25 10:52:15 MET 2004

Hello,

The chi-square test needs an expected count of at least 5 genes in each cell
of the table. If this is true for your study, you can use it. However, the
chi-square test is an approximation of Fisher's exact test, so you may want to
use Fisher's test directly. The chi-square test is computationally a lot more
efficient than Fisher's test, but "these days" that's not an argument
anymore ;-) .
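
As a rough illustration of the point above, here is a minimal Python sketch
(standard library only) that checks the expected counts for one GO category
and computes a one-sided Fisher exact p-value from the hypergeometric
distribution. The function names and the example numbers (600 selected genes,
a category of 40 genes, 9 of them selected) are hypothetical; only the 6000
annotated genes come from the example in the quoted message.

```python
from math import comb


def expected_counts(k_category, n_selected, n_total):
    """Expected counts of the four cells of the 2x2 table under independence.

    Rows: selected / not selected genes.
    Columns: in the GO category / not in the GO category.
    """
    rows = (n_selected, n_total - n_selected)
    cols = (k_category, n_total - k_category)
    return [r * c / n_total for r in rows for c in cols]


def fisher_upper_tail(x, k_category, n_selected, n_total):
    """One-sided Fisher exact p-value for over-representation.

    Returns P(X >= x), where X is hypergeometric: the number of category
    genes among n_selected genes drawn from n_total without replacement.
    """
    upper = min(k_category, n_selected)
    favourable = sum(
        comb(k_category, i) * comb(n_total - k_category, n_selected - i)
        for i in range(x, upper + 1)
    )
    # Integer / integer division stays accurate even for very large counts.
    return favourable / comb(n_total, n_selected)


# Hypothetical numbers, for illustration only.
n_total = 6000      # annotated genes with a GO function (from the thread)
n_selected = 600    # assumed: selected genes that are among those 6000
k_category = 40     # assumed: genes in the GO category of interest
x_observed = 9      # assumed: selected genes that fall in the category

smallest = min(expected_counts(k_category, n_selected, n_total))
chi_square_ok = smallest >= 5   # rule of thumb from the paragraph above
p_value = fisher_upper_tail(x_observed, k_category, n_selected, n_total)
print(smallest, chi_square_ok, p_value)
```

With these assumed numbers the smallest expected count is below 5, so the
rule of thumb would argue for the exact test rather than the chi-square
approximation.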

	regards,

	Arne

> -----Original Message-----
> From: bioconductor-bounces+arne.muller=aventis.com at stat.math.ethz.ch
> [mailto:bioconductor-bounces+arne.muller=aventis.com at stat.math.ethz.ch]
> On Behalf Of michael watson (IAH-C)
> Sent: 25 February 2004 10:07
> To: 'Nicholas Lewin-Koh'; bioconductor at stat.math.ethz.ch
> Cc: rdiaz at cnio.es
> Subject: RE: [BioC] testing GO categories with Fisher's exact test.
> 
> 
> Forgive my naivety, but could one not use a chi-squared test here?
> We have an observed number of genes in each category, and could
> calculate an expected number from the size of the cluster and the
> distribution of all genes throughout GO categories...
> 
> ?
> 
> -----Original Message-----
> From: Nicholas Lewin-Koh [mailto:nikko at hailmail.net]
> Sent: 24 February 2004 08:33
> To: bioconductor at stat.math.ethz.ch
> Cc: rdiaz at cnio.es
> Subject: [BioC] testing GO categories with Fisher's exact test.
> 
> 
> Hi all,
> I have a few questions about testing for over-representation of terms
> in a cluster.
> Let's consider a simple case: a set of chips from an experiment, say
> treated and untreated, with 10,000 genes on the chip and 1,000
> differentially expressed. Of the 10,000, 7,000 can be annotated and
> 6,000 have a GO function assigned to them at a suitable level. Say for
> this example there are 30 GO classes that appear.
> I then conduct Fisher's exact test 30 times, once for each GO category,
> to detect differential representation of terms in the expressed set,
> and correct for multiple testing.
> 
> My question is on the validity of this procedure. Just from experience,
> many genes will have multiple functions assigned to them, so the genes
> falling into GO classes are not independent.
> Also, there is the large set of unannotated genes, so we are in effect
> ignoring the influence of all the unannotated genes on the outcome. Do
> people have any thoughts or opinions on these approaches? This kind of
> testing is appearing all over the place in bioinformatics tools like
> FATIGO, EASE, DAVID etc. I find that the formal testing approach makes
> me very uncomfortable, especially as the biologists I work with tend to
> over-interpret the results.
> I am very interested to see the discussion on this topic.
> 
> Nicholas
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> 