[BioC] testing GO categories with Fisher's exact test.

Tue Feb 24 13:03:10 MET 2004

> Surely, at the point where you are seeing "lots" of, e.g., apoptosis
> genes in your cluster, drop the statistics and start the biology?
>
> Remember, the ultimate proof of any statistical sense is that it
> makes biological sense and is biologically validated. Do we really
> need to know if an annotation is significant?

Hm, I think it's a good start to know what is significant. On the other
hand, I have to agree with you: there are often borderline GO terms in my
datasets that just miss significance but fit well into my hypothesis.
Especially when annotating a dataset via GO, one is looking for a "biological
theme", so it may be sensible to ignore every category that is
populated with fewer than, say, 5 genes.

	kind regards,

	Arne

> -----Original Message-----
> From: Nicholas Lewin-Koh [mailto:nikko at hailmail.net]
> Sent: 24 February 2004 08:33
> To: bioconductor at stat.math.ethz.ch
> Cc: rdiaz at cnio.es
> Subject: [BioC] testing GO categories with Fisher's exact test.
> 
> 
> Hi all,
> I have a few questions about testing for overrepresentation of terms in
> a cluster.
> Let's consider a simple case: a set of chips from an experiment, say
> treated and untreated, with 10,000 genes on the chip and 1,000
> differentially expressed. Of the 10,000, 7,000 can be annotated and
> 6,000 have a GO function assigned to them at a suitable level. Say for
> this example there are 30 GO classes that appear.
> I then conduct Fisher's exact test 30 times, once for each GO category,
> to detect differential representation of terms in the expressed set,
> and correct for multiple testing.
> 
> My question is on the validity of this procedure. Just from experience,
> many genes will have multiple functions assigned to them, so the genes
> falling into GO classes are not independent.
> Also, there is the large set of unannotated genes, so we are in effect
> ignoring the influence of all the unannotated genes on the outcome. Do
> people have any thoughts or opinions on these approaches? They are
> appearing all over the place in bioinformatics tools like FATIGO, EASE,
> DAVID etc. I find that the formal testing approach makes me very
> uncomfortable, especially as the biologists I work with tend to
> overinterpret the results.
> I am very interested to see the discussion on this topic.
> 
> Nicholas
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor