[BioC] testing GO categories with Fisher's exact test.
Arne.Muller at aventis.com
Tue Feb 24 13:03:10 MET 2004
>
>
> Surely, at the point where you are seeing "lots" of eg
> apoptosis genes in your cluster,
> drop the statistics and start the biology?
>
> Remember, the ultimate proof of any statistical sense is
> that it makes biological sense and is biologically validated.
> Do we really need to know if an annotation is significant??
Hm, I think it's a good start to know what is significant. On the other
hand, I have to agree with you: there are often borderline GO terms in my
datasets that fall just short of significance but fit well with my hypothesis.
Especially when annotating a dataset via GO, one is looking for a "biological
theme", so it may be sensible to ignore every category that is
populated by fewer than, say, 5 genes.
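
To make the procedure under discussion concrete, here is a minimal sketch of it in Python, under stated assumptions. The one-sided Fisher's exact test for over-representation is computed as a hypergeometric upper tail, categories with fewer than 5 annotated genes are skipped, and the remaining p-values are adjusted with Benjamini-Hochberg (one possible correction; the original message does not name one). The function names, gene IDs and category contents are all made up for illustration, and the universe is taken to be the annotated genes only, which is exactly the simplification questioned in the quoted message below.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the universe, K: universe genes in the category,
    n: selected genes, k: selected genes that are in the category.
    This equals the one-sided Fisher's exact test for over-representation.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def score_categories(categories, selected, universe, min_size=5):
    """Test each category of at least min_size genes for over-representation.

    Returns {category: (k, K, raw_p, adjusted_p)}.
    """
    selected = selected & universe
    kept = {}
    for name, genes in categories.items():
        genes = genes & universe
        if len(genes) >= min_size:  # drop sparsely populated categories
            kept[name] = genes
    names = sorted(kept)
    raw = [
        enrichment_pvalue(len(kept[c] & selected), len(selected),
                          len(kept[c]), len(universe))
        for c in names
    ]
    adj = benjamini_hochberg(raw)
    return {c: (len(kept[c] & selected), len(kept[c]), p, q)
            for c, p, q in zip(names, raw, adj)}


# Hypothetical toy data: 20 annotated genes, 6 of them differentially expressed.
universe = {f"g{i}" for i in range(20)}
selected = {f"g{i}" for i in range(6)}
categories = {
    "apoptosis": {"g0", "g1", "g2", "g3", "g4", "g10"},
    "transport": {"g5", "g11", "g12", "g13", "g14", "g15", "g16"},
    "tiny": {"g0", "g17"},  # fewer than 5 genes, so it is not tested
}
results = score_categories(categories, selected, universe)
for name, (k, K, p, q) in results.items():
    print(f"{name}: {k}/{K} selected, p={p:.4g}, adjusted p={q:.4g}")
```

Note that dropping small categories before testing also reduces the number of tests being corrected for. The sketch does nothing about genes that belong to several categories, so the tests are still not independent.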
kind regards,
Arne
> -----Original Message-----
> From: Nicholas Lewin-Koh [mailto:nikko at hailmail.net]
> Sent: 24 February 2004 08:33
> To: bioconductor at stat.math.ethz.ch
> Cc: rdiaz at cnio.es
> Subject: [BioC] testing GO categories with Fisher's exact test.
>
>
> Hi all,
> I have a few questions about testing for over-representation of terms in
> a cluster.
> Let's consider a simple case: a set of chips from an experiment, say
> treated and untreated, with 10,000 genes on the chip and 1,000
> differentially expressed. Of the 10,000, 7,000 can be annotated and 6,000
> have a GO function assigned to them at a suitable level. Say for this
> example there are 30 GO classes that appear.
> I then conduct Fisher's exact test 30 times, once for each GO category,
> to detect differential representation of terms in the differentially
> expressed set, and correct for multiple testing.
>
> My question is about the validity of this procedure. Just from experience,
> many genes will have multiple functions assigned to them, so the genes
> falling into GO classes are not independent.
> Also, there is the large set of unannotated genes, so we are in effect
> ignoring the influence of all the unannotated genes on the outcome. Do
> people have any thoughts or opinions on these approaches? They are
> appearing all over the place in bioinformatics tools like FATIGO, EASE,
> DAVID etc. I find that the formal testing approach makes me very
> uncomfortable, especially as the biologists I work with tend to
> overinterpret the results.
> I am very interested to see the discussion on this topic.
>
> Nicholas
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>