[BioC] GOStat and multiple testing

A.J. Rossini rossini at blindglobe.net
Thu Aug 5 21:01:01 CEST 2004


Correct.  And your work continues to confirm the second issue in
general, which is nice.

But it's the first that is particularly nasty to create a reasonable
solution for.  I'd really like to see one!

best,
-tony


Jeremy Gollub <jgollub at genome.stanford.edu> writes:

> That's certainly true.  We decided to be pragmatic.
>
> If I've understood the problem correctly, there are two major problems
> with determining the significance of a GO annotation.  First is the lack
> of independence in the DAG (directed acyclic graph) structure.
> Bootstrapping won't fix that.  Second, though, is the problem that, for a
> small group of test genes at least, any GO term that comes up at all will
> appear ridiculously significant when using a hypergeometric test.  What we
> found is that FDR calculations seem to deal with this second issue better
> than a FWER correction.
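
To make the second issue concrete, here is a minimal sketch in Python of the upper-tail hypergeometric probability for a small gene list. It is not the code of any package discussed in this thread, and the numbers (6000 genes on the array, 5 annotated to the term, 10 selected, 2 hits) are made up purely for illustration.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background (hypothetical array size)
    K: background genes annotated to the GO term
    n: genes in the selected list
    k: selected genes annotated to the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return tail / total


# Hypothetical example: two hits in a list of ten already look very unlikely
# by chance, because the expected number of hits is far below one.
print(hypergeom_upper_tail(N=6000, K=5, n=10, k=2))
```

With lists this small, a term that appears at all tends to get a very small raw p-value, which is the effect described above.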
>
> --
> Jeremy Gollub, Ph.D.
> jgollub at genome.stanford.edu
> (W) 650/736-0075
>
> On Thu, 5 Aug 2004, A.J. Rossini wrote:
>
>> 
>> It (FDR by bootstrapping) doesn't solve the basic problem with lack of
>> independence, which makes it useful but wrong, or just wrong,
>> depending on how pragmatic you want to be.
>> 
>> 
>> Jeremy Gollub <jgollub at genome.stanford.edu> writes:
>> 
>> > Correcting p-values for multiple hypothesis testing in GO analysis is a
>> > hard problem conceptually.  I'm not aware of any general solution.
>> >
>> > In a recently-published set of Perl modules for GO term analysis,
>> >
>> > 	http://bioinformatics.oupjournals.org/cgi/content/abstract/bth456v1
>> >
>> > we support False Discovery Rate calculations (based on permutations of
>> > results) as a substitute.  It's probably not perfect, but according to our
>> > simulations it's better than either uncorrected p-values or a simple
>> > correction (e.g., Bonferroni).
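
For comparison, a minimal Python sketch of a simple FWER correction (Bonferroni) next to an FDR adjustment. Note the substitution: this uses the Benjamini-Hochberg step-up procedure, not the permutation-based FDR described above, and Benjamini-Hochberg itself relies on independence or positive dependence, which GO terms may not satisfy. The p-values are invented for illustration.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:        ", bonferroni(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

The Bonferroni values are never smaller than the Benjamini-Hochberg ones, which is one way to see why an FWER correction is the more conservative of the two.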
>> >
>> > Our software uses a hypergeometric test on a list of selected genes.
>> > Another approach would be to calculate a p-value (e.g., by Cox
>> > regression) for all genes on a microarray, and test the significance of
>> > each GO term using Fisher meta-analysis.  (I'm sure I've seen a
>> > reference to that approach, but can't recall it now.)
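
A rough Python sketch of the Fisher meta-analysis idea, assuming it means Fisher's combined probability test applied to the per-gene p-values of the genes annotated to one GO term. The gene p-values below are invented, and the test assumes the p-values are independent, which need not hold for genes on the same array.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k p-values with Fisher's method.

    The statistic -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, assuming independent p-values.
    For even degrees of freedom the survival function has a closed form:
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half = -sum(math.log(p) for p in pvalues)  # statistic divided by 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical per-gene p-values for the genes annotated to one GO term.
gene_pvalues = [0.05, 0.20, 0.01, 0.60]
print(fisher_combined_pvalue(gene_pvalues))
```

Unlike the hypergeometric test, this uses every gene's p-value rather than a yes/no selection of interesting genes.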
>> >
>> > --
>> > Jeremy Gollub, Ph.D.
>> > jgollub at genome.stanford.edu
>> > (W) 650/736-0075
>> >
>> > On Thu, 5 Aug 2004, Robert Gentleman wrote:
>> >
>> >> On Wed, Aug 04, 2004 at 01:06:30PM +0200, Arne.Muller at aventis.com wrote:
>> >> > Hello,
>> >> > 
>> >> > I was wondering if one needs to correct the p-values from the hypergeometric test from GOstat for multiple testing, since one performs many tests (over all GO categories found in the gene list). I'm not sure if correction for multiple testing makes sense since the GO terms are highly dependent (terms on the same branch + one gene is annotated in several terms).
>> >> > 
>> >> > Robert Gentleman mentions in the GOstats documentation that the multiple testing issue is not solved yet. I assume GOHyperG does not perform any kind of multiple testing correction; is this right?
>> >> 
>> >> Hi,
>> >>   it does not, and I am unaware of any general solution to the
>> >>   problem of adjusting p-values here. The structure of GO is such that
>> >>   there are issues due to lack of independence. There are some other
>> >>   problems, but I have not had time to write up my ideas yet.
>> >>   I have to say that I am also not so convinced that this is
>> >>   the best way to do things (classifying genes as interesting or not,
>> >>   and then doing the hypergeometric test), although I have yet to come
>> >>   up with a better way. I agree with those that have suggested that
>> >>   this is best used as a rough guide to interesting categories (other
>> >>   projects seem to have different opinions, and I think some do use some
>> >>   sort of p-value correction).
>> >> 
>> >>   Robert
>> >> 
>> >> > 
>> >> > I'd be happy to receive comments on this and to hear about your experience.
>> >> > 
>> >> > 	kind regards,
>> >> > 
>> >> > 	Arne
>> >> > 
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor at stat.math.ethz.ch
>> >> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>> >> 
>> >> -- 
>> >> +---------------------------------------------------------------------------+
>> >> | Robert Gentleman                 phone : (617) 632-5250                   |
>> >> | Associate Professor              fax:   (617)  632-2444                   |
>> >> | Department of Biostatistics      office: M1B20                            |
>> >> | Harvard School of Public Health  email: rgentlem at jimmy.harvard.edu        |
>> >> +---------------------------------------------------------------------------+
>> >> 
>> >>
>> >
>> >
>> 
>> -- 
>> Anthony Rossini			    Research Associate Professor
>> rossini at u.washington.edu            http://www.analytics.washington.edu/ 
>> Biomedical and Health Informatics   University of Washington
>> Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
>> UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
>> FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
>> 
>> CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be
>> confidential and privileged. If you received this message in error,
>> please destroy it and notify the sender. Thank you.
>>
>
>



