[BioC] straight t vs. bonferroni vs. all the new stuff.

Naomi Altman naomi at stat.psu.edu
Fri Oct 20 03:18:38 CEST 2006


I am trying to understand the issues better, too, but let me give this a try:

Firstly, I think that you must mean that n=n.tests=450,000.

Bonferroni and Holm guard against making one or more errors (false 
detections), even if none of the genes differentially express.

If that is what you want to guard against, then Holm is the method to 
use for the reason that Sean states.
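
To make the Bonferroni/Holm comparison concrete, here is a minimal sketch of both adjustments. It is written in plain Python purely for illustration; in R, p.adjust() with method = "bonferroni" or method = "holm" is the usual route. The function names and the example p-values are made up for this sketch.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply each by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.012, 0.02, 0.04, 0.30]
bonf = bonferroni(example_p)
hol = holm(example_p)

n_bonf = sum(1 for b in bonf if b <= 0.05)  # genes kept by Bonferroni at 0.05
n_holm = sum(1 for h in hol if h <= 0.05)   # genes kept by Holm at 0.05
```

Every Holm-adjusted value is at most the Bonferroni one, so Holm never rejects fewer hypotheses at the same level; in this toy example it keeps one more gene.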

Most of us would be happy if a large percentage of the genes that we 
declare to be differentially expressed really are.  FDR methods let 
you estimate the expected percentage of mistakes among the genes you 
declare significant when you reject at a certain level.  The way I use 
them is to look at both the q-values and the p-values.  If the 
percentage of differentially expressing genes is small, I set a q-value 
(i.e. an acceptable upper limit for FDR) and declare genes with p-value 
at the corresponding level or less to be significant.  If the 
percentage of differentially expressing genes is large, I set a p-value 
for significance, and report the corresponding FDR.
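
The two ways of working described above (fix an FDR limit and read off the p-value cut-off, or fix a p-value cut-off and read off the FDR) can be sketched as follows. This is only a sketch: it uses Benjamini-Hochberg adjusted p-values as a stand-in for q-values, which is an assumption and not necessarily what a given Bioconductor routine reports. The function names and example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank in ascending order (1 = smallest p-value)
        # Scale by m / rank, then keep the adjusted values monotone in p.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def p_cutoff_for_fdr(pvalues, fdr_limit):
    """Largest p-value whose adjusted value is within fdr_limit, or None."""
    q = bh_adjust(pvalues)
    kept = [p for p, qi in zip(pvalues, q) if qi <= fdr_limit]
    return max(kept) if kept else None


def fdr_at_p_cutoff(pvalues, p_cutoff):
    """Estimated FDR if every gene with p <= p_cutoff is declared significant."""
    q = bh_adjust(pvalues)
    kept = [qi for p, qi in zip(pvalues, q) if p <= p_cutoff]
    return max(kept) if kept else None


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.004, 0.012, 0.03, 0.2]
cutoff = p_cutoff_for_fdr(example_p, 0.05)   # p-value cut-off implied by FDR <= 5%
fdr = fdr_at_p_cutoff(example_p, 0.02)       # FDR implied by rejecting at p <= 0.02
```

In R the same adjusted values should come from p.adjust(p, method = "BH").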

While estimating FDR using the Bioconductor routines, you will 
probably also estimate the percentage of genes that differentially 
express.  One thing to note is that if you reject as many hypotheses 
as that estimated percentage implies, you will end up with an FDR 
that is much too high to be acceptable.  So, once you set a cut-off, 
you are almost certain to have false non-detections as well.
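
For a feel of how the percentage of differentially expressing genes can be estimated from the p-values alone, here is a minimal sketch of one common estimator (Storey's, at a single fixed tuning value). It is an assumption that this resembles what any particular Bioconductor routine does; the function name, the choice lam = 0.5, and the example p-values are all hypothetical. The toy data only show the arithmetic and are not meant to reproduce the FDR behaviour described above.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimated fraction of non-differentially expressing genes.

    P-values above lam are assumed to come almost entirely from null genes,
    whose p-values are uniform, so their count is scaled up by 1 / (1 - lam).
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.002, 0.01, 0.03, 0.2, 0.4, 0.55, 0.7, 0.85, 0.95]
pi0 = estimate_pi0(example_p)                   # estimated fraction of null genes
n_de = round((1.0 - pi0) * len(example_p))      # implied number of DE genes
```

Here 4 of the 10 p-values exceed 0.5, so the estimate is 4 / (10 * 0.5) = 0.8, i.e. roughly 2 of the 10 genes are estimated to differentially express.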

Oh yes,  I forgot to mention that there is no universally good value 
to use for your cut-off.  If most of the genes are non-differentially 
expressing, most of your errors will be false detects.  If most of 
the genes are differentially expressing, most of your errors will be 
false non-detects.  So, there is no value that is good for every data set.
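
The arithmetic behind that last point can be sketched in a few lines. The assumptions here are loud ones: null p-values are uniform, and every differentially expressing gene is detected with the same probability (the "power" argument), which is a made-up number rather than anything estimated from data. Only the 450,000 comes from this thread.

```python
def expected_errors(n_genes, frac_null, p_cutoff, power):
    """Expected (false detects, false non-detects) at a given p-value cut-off.

    frac_null: assumed fraction of genes that do not differentially express.
    power:     assumed chance that a truly DE gene has p <= p_cutoff.
    """
    false_detects = n_genes * frac_null * p_cutoff
    false_non_detects = n_genes * (1.0 - frac_null) * (1.0 - power)
    return false_detects, false_non_detects


n = 450000  # number of tests mentioned in the thread

# Mostly non-differentially expressing genes (hypothetical 99% null).
fd_mostly_null, fnd_mostly_null = expected_errors(n, 0.99, 0.01, 0.8)

# Mostly differentially expressing genes (hypothetical 20% null).
fd_mostly_de, fnd_mostly_de = expected_errors(n, 0.20, 0.01, 0.8)
```

With the same cut-off and the same assumed power, false detects dominate in the first case (about 4,455 against 900) and false non-detects dominate in the second (about 900 against 72,000).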

--Naomi

At 02:17 PM 10/19/2006, Sean Davis wrote:
>Matthew Lyon wrote:
> > Esteemed List:
> >
> > i need an alpha value for a t-test with about n=450,000 and a
> > 1) df of 2
> > 2) df of 4
> >
> > this is microarray data. i've been told bonferroni is too conservative for
> > microarrays, hence interesting approaches like multtest, the q-value
> > permuted one, etc...
> >
> > can anyone who deals in this area extensively (say, expression data) give me
> > a ballpark value for t- or alpha- that's typically giving good 'oh man this
> > is significantly different!' results ? i've got my own hunches but would
> > like some blinded numbers tossed at me too.
> >
>Look at the p.adjust() function if you already have p-values computed by
>a t-test as a place to start.  Bonferroni should probably never be used,
>as I think the Holm correction has the same assumptions but is less
>conservative (you get something for nothing...).  Some of the more
>stats-minded folks might be able to elaborate on that particular point,
>but Holm is probably also too conservative.
>
>Sean

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


