[BioC] straight t vs. bonferroni vs. all the new stuff.
Naomi Altman
naomi at stat.psu.edu
Fri Oct 20 03:18:38 CEST 2006
I am trying to understand the issues better, too, but let me give this a try:
Firstly, I think that you must mean that n=n.tests=450,000.
Bonferroni and Holm guard against the probability of one or more
errors when none of the genes differentially express.
If that is what you want to guard against, then Holm is the method to
use for the reason that Sean states.
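To make the Bonferroni/Holm comparison concrete, here is a minimal sketch of both adjustments in plain Python. The function names and the example p-values are made up for illustration; in R, p.adjust() with method = "bonferroni" or method = "holm" is the routine Sean refers to. The point is only that a Holm-adjusted p-value is never larger than the Bonferroni-adjusted one.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone in the raw ones.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical p-values, for illustration only.
pvals = [0.01, 0.02, 0.03, 0.04, 0.05]
bonf = bonferroni(pvals)
hol = holm(pvals)
for p, b, h in zip(pvals, bonf, hol):
    print(f"p={p:.3f}  bonferroni={b:.3f}  holm={h:.3f}")
```

Only the smallest p-value gets the same adjustment under both methods; every other Holm-adjusted value is at most the Bonferroni one.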
Most of us would be happy if a large percentage of the genes that we
declare to be differentially expressed really are. FDR is a set of
methods that allow you to compute the expected percentage of mistakes
you make if you reject at a certain level. The way that I use it is
that I look at the q-values and the p-values. If the percentage of
differentially expressing genes is small, I set a q-value (i.e. an
acceptable upper limit for FDR) and declare genes with p-value at the
corresponding level or less to be significant. If the percentage of
differentially expressing genes is large, I set a p-value for
significance, and report the corresponding FDR.
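The two ways of working described above can be sketched as follows. This is an assumption-laden illustration, not the Bioconductor implementation: it uses Benjamini-Hochberg adjusted p-values as a stand-in for q-values (roughly what a q-value method gives if it assumes every gene is null), and the function names and p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (a conservative q-value stand-in)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def p_cutoff_for_fdr(pvals, q_max):
    """Fix an acceptable FDR; return the largest p-value declared significant."""
    qvals = bh_adjust(pvals)
    selected = [p for p, q in zip(pvals, qvals) if q <= q_max]
    return max(selected) if selected else None


def fdr_at_p_cutoff(pvals, p_max):
    """Fix a p-value cut-off; return the estimated FDR of the resulting list."""
    qvals = bh_adjust(pvals)
    selected = [q for p, q in zip(pvals, qvals) if p <= p_max]
    return max(selected) if selected else None


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print("p cut-off for FDR <= 0.05:", p_cutoff_for_fdr(pvals, 0.05))
print("estimated FDR at p <= 0.05:", fdr_at_p_cutoff(pvals, 0.05))
```

With these made-up numbers, asking for an FDR of at most 0.05 keeps only the two smallest p-values, while rejecting everything with p <= 0.05 keeps five genes at an estimated FDR of about 0.084.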
While estimating FDR using the Bioconductor routines, you will
probably also estimate the percentage of genes that differentially
express. One thing to note is that if you reject enough
hypotheses to reach that estimated percentage, you will end
up with an FDR that is much too high to be acceptable. So, once
you set a cut-off, you are almost certain to have false
non-detections as well.
Oh yes, I forgot to mention that there is no universally good value
to use for your cut-off. If most of the genes are non-differentially
expressing, most of your errors will be false detects. If most of
the genes are differentially expressing, most of your errors will be
false non-detects. So, there is no value that is good for every data set.
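A rough back-of-the-envelope calculation shows why the mix of errors depends on the data set. The sketch below assumes a fixed per-test level alpha and, unrealistically, the same power for every differentially expressing gene; the values alpha = 0.01 and power = 0.8 are invented for illustration, and only the 450,000 tests come from the original question.

```python
def expected_errors(n_tests, frac_de, alpha, power):
    """Expected false detects and false non-detects at a fixed cut-off.

    frac_de is the assumed fraction of genes that truly differentially
    express; alpha is the per-test level; power is the assumed chance of
    detecting any one truly differentially expressing gene.
    """
    n_de = n_tests * frac_de
    n_null = n_tests - n_de
    false_detects = n_null * alpha
    false_non_detects = n_de * (1.0 - power)
    return false_detects, false_non_detects


n_tests = 450_000  # from the original question
for frac_de in (0.01, 0.80):
    fd, fn = expected_errors(n_tests, frac_de, alpha=0.01, power=0.8)
    print(f"fraction DE={frac_de:.2f}: "
          f"false detects ~{fd:.0f}, false non-detects ~{fn:.0f}")
```

Under these assumptions, with 1% of genes differentially expressing there are about 4455 expected false detects against 900 false non-detects; with 80% differentially expressing the balance reverses, to about 900 against 72000.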
--Naomi
At 02:17 PM 10/19/2006, Sean Davis wrote:
>Matthew Lyon wrote:
> > Esteemed List:
> >
> > i need an alpha value for a t-test with about n=450,000 and a
> > 1) df of 2
> > 2) df of 4
> >
> > this is microarray data. i've been told bonferroni is too conservative for
> > microarrays, hence interesting approaches like multtest, the q-value
> > permuted one, etc...
> >
> > can anyone who deals in this area extensively (say, expression data) give me
> > a ballpark value for t- or alpha- that's typically giving good 'oh man this
> > is significantly different!' results ? i've got my own hunches but would
> > like some blinded numbers tossed at me too.
> >
>Look at the p.adjust() function if you already have p-values computed by
>a t-test as a place to start. Bonferroni should probably never be used,
>as I think the Holm correction has the same assumptions but is less
>conservative (you get something for nothing...). Some of the more
>stats-minded folks might be able to elaborate on that particular point,
>but Holm is probably also too conservative.
>
>Sean
>
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111