[BioC] globaltest multiple testing correction
Claus-Dieter Mayer
claus at bioss.ac.uk
Mon Oct 20 10:36:55 CEST 2008
Hello Michael,
I think you are confusing a few things here. It is correct that
SAM uses a permutation method to give q-values, i.e. estimates of the FDR
one would obtain when thresholding at the given value of the
test statistic. This is SAM's specific way of using the permutations,
though. In general a permutation test will give you a simple traditional
p-value for each gene, which has to be corrected for multiplicity just
like any other p-value. The main difference is that a permutation method
doesn't use a theoretical probability distribution to calculate the
p-value, but uses an empirical distribution obtained by resampling. For
large sample sizes the two distributions, and thus the two p-values
obtained from them, will hardly differ, which is one way to see that
calculating a permutation p-value does not solve the multiple testing
problem per se.
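To make that point concrete, here is a minimal sketch in Python (standard library only) that computes both kinds of p-value for a single simulated gene. Everything in it is an assumption for illustration: the data are simulated Gaussian values, the function names are made up, and the "theoretical" p-value uses a normal approximation rather than a t distribution. It is not what SAM or globaltest does internally.

```python
import math
import random
from statistics import fmean, variance


def theoretical_pvalue(x, y):
    """Two-sided p-value from a normal approximation to the two-sample statistic."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (fmean(x) - fmean(y)) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from the empirical distribution of shuffled group labels."""
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two groups at random
        if abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:])) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0
    return (hits + 1) / (n_perm + 1)


# Simulated expression values for one gene in two groups of 200 samples each
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [rng.gauss(0.2, 1.0) for _ in range(200)]

print("theoretical p-value:", theoretical_pvalue(x, y))
print("permutation p-value:", permutation_pvalue(x, y))
```

With groups this large the two numbers should be close to each other. Either way you get one ordinary p-value per gene, and nothing in the permutation step accounts for how many genes were tested.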
The same holds if you test many pathways/gene sets and obtain a p-value
for each. If, for example, you have 100 pathways and call all the ones
with p less than 5% significant, you would expect 5 significant pathways
by chance even if none of them is really changed, i.e. you have the
same old multiple testing problem. Possibly one could come up with a
SAM-like way of giving q-values for this situation (it is quite likely
that somebody has already come up with that idea; others here might know
better), but as far as I know the globaltest package doesn't do
that, so the authors are absolutely correct in the paper and vignette
about this issue.
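The "5 out of 100 by chance" arithmetic can be checked with a small simulation. This is a sketch under stated assumptions: the p-values are drawn as independent uniform numbers (i.e. no pathway is truly changed), and the Bonferroni and Benjamini-Hochberg functions are written out by hand here for illustration. They are not part of the globaltest package.

```python
import random


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


rng = random.Random(1)
n_pathways, n_sim, alpha = 100, 1000, 0.05
raw_hits = 0
bonf_hits = 0
bh_hits = 0
for _ in range(n_sim):
    # Under the global null every p-value is uniform on [0, 1]
    pvals = [rng.random() for _ in range(n_pathways)]
    raw_hits += sum(p < alpha for p in pvals)
    bonf_hits += sum(p < alpha for p in bonferroni(pvals))
    bh_hits += sum(p < alpha for p in benjamini_hochberg(pvals))

print("mean false positives, unadjusted:", raw_hits / n_sim)
print("mean false positives, Bonferroni:", bonf_hits / n_sim)
print("mean false positives, BH:        ", bh_hits / n_sim)
```

The unadjusted average should come out near 5 per run, while both adjusted versions should be far lower. It makes no difference here whether the raw p-values came from a permutation or from a theoretical distribution.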
One thing to keep in mind is that adjusting p-values for gene set
analysis is not trivial, as the gene sets are likely to overlap and
their p-values are therefore not independent.
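One conservative option when the tests are dependent is the Benjamini-Yekutieli adjustment, which controls the FDR under arbitrary dependence at the cost of some power. The sketch below is only an illustration: the function is hand-written, the pathway p-values are made up, and it does not model the overlap between gene sets in any way. I am not suggesting this is what the globaltest authors recommend.

```python
def benjamini_yekutieli(pvals):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Extra penalty factor c(m) = 1 + 1/2 + ... + 1/m for arbitrary dependence
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five hypothetical pathways
pathway_pvals = [0.0004, 0.003, 0.02, 0.04, 0.3]
for p, q in zip(pathway_pvals, benjamini_yekutieli(pathway_pvals)):
    print(f"raw p = {p:<7} BY-adjusted = {q:.4f}")
```

The adjusted values are the Benjamini-Hochberg ones multiplied by c(m) and capped at 1, so they are never smaller than the BH values.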
Hope that helps,
Claus
Michael Gormley wrote:
> In the paper and vignette describing the globaltest package, the
> authors mention the need for multiple testing when testing large
> numbers of pathways or functional gene groups. While I agree the
> number of statistical tests does need to be accounted for, I do not
> understand the need for additional multiple testing correction if the
> permutation method of calculating p-values is used. This method is
> used often to approximate the false discovery rate, most notably in
> the original implementation of Significance Analysis of Microarrays
> (SAM). Am I on track with my assessment here or is the additional
> multiple testing correction used as a more accurate way of obtaining
> the true FDR?
>
> Thanks,
> Michael Gormley
>
--
***********************************************************************************
Dr Claus-D. Mayer | http://www.bioss.ac.uk
Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
Rowett Research Institute | Telephone: +44 (0) 1224 716652
Aberdeen AB21 9SB, Scotland, UK. | Fax: +44 (0) 1224 715349
***********************************************************************************
Biomathematics and Statistics Scotland (BioSS) is formally part of The Scottish Crop Research Institute (SCRI), a registered Scottish charity No. SC006662