[BioC] Correct p-value in GSA (Gene set enrichment) permutation tests?
Dirk Repsilber
repsilber at fbn-dummerstorf.de
Fri Sep 12 09:51:15 CEST 2008
Dear BioC Bioinformaticians,
I am using the package GSA for testing gene set enrichment in gene expression data.
GSA uses a permutation test for calculating p-values of enrichment.
Such p-values are usually defined as
p = #(T* >= T) / #B
where T is the originally observed test statistic, #B the number of permutations and
T* the test statistic computed on a permuted dataset.
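A minimal R sketch of this usual definition, assuming `t_obs` holds the observed statistic and `t_star` a vector with the statistics from the #B permuted datasets (both names are hypothetical, not taken from GSA):

```r
# Usual one-sided permutation p-value: the fraction of permutation
# statistics at least as large as the observed one, p = #(T* >= T) / #B.
perm_pvalue <- function(t_obs, t_star) {
  sum(t_star >= t_obs) / length(t_star)
}

# Toy example: 5 permutation statistics, exactly one of which equals
# the observed value 2.5, so the p-value is 1/5.
t_star <- c(-1.2, 0.4, 2.5, 0.9, 1.7)
perm_pvalue(2.5, t_star)  # 0.2
```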
However, the function GSA implements p = #(T* > T) / #B, which is also how it is defined in the accompanying article:
http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf
As a consequence, even for insufficiently small designs (say, a comparison of two independent groups
of size 2 each), many of the resulting p-values are exactly 0.
In my experience, this is often the case for about half of the pathways under consideration.
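Why ties matter so much here: with 2 + 2 samples there are only choose(4, 2) = 6 distinct ways to assign samples to the groups, and the observed assignment is always one of them. If permutations are drawn at random, roughly one in six reproduces the observed statistic exactly. The hedged R sketch below enumerates all 6 assignments for one hypothetical gene, using a plain difference in group means as a stand-in statistic (an assumption; this is not GSA's maxmean gene-set score):

```r
# Hypothetical single-gene data: samples 1-2 form one group, 3-4 the other.
x <- c(1, 2, 10, 11)
obs_A <- c(3, 4)

# Stand-in statistic: mean of group A minus mean of the remaining samples.
stat <- function(idx_A) mean(x[idx_A]) - mean(x[-idx_A])

# Each column of combn(4, 2) is one possible choice of group A (6 in total).
assignments <- combn(4, 2)
t_star <- apply(assignments, 2, stat)  # -9 -1 0 0 1 9
t_obs <- stat(obs_A)                   # 9, the largest possible value

p_strict    <- sum(t_star >  t_obs) / length(t_star)  # T* >  T: gives 0
p_inclusive <- sum(t_star >= t_obs) / length(t_star)  # T* >= T: gives 1/6
```

Even the most extreme possible outcome cannot honestly reach a p-value below 1/6 in this design, yet the strict rule reports 0.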
For larger designs this difference may not matter much, but for very small designs
I think this p-value calculation gives far too optimistic results
(too many "significant" pathways).
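The over-optimism can be checked with a small null simulation. This is a sketch under stated assumptions: a single hypothetical gene, 2 + 2 samples with no true group difference, full enumeration of the 6 group assignments, and a difference in means as a stand-in for GSA's actual score. Under the null each assignment is equally likely to produce the largest statistic, so the strict rule yields p = 0 (and thus "significance" at 0.05) in about 1/6 of datasets, while the inclusive rule never goes below 1/6 in this design:

```r
set.seed(1)
n_sim <- 2000
assignments <- combn(4, 2)  # the 6 possible choices of group A
obs_A <- c(3, 4)            # observed labelling: samples 3-4 vs 1-2

p_strict <- p_inclusive <- numeric(n_sim)
for (s in seq_len(n_sim)) {
  x <- rnorm(4)  # null hypothesis: both groups come from the same distribution
  t_star <- apply(assignments, 2, function(a) mean(x[a]) - mean(x[-a]))
  t_obs <- mean(x[obs_A]) - mean(x[-obs_A])
  p_strict[s]    <- mean(t_star >  t_obs)
  p_inclusive[s] <- mean(t_star >= t_obs)
}

# Fraction of null datasets called "significant" at the 0.05 level
mean(p_strict <= 0.05)     # close to 1/6, far above the nominal 0.05
mean(p_inclusive <= 0.05)  # 0, since p >= 1/6 whenever ties are counted
```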
Is there a motivation for this unusual p-value calculation, or should the following lines in the GSA function
(original):
pvalues.hi[i] = sum(r.star[i, ] > r.obs[i])/nperms
pvalues.lo[i] = sum(r.star[i, ] < r.obs[i])/nperms
read instead:
pvalues.hi[i] = sum(r.star[i, ] >= r.obs[i])/nperms
pvalues.lo[i] = sum(r.star[i, ] <= r.obs[i])/nperms
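A vectorized R sketch of the proposed change, assuming (as the quoted lines suggest) that `r.star` is a matrix with one row per gene set and one column per permutation, and `r.obs` holds the observed scores. The wrapper `perm_pvalues` and its `inclusive` argument are hypothetical and not part of GSA:

```r
# r.obs:  observed score for each of K gene sets (length K)
# r.star: K x nperms matrix of scores from the permuted data
# In r.star >= r.obs, r.obs is recycled down each column, so element
# [i, j] is compared with r.obs[i], matching the per-row loop above.
perm_pvalues <- function(r.obs, r.star, inclusive = TRUE) {
  stopifnot(length(r.obs) == nrow(r.star))
  if (inclusive) {
    list(hi = rowMeans(r.star >= r.obs), lo = rowMeans(r.star <= r.obs))
  } else {
    list(hi = rowMeans(r.star > r.obs), lo = rowMeans(r.star < r.obs))
  }
}

# Toy example: 2 gene sets, 4 permutations; the first permutation of each
# gene set ties the observed score exactly.
r.obs  <- c(1.0, -0.5)
r.star <- rbind(c( 1.0, 0.2, -0.3, 0.7),
                c(-0.5, 0.1, -0.9, 0.4))
perm_pvalues(r.obs, r.star, inclusive = FALSE)$hi  # c(0, 0.5)
perm_pvalues(r.obs, r.star)$hi                     # c(0.25, 0.75)
```

Dividing by the number of columns via `rowMeans` equals dividing by `nperms`, provided every permutation occupies one column.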
I would be grateful for any comments or clarifications!
Sincerely,
Dirk
--
_____________________________________________________
Dr. Dirk Repsilber
Biomathematics / Bioinformatics group
Genetics and Biometry
Research Institute for the Biology of Farm Animals
FBN
Wilhelm-Stahl-Allee 2
D-18196 Dummerstorf
Tel: +49 38208 68 916
Fax: +49 38208 68 902
www.fbn-dummerstorf.de/de/Forschung/FBs/fb2/repsilber