[R] p.adjust(p, n) for n>length(p)

Varma, Sudhir (NIH/NIAID) [C] varmas at niaid.nih.gov
Wed Mar 18 22:43:43 CET 2009


Hi all,
I am having a problem with the function "p.adjust" in stats. I have looked at the manuals and searched the R site, but didn't get anything that seems directly relevant. Can anybody throw any light on it or confirm my suspicion that this might be a bug?

I am trying to use the p.adjust() function to do Benjamini/Hochberg FDR control on a vector of p-values that are the smallest K p-values selected from an N-length vector. So I am using p.adjust(p, n=N, method="BH")

This seems to give smaller adjusted p-values than if I use p.adjust(p, method="BH"). For example (K=10, N=20)

> p=runif(10)
> p
 [1] 0.4690849 0.5108430 0.8703507 0.1950010 0.9938629 0.8582079 0.7987490 0.9312799 0.9195361 0.7814532
> p.adjust(p, n=20, method="BH")
 [1] 0.7818081 0.7859123 0.9802947 0.3545472 0.9938629 0.9802947 0.9802947 0.9802947 0.9802947 0.9802947
> p.adjust(p, method="BH")
 [1] 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629

This seemed unusual, since if p is being selected from a larger vector, one would expect the FDR to increase. Then I figured out that what is happening is that the p.adjust function is working as if p is being padded with zeros to make it N-length. For example
> p.adjust(c(p, rep(0,10)), method="BH")
 [1] 0.7818081 0.7859123 0.9802947 0.3545472 0.9938629 0.9802947 0.9802947 0.9802947 0.9802947 0.9802947 0.0000000
[12] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000

Which gives the same adjusted p-values as when I had used n=20 in the function. This looks contrary to what the manual for the function actually says: 
"Note that you can set 'n' larger than 'length(p)' which means the unobserved p-values are assumed to be greater than all the observed p for '"bonferroni"' and '"holm"' methods and equal to 1 for the other methods."

Instead of assuming un-observed p-values to be equal to 1, the function seems to assume un-observed p-values to be 0. 

I am using R version 2.7.2 on Windows XP Professional. sessionInfo() data pasted at end of email.

Regards
Sudhir Varma

R version 2.7.2 (2008-08-25) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GEOquery_2.4.1 RCurl_0.9-3    Biobase_2.0.1  mvtnorm_0.9-2 




More information about the R-help mailing list