[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"
Martin Maechler
maechler at stat.math.ethz.ch
Sat Jan 8 21:29:39 CET 2005
I've thought more and made experiements with R code versions
and just now committed a new version of p.adjust() to R-devel
--> https://svn.r-project.org/R/trunk/src/library/stats/R/p.adjust.R
which does sensible NA handling by default and
*additionally* has an "na.rm" argument (set to FALSE by
default). The extended 'Examples' secion on the help page
https://svn.r-project.org/R/trunk/src/library/stats/man/p.adjust.Rd
shows how the new NA handling is typically much more sensible
than using "na.rm = TRUE".
Martin
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Sat, 8 Jan 2005 17:19:23 +0100 writes:
>>>>> "GS" == Gordon K Smyth <smyth at wehi.edu.au>
>>>>> on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes:
MM> <.............>
GS> p.adjust() unfortunately gives incorrect results when
GS> 'p' includes NAs. The results from topTable are
GS> correct. topTable() takes care to remove NAs before
GS> passing the values to p.adjust().
MM> There's at least one bug in p.adjust(): The "hommel"
MM> method currently does not work at all with NAs (and I
MM> have an uncommitted fix ready for this bug). OTOH, the
MM> current version of p.adjust() ``works'' with NA's, apart
MM> from Hommel's method, but by using "n = length(p)" in
MM> the correction formulae, i.e. *including* the NAs for
MM> determining sample size `n' {my fix to "hommel" would do
MM> this as well}.
MM> My question is what p.adjust() should do when there are
MM> NA's more generally, or more specifically which `n' to
MM> use in the correction formula. Your proposal amounts to
MM> ``drop NA's and forget about them till the very end''
MM> (where they are wanted in the result), i.e., your sample
MM> size `n' would be sum(!is.na(p)) instead of length(p).
MM> To me it doesn't seem obvious that this setting "n =
MM> #{non-NA observations}" is desirable for all P-value
MM> adjustment methods. One argument for keeping ``n = #{all
MM> observations}'' at least for some correction methods is
MM> the following "continuity" one:
MM> If only a few ``irrelevant'' (let's say > 0.5) P-values
MM> are replaced by NA, the adjusted relevant small P-values
MM> shouldn't change much, ideally not at all. I'm really
MM> no scholar on this topic, but e.g. for "holm" I think I
MM> would want to keep ``full n'' because of the above
MM> continuity argument. BTW, for "fdr", I don't see a
MM> straightforward way to achieve the desired continuity.
MM> 5D Of course, p.adjust() could adopt the possibility of
MM> chosing how NA's should be treated e.g. by another
MM> argument ``use.na = TRUE/FALSE'' and hence allow both
MM> versions.
MM> Feedback very welcome, particularly from ``P-value
MM> experts'' ;-)
MM> Martin Maechler, ETH Zurich
More information about the R-devel
mailing list