[Rd] p.adjust(<NA>s), was 'Re: [BioC] limma and p-values'

Mon Jan 17 23:10:42 CET 2005

On Tue, January 18, 2005 7:45 am, Martin Maechler said:
>>>>>> "GS" == Gordon Smyth <smyth at wehi.edu.au>
>>>>>>     on Sun, 16 Jan 2005 19:44:26 +1100 writes:
>
>     GS> The new committed version of p.adjust() contains some
>     GS> problems:
>     >> p.adjust(c(0.05,0.5),method="hommel")
>     GS> [1] 0.05 0.50
>
>     GS> No adjustment!
>
> yes, but that's still better than what the current version of
> R 2.0.1 does, namely to give NA NA + two warnings ..

The R 2.0.1 version has some problems, no question, and needs to be fixed.  Thanks for giving time
to it.  Given a choice though between a wrong answer and no answer/warning/error, I think I'd
prefer the latter.

The problem with n=2 is easily fixed here because Hommel's method coincides with Hochberg's when n=2.
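
To make the n=2 case concrete, here is a minimal sketch of Hochberg's step-up adjustment written out
by hand. The helper name hochberg_by_hand is made up for illustration and is not part of R. It only
shows the value that a corrected "hommel" would be expected to return for the example above.

```r
# Sketch only: Hochberg's step-up adjustment written out by hand.
# hochberg_by_hand is a hypothetical helper, not an R function.
hochberg_by_hand <- function(p) {
  n  <- length(p)
  o  <- order(p, decreasing = TRUE)  # largest p-value first
  ro <- order(o)                     # permutation restoring the input order
  # The i-th largest p-value is multiplied by i, then made monotone with cummin.
  pmin(1, cummin(seq_len(n) * p[o]))[ro]
}

p <- c(0.05, 0.5)
hochberg_by_hand(p)                # 0.10 0.50
p.adjust(p, method = "hochberg")   # 0.10 0.50
```

So for these two p-values the adjusted values should be 0.10 and 0.50 rather than 0.05 and 0.50.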

>     GS> I can't see how the new treatment of NAs can be
>     GS> justified. One needs to distinguish between NAs which
>     GS> represent missing p-values and NAs which represent
>     GS> unknown p-values. In virtually all applications giving
>     GS> rise to NAs, the NAs represent missing p-values which
>     GS> could not be computed because of missing data. In such
>     GS> cases, the observed p-values should definitely be
>     GS> adjusted as if the NAs weren't there, because NAs
>     GS> represent p-values which genuinely don't exist.
>
> hmm, "definitely" being a bit strong.  One could argue that
> one should use multiple imputation of the underlying missing
> data, or .. other scenarios.

Well, I'm sticking with "definitely" because it seems clear-cut.  The purpose of adjustment
methods is to maximise power while controlling a chosen error rate (typically familywise error rate
FWER or false discovery rate FDR).  When the NAs represent missing p-values, it means that those
null hypotheses have zero probability of being rejected.  Hence the NA cases cannot add to FWER or
FDR.

Suppose you have p-values c(0.05,NA) corresponding to null hypotheses H1 and H2 and you want to
control the FWER at 0.05.  Then it is quite correct to reject H1 (and fail to reject H2).  If H2
is TRUE then the FWER is exactly 0.05.  If H2 is FALSE, then the FWER is lower.  Hence the FWER is
controlled at the desired level with no adjustment of the p-values.  Doing any adjustment can only
decrease power.
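
A rough sketch of this example in R, assuming the NAs are simply set aside before adjusting. The
helper adjust_observed is a made-up name for illustration, and the last line assumes a p.adjust()
that accepts an n argument giving the number of comparisons. Holm's method is used only as an
example of a FWER-controlling adjustment.

```r
# Sketch only: adjust the observed p-values and leave the NAs in place.
# adjust_observed is a hypothetical helper, not part of R.
adjust_observed <- function(p, method = "holm") {
  obs <- !is.na(p)
  adj <- rep(NA_real_, length(p))
  adj[obs] <- p.adjust(p[obs], method = method)
  adj
}

adjust_observed(c(0.05, NA))   # 0.05 NA: H1 is still rejected at level 0.05

# Counting the missing p-value as a second comparison doubles the observed one,
# so a p-value of 0.03 would no longer be rejected at 0.05.
p.adjust(0.03, method = "holm", n = 2)   # 0.06
```

The second call shows the loss of power: the hypothesis with the missing p-value can never be
rejected, yet counting it makes the observed p-value harder to reject.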

While imputation is a useful tool for making computations easier in some applications, I don't see
how any good argument could be made for imputation or similar in the context of p.adjust().
Imputing data that agrees with the null hypotheses is equivalent to ignoring the null hypotheses.
Imputing random data which rejects null hypotheses can only increase error rates.

Gordon

> I'll reply to your other, later, more detailed message
> separately and take the liberty to drop the other points here...
>
> Martin