[Rd] Inconsistencies in wilcox.test
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Sat Dec 14 22:50:07 CET 2019
>>>>> Martin Maechler
>>>>> on Thu, 12 Dec 2019 17:20:47 +0100 writes:
>>>>> Karolis Koncevičius
>>>>> on Mon, 9 Dec 2019 23:43:36 +0200 writes:
>> So I tried adding Infinity support for all cases. And it
>> is (as could be expected) more complicated than I
>> thought.
> "Of course !" Thank you, Karolis, in any case!
>> It is easy to add Inf support for the test. The problems
>> start with conf.int=TRUE.
>> Currently confidence intervals are computed via
>> `uniroot()` and, in the case of infinities, we are
>> computationally looking for roots over infinite interval
>> which results in an error. I suspect this is the reason
>> Inf values were removed in the first place.
> Maybe. It's still wrong to be done "up front". I'm sure
> 98% (or so ;-) of all calls to wilcox.test() do *not* use
> conf.int = TRUE
>> Just a note, I found a few more errors/inconsistencies
>> when requesting confidence intervals with paired=TRUE
>> (due to Infinities being left in).
>> Current error in Inf-Inf scenario:
>> wilcox.test(c(1,2,Inf), c(4,8,Inf), paired=TRUE,
>> conf.int=TRUE) Error in if (ZEROES) x <- x[x != 0] :
>> missing value where TRUE/FALSE needed
> Good catch .. notably as it also happens when
> conf.int=FALSE as by default. My version of wilcox.test()
> now does give the same as when the to 'Inf' are left away.
>> NaN confidence intervals:
>> wilcox.test(c(1:9,Inf), c(21:28,Inf,30), paired=TRUE,
>> conf.int=TRUE)
>> Wilcoxon signed rank test with continuity correction
>> data: c(1:9, Inf) and c(21:28, Inf, 30) V = 9.5, p-value
>> = 0.0586 alternative hypothesis: true location shift is
>> not equal to 0 0 percent confidence interval: NaN NaN
>> sample estimates: midrange NaN
> I don't see a big problem here. The NaN's in some sense
> show the best that can be computed with this data.
> Statistic and P-value, but no conf.int.
>> The easiest "fix" for consistency would be to simply
>> remove Infinity support from the paired=TRUE case.
> I strongly disagree. We are not pruning good
> functionality just for some definition of consistency.
>> But going with the more desirable approach of adding
>> Infinity support for non-paired cases - it is currently
>> not clear to me what confidence intervals and
>> pseudomedian should be. Specially when Infinities are on
>> both sides.
> I deem that not to be a big deal. They are not defined
> given the default formulas and that is reflected by NA /
> NaN in those parts of the result.
>> Regards, Karolis Koncevičius.
> But I have also spent a few hours now on
> wilcox.test.default() behavior notably also looking at the
> "rounding" / "machine precision" situation, and also on
> your remark that the 'method: ...' does not indicate well
> enough what was computed.
> In my (not yet committed) but hereby proposed enhancement
> of wilcox.test(), I have a new argument, 'digits.rank =
> Inf' (the default 'Inf' corresponding to the current
> behavior) with help page documentation:
> digits.rank: a number; if finite, ‘rank(signif(r,
> digits.rank))’ will be used to compute ranks for the test
> statistic instead of (the default) ‘rank(r)’.
> and then in 'Details :'
> For stability reasons, it may be advisable to use
> rounded data or to set ‘digits.rank = 7’, say, such that
> determination of ties does not depend on very small
> numeric differences (see the example).
> and then in 'Examples: '
> ## accuracy in ties determination via 'digits.rank':
> wilcox.test( 4:2, 3:1, paired=TRUE) # Warning: cannot
> compute exact p-value with ties wilcox.test((4:2)/10,
> (3:1)/10, paired=TRUE) # no ties => *no* warning
> wilcox.test((4:2)/10, (3:1)/10, paired=TRUE, digits.rank =
> 9) # same ties as (4:2, 3:1)
> ----------------------
> Lastly, I propose to replace "test" by "exact test" in the
> 'method' component (and print out) in case exact
> computations were used. This information should be part
> of the returned "htest" object, and not only visible from
> the arguments and warnings that are printed during the
> computations. This last change is in some sense the "most
> back-incompatible" change of these, because many
> wilcox.test() printouts would slightly change, e.g.,
>> w0 <- wilcox.test( 1:5, 4*(0:4), paired=TRUE)
> Wilcoxon signed rank exact test
> data: 1:5 and 4 * (0:4) V = 1, p-value = 0.125
> alternative hypothesis: true location shift is not equal
> to 0
> where before (in R <= 3.6.x) it is just
> Wilcoxon signed rank test
> data: ......... ............... ...............
> but I think R 4.0.0 is a good occasion for such a change.
> Constructive feedback on all this is very welcome! Martin
... none ... I "assume" this means everybody likes the idea ;-)
Anyway, now comitted to R-devel (for R 4.0.0), svn rev 77569
(in 'NEW FEATURES').
Martin
>> On 2019-12-07 23:18, Karolis Koncevičius wrote:
>>> Thank you for a fast response. Nice to see this mailing
>>> list being so alive.
>>>
>>> Regarding Inf issue: I agree with your assessment that
>>> Inf should not be removed. The code gave me an
>>> impression that Inf values were intentionally removed
>>> (since is.finite() was used everywhere, except for
>>> paired case). I will try to adjust my patch according to
>>> your feedback.
>>>
>>> One more thing: it seems like you assumed that issues
>>> 2:4 are all related to machine precision, which is not
>>> the case - only 2nd issue is. Just wanted to draw this
>>> to your attention, in case you might have some feedback
>>> and guidelines about issues 3 and 4 as well.
>>>
>>>
>>>
>>> On 2019-12-07 21:59, Martin Maechler wrote:
>>>>>>>>> Karolis Koncevičius on Sat, 7 Dec 2019 20:55:36
>>>>>>>>> +0200 writes:
>>>>
>>>> > Hello, > Writing to share some things I've found
>>>> about wilcox.test() that seem a > a bit inconsistent.
>>>>
>>>> > 1. Inf values are not removed if paired=TRUE
>>>>
>>>> > # returns different results (Inf is removed): >
>>>> wilcox.test(c(1,2,3,4), c(0,9,8,7)) >
>>>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>>>>
>>>> > # returns the same result (Inf is left as value with
>>>> highest rank): > wilcox.test(c(1,2,3,4), c(0,9,8,7),
>>>> paired=TRUE) > wilcox.test(c(1,2,3,4), c(0,9,8,Inf),
>>>> paired=TRUE)
>>>>
>>>> Now which of the two cases do you consider correct ?
>>>>
>>>> IHMO, the 2nd one is correct: it is exactly one
>>>> property of the *robustness* of the wilcoxon test and
>>>> very desirable that any (positive) outlier is treated
>>>> the same as just "the largest value" and the test
>>>> statistic (and hence the p-value) should not change.
>>>>
>>>> So I think the first case is wrong, notably if
>>>> modified, (such that the last y is the overall maximal
>>>> value (slightly larger sample):
>>>>
>>>>> wilcox.test(1:7, 1/8+ c(9:4, 12))
>>>>
>>>> Wilcoxon rank sum test
>>>>
>>>> data: 1:7 and 1/8 + c(9:4, 12) W = 6, p-value = 0.01748
>>>> alternative hypothesis: true location shift is not
>>>> equal to 0
>>>>
>>>>> wilcox.test(1:7, 1/8+ c(9:4, 10000))
>>>>
>>>> Wilcoxon rank sum test
>>>>
>>>> data: 1:7 and 1/8 + c(9:4, 10000) W = 6, p-value =
>>>> 0.01748 alternative hypothesis: true location shift is
>>>> not equal to 0
>>>>
>>>>> wilcox.test(1:7, 1/8+ c(9:4, Inf))
>>>>
>>>> Wilcoxon rank sum test
>>>>
>>>> data: 1:7 and 1/8 + c(9:4, Inf) W = 6, p-value =
>>>> 0.03497 alternative hypothesis: true location shift is
>>>> not equal to 0
>>>>
>>>> The Inf case should definitely give the same as the
>>>> 10'000 case. That's exactly one property of a robust
>>>> statistic.
>>>>
>>>> Thank you, Karolis, this is pretty embarrassing to only
>>>> be detected now after 25+ years of R in use ...
>>>>
>>>> The correct fix starts with replacing the is.finite()
>>>> by !is.na() and keep the 'Inf' in the rank
>>>> computations... (but then probably also deal with the
>>>> case of more than one Inf, notably the Inf - Inf
>>>> "exception" which is not triggered by your example...)
>>>>
>>>>
>>>> ---
>>>>
>>>> Ben addressed the "rounding" / numerical issues
>>>> unavoidable for the other problems.
>>>>
>>>> > 2. tolerance issues with paired=TRUE.
>>>>
>>>> > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE) > #
>>>> ... > # Warning: cannot compute exact p-value with
>>>> ties
>>>>
>>>> > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1),
>>>> paired=TRUE) > # ... > # no warning
>>>>
>>>> > 3. Always 'x observations are missing' when
>>>> paired=TRUE
>>>>
>>>> > wilcox.test(c(1,2), c(NA_integer_,NA_integer_),
>>>> paired=TRUE) > # ... > # Error: not enough (finite)
>>>> 'x' observations
>>>>
>>>> > 4. No indication if normal approximation was used:
>>>>
>>>> > # different numbers, but same "method" name >
>>>> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE) >
>>>> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>>>>
>>>>
>>>> > From all of these I am pretty sure the 1st one is
>>>> likely unintended, > so attaching a small patch to
>>>> adjust it. Can also try patching others if > consensus
>>>> is reached that the behavioiur has to be modified.
>>>>
>>>> > Kind regards, > Karolis Koncevičius.
>>>>
More information about the R-devel
mailing list