[Rd] Inconsistencies in wilcox.test
Karolis Koncevičius
k@ro||@@koncev|c|u@ @end|ng |rom gm@||@com
Sun Dec 15 13:57:39 CET 2019
I see I am too late to comment :)
But commenting after the fact, just wish to say that I like the changes.
Specially the mentioning of "exact" in the test name.
Floating point prevision is very nicely implemented too.
My only worry is that it will not serve new/lay users that may be in the
biggest need for protections like these.
Do you think it would make sense to do it a bit differently? i.e.
setting digits.rank=7 by default, and including a message in the warning
i.e. "ties present, if you are working with small digits consider
adjusting digits.rank".
But, on the other hand, I understand that this would be a breaking
change. A non breaking change might be to leave digits.rank as NA or
NULL by default, which would act as infinity but also would do a test
within wilcox.test() that checks for ties with digits.rank=7. Then a
warning will say "possibly missed ties due to machine precision, if you
are sure these are not ties - set digits.rank to Inf to get rid of this
warning". This would be a non-breaking change, except for a warning.
Would be interesting to hear your thoughts about this.
I will pull your changes and try to play with the code a bit later
today. Thanks a lot for, Martin!
Also I have an unrelated question - I mainly find these discrepancies in
"stats" because I am working on my little package related to hypothesis
tests. And I have found a few more of them in other tests. One that I
reported long time ago, regarding flinger.test(), which is also related
to machine precision.
In terms of the etiquette of this list - should I mention them in this
same thread or is it better to create a new one?
Kind regards,
Karolis K.
On 2019-12-14 22:50, Martin Maechler wrote:
>>>>>> Martin Maechler
>>>>>> on Thu, 12 Dec 2019 17:20:47 +0100 writes:
>
>>>>>> Karolis Koncevičius
>>>>>> on Mon, 9 Dec 2019 23:43:36 +0200 writes:
>
> >> So I tried adding Infinity support for all cases. And it
> >> is (as could be expected) more complicated than I
> >> thought.
>
> > "Of course !" Thank you, Karolis, in any case!
>
> >> It is easy to add Inf support for the test. The problems
> >> start with conf.int=TRUE.
>
> >> Currently confidence intervals are computed via
> >> `uniroot()` and, in the case of infinities, we are
> >> computationally looking for roots over infinite interval
> >> which results in an error. I suspect this is the reason
> >> Inf values were removed in the first place.
>
> > Maybe. It's still wrong to be done "up front". I'm sure
> > 98% (or so ;-) of all calls to wilcox.test() do *not* use
> > conf.int = TRUE
>
>
> >> Just a note, I found a few more errors/inconsistencies
> >> when requesting confidence intervals with paired=TRUE
> >> (due to Infinities being left in).
>
> >> Current error in Inf-Inf scenario:
>
> >> wilcox.test(c(1,2,Inf), c(4,8,Inf), paired=TRUE,
> >> conf.int=TRUE) Error in if (ZEROES) x <- x[x != 0] :
> >> missing value where TRUE/FALSE needed
>
> > Good catch .. notably as it also happens when
> > conf.int=FALSE as by default. My version of wilcox.test()
> > now does give the same as when the to 'Inf' are left away.
>
> >> NaN confidence intervals:
>
> >> wilcox.test(c(1:9,Inf), c(21:28,Inf,30), paired=TRUE,
> >> conf.int=TRUE)
>
> >> Wilcoxon signed rank test with continuity correction
>
> >> data: c(1:9, Inf) and c(21:28, Inf, 30) V = 9.5, p-value
> >> = 0.0586 alternative hypothesis: true location shift is
> >> not equal to 0 0 percent confidence interval: NaN NaN
> >> sample estimates: midrange NaN
>
> > I don't see a big problem here. The NaN's in some sense
> > show the best that can be computed with this data.
> > Statistic and P-value, but no conf.int.
>
>
> >> The easiest "fix" for consistency would be to simply
> >> remove Infinity support from the paired=TRUE case.
>
> > I strongly disagree. We are not pruning good
> > functionality just for some definition of consistency.
>
> >> But going with the more desirable approach of adding
> >> Infinity support for non-paired cases - it is currently
> >> not clear to me what confidence intervals and
> >> pseudomedian should be. Specially when Infinities are on
> >> both sides.
>
> > I deem that not to be a big deal. They are not defined
> > given the default formulas and that is reflected by NA /
> > NaN in those parts of the result.
>
> >> Regards, Karolis Koncevičius.
>
> > But I have also spent a few hours now on
> > wilcox.test.default() behavior notably also looking at the
> > "rounding" / "machine precision" situation, and also on
> > your remark that the 'method: ...' does not indicate well
> > enough what was computed.
>
> > In my (not yet committed) but hereby proposed enhancement
> > of wilcox.test(), I have a new argument, 'digits.rank =
> > Inf' (the default 'Inf' corresponding to the current
> > behavior) with help page documentation:
>
> > digits.rank: a number; if finite, ‘rank(signif(r,
> > digits.rank))’ will be used to compute ranks for the test
> > statistic instead of (the default) ‘rank(r)’.
>
> > and then in 'Details :'
>
> > For stability reasons, it may be advisable to use
> > rounded data or to set ‘digits.rank = 7’, say, such that
> > determination of ties does not depend on very small
> > numeric differences (see the example).
>
> > and then in 'Examples: '
>
> > ## accuracy in ties determination via 'digits.rank':
> > wilcox.test( 4:2, 3:1, paired=TRUE) # Warning: cannot
> > compute exact p-value with ties wilcox.test((4:2)/10,
> > (3:1)/10, paired=TRUE) # no ties => *no* warning
> > wilcox.test((4:2)/10, (3:1)/10, paired=TRUE, digits.rank =
> > 9) # same ties as (4:2, 3:1)
>
> > ----------------------
>
> > Lastly, I propose to replace "test" by "exact test" in the
> > 'method' component (and print out) in case exact
> > computations were used. This information should be part
> > of the returned "htest" object, and not only visible from
> > the arguments and warnings that are printed during the
> > computations. This last change is in some sense the "most
> > back-incompatible" change of these, because many
> > wilcox.test() printouts would slightly change, e.g.,
>
> >> w0 <- wilcox.test( 1:5, 4*(0:4), paired=TRUE)
>
> > Wilcoxon signed rank exact test
>
> > data: 1:5 and 4 * (0:4) V = 1, p-value = 0.125
> > alternative hypothesis: true location shift is not equal
> > to 0
>
> > where before (in R <= 3.6.x) it is just
>
> > Wilcoxon signed rank test
>
> > data: ......... ............... ...............
>
> > but I think R 4.0.0 is a good occasion for such a change.
>
> > Constructive feedback on all this is very welcome! Martin
>
>... none ... I "assume" this means everybody likes the idea ;-)
>
>Anyway, now comitted to R-devel (for R 4.0.0), svn rev 77569
>(in 'NEW FEATURES').
>
>Martin
>
>
>
> >> On 2019-12-07 23:18, Karolis Koncevičius wrote:
> >>> Thank you for a fast response. Nice to see this mailing
> >>> list being so alive.
> >>>
> >>> Regarding Inf issue: I agree with your assessment that
> >>> Inf should not be removed. The code gave me an
> >>> impression that Inf values were intentionally removed
> >>> (since is.finite() was used everywhere, except for
> >>> paired case). I will try to adjust my patch according to
> >>> your feedback.
> >>>
> >>> One more thing: it seems like you assumed that issues
> >>> 2:4 are all related to machine precision, which is not
> >>> the case - only 2nd issue is. Just wanted to draw this
> >>> to your attention, in case you might have some feedback
> >>> and guidelines about issues 3 and 4 as well.
> >>>
> >>>
> >>>
> >>> On 2019-12-07 21:59, Martin Maechler wrote:
> >>>>>>>>> Karolis Koncevičius on Sat, 7 Dec 2019 20:55:36
> >>>>>>>>> +0200 writes:
> >>>>
> >>>> > Hello, > Writing to share some things I've found
> >>>> about wilcox.test() that seem a > a bit inconsistent.
> >>>>
> >>>> > 1. Inf values are not removed if paired=TRUE
> >>>>
> >>>> > # returns different results (Inf is removed): >
> >>>> wilcox.test(c(1,2,3,4), c(0,9,8,7)) >
> >>>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
> >>>>
> >>>> > # returns the same result (Inf is left as value with
> >>>> highest rank): > wilcox.test(c(1,2,3,4), c(0,9,8,7),
> >>>> paired=TRUE) > wilcox.test(c(1,2,3,4), c(0,9,8,Inf),
> >>>> paired=TRUE)
> >>>>
> >>>> Now which of the two cases do you consider correct ?
> >>>>
> >>>> IHMO, the 2nd one is correct: it is exactly one
> >>>> property of the *robustness* of the wilcoxon test and
> >>>> very desirable that any (positive) outlier is treated
> >>>> the same as just "the largest value" and the test
> >>>> statistic (and hence the p-value) should not change.
> >>>>
> >>>> So I think the first case is wrong, notably if
> >>>> modified, (such that the last y is the overall maximal
> >>>> value (slightly larger sample):
> >>>>
> >>>>> wilcox.test(1:7, 1/8+ c(9:4, 12))
> >>>>
> >>>> Wilcoxon rank sum test
> >>>>
> >>>> data: 1:7 and 1/8 + c(9:4, 12) W = 6, p-value = 0.01748
> >>>> alternative hypothesis: true location shift is not
> >>>> equal to 0
> >>>>
> >>>>> wilcox.test(1:7, 1/8+ c(9:4, 10000))
> >>>>
> >>>> Wilcoxon rank sum test
> >>>>
> >>>> data: 1:7 and 1/8 + c(9:4, 10000) W = 6, p-value =
> >>>> 0.01748 alternative hypothesis: true location shift is
> >>>> not equal to 0
> >>>>
> >>>>> wilcox.test(1:7, 1/8+ c(9:4, Inf))
> >>>>
> >>>> Wilcoxon rank sum test
> >>>>
> >>>> data: 1:7 and 1/8 + c(9:4, Inf) W = 6, p-value =
> >>>> 0.03497 alternative hypothesis: true location shift is
> >>>> not equal to 0
> >>>>
> >>>> The Inf case should definitely give the same as the
> >>>> 10'000 case. That's exactly one property of a robust
> >>>> statistic.
> >>>>
> >>>> Thank you, Karolis, this is pretty embarrassing to only
> >>>> be detected now after 25+ years of R in use ...
> >>>>
> >>>> The correct fix starts with replacing the is.finite()
> >>>> by !is.na() and keep the 'Inf' in the rank
> >>>> computations... (but then probably also deal with the
> >>>> case of more than one Inf, notably the Inf - Inf
> >>>> "exception" which is not triggered by your example...)
> >>>>
> >>>>
> >>>> ---
> >>>>
> >>>> Ben addressed the "rounding" / numerical issues
> >>>> unavoidable for the other problems.
> >>>>
> >>>> > 2. tolerance issues with paired=TRUE.
> >>>>
> >>>> > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE) > #
> >>>> ... > # Warning: cannot compute exact p-value with
> >>>> ties
> >>>>
> >>>> > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1),
> >>>> paired=TRUE) > # ... > # no warning
> >>>>
> >>>> > 3. Always 'x observations are missing' when
> >>>> paired=TRUE
> >>>>
> >>>> > wilcox.test(c(1,2), c(NA_integer_,NA_integer_),
> >>>> paired=TRUE) > # ... > # Error: not enough (finite)
> >>>> 'x' observations
> >>>>
> >>>> > 4. No indication if normal approximation was used:
> >>>>
> >>>> > # different numbers, but same "method" name >
> >>>> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE) >
> >>>> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
> >>>>
> >>>>
> >>>> > From all of these I am pretty sure the 1st one is
> >>>> likely unintended, > so attaching a small patch to
> >>>> adjust it. Can also try patching others if > consensus
> >>>> is reached that the behavioiur has to be modified.
> >>>>
> >>>> > Kind regards, > Karolis Koncevičius.
> >>>>
More information about the R-devel
mailing list