[Rd] Confusion about ks.test() handling of ties and exact vs approximate results

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Apr 21 14:56:59 CEST 2023


>>>>> Karolis Koncevičius 
>>>>>     on Fri, 21 Apr 2023 11:32:41 +0300 writes:

    > Hello,

    > Today I was investigating ks.test() with two numerical arguments (x and y) and was left a bit confused about the policy behind handling ties.
    > I might be missing something, so sorry in advance, but here is what confuses me:

    > The documentation states: "The presence of ties always generates a warning, since continuous distributions do not generate them"

Indeed, that has not been correct for quite a while, I think.

The current default is  `exact = NULL`,  which is turned into
TRUE in certain circumstances, notably for all(*) small data
situations.
--
*) The help page gives details.


    > But when I run a test with ties there is no warning:

    > ks.test(1:4, 4:7)

and indeed the printed output explicitly says that the *exact*
test was used.
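
To see the difference between the two calls side by side, here is a
minimal sketch. The helper collect_warnings() is hypothetical (it is
not part of R); it only records warning messages instead of printing
them. Which warnings appear, and the exact wording of the method
string, depend on the R version, so the sketch prints them rather
than assuming a particular result.

```r
# Hypothetical helper (not part of R): evaluate an expression and
# return its value together with any warning messages it raised.
collect_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Default: exact = NULL, so ks.test() decides by itself.
default_run <- collect_warnings(ks.test(1:4, 4:7))
# Explicitly asking for the non-exact p-value.
approx_run  <- collect_warnings(ks.test(1:4, 4:7, exact = FALSE))

# The method string says which variant of the test was used.
print(default_run$value$method)
print(default_run$warnings)

print(approx_run$value$method)
print(approx_run$warnings)

# The statistic D is the same in both calls; only the p-value
# computation differs.
print(default_run$value$statistic)
print(approx_run$value$statistic)
```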

    > However, when I specify that I do not want an exact test, there appears a warning saying that the computation will be approximate:

    > ks.test(1:4, 4:7, exact=FALSE)
    > # Warning: p-value will be approximate in the presence of ties

    > But doesn’t specifying exact=FALSE already make the test approximate?

Yes, but I think the idea is that you'd look twice and see that in
this case it is recommended to also use  simulate.p.value = TRUE.
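
A minimal sketch of that suggestion, assuming a recent R version in
which ks.test() has the simulate.p.value and B arguments (the help
page is the authority here). The seed and the value of B are
arbitrary choices for illustration.

```r
# Monte Carlo p-value for the tied two-sample case.
# Assumption: ks.test() in the installed R supports
# simulate.p.value and B; check ?ks.test for your version.
set.seed(1)  # arbitrary seed, only to make the run reproducible
sim_res <- ks.test(1:4, 4:7, exact = FALSE,
                   simulate.p.value = TRUE, B = 2000)

print(sim_res$method)   # which variant of the test was used
print(sim_res$p.value)  # simulated p-value, varies with the seed
```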


    > I tried inspecting the source code for guidance but also was left a bit puzzled. In ks.test.R under if(is.numeric(y)) clause there is a variable called TIES that is set and changed, but is never used anywhere. Here are examples:

    > line 55    TIES <- FALSE

    > line 61    TIES <- TRUE

    > line 74    if (TIES)
    > line 75        z <- w

    > But later this z variable is not used as a variable in the code. It looks to me that this TIES variable can be deleted without affecting anything else.

That is correct.  It is indeed a remnant from before the
recent improvements and psmirnov().

[TIES is used in the other branch in the same ks.test.default() function]


