# [R] P-values Kolmogorov–Smirnov test

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Thu Sep 5 20:46:42 CEST 2019

```Hello,

I'm sorry, but apparently I missed the point of your problem.
Please do not take my previous answer seriously.

But you can use ks.test, just in a different way than what I wrote
previously.

Corrected code:

#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
m_2 <-m_2[order(m_2\$d_1),]
d_2 <- m_2\$d_1
p_2 <- m_2\$p_1

#weighted ecdf for the reference distribution and the sample
f_d_1 <- ewcdf(d_1, normalise=F)
f_d_2 <- ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))

#kolmogorov-smirnov distance
x <- f_d_1(d_2)
y <- f_d_2(d_2)
ht <- ks.test(x, y)
d_stat[i, 2] <- ht\$statistic
d_stat[i, 3] <- ht\$p.value
}

Hope this helps,

Às 19:29 de 05/09/19, Rui Barradas escreveu:
> Hello,
>
> I don't have the algorithms at hand but the KS statistic calculation is
> more complicated than your max/abs difference.
>
> Anyway, why not use ks.test? it's not that difficult:
>
>
> set.seed(1234)
> #reference distribution
> d_1 <- sort(rpois(1000, 500))
> p_1 <- d_1/sum(d_1)
> m_1 <- data.frame(d_1, p_1)
>
> #data frame to store the values of the simulation
> d_stat <- data.frame(1:1000, NA, NA)
> names(d_stat) <- c("sample_size", "ks_distance", "p_value")
>
> #simulation
> for (i in 1:1000) {
>    #sample from the reference distribution
>    m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>    d_2 <- m_2\$d_1
>
>    ht <- ks.test(d_1, d_2)
>    #kolmogorov-smirnov distance
>    d_stat[i, 2] <- ht\$statistic
>    d_stat[i, 3] <- ht\$p.value
> }
>
> hist(d_stat[, 2])
> hist(d_stat[, 3])
>
>
> Note that d_2 is not sorted, but the results are equal in the sense of
> function identical(), meaning they are *exactly* the same. Why shouldn't
> they?
>
> Hope this helps,
>
>
>
> Às 17:06 de 05/09/19, Boo G. escreveu:
>> Hello,
>>
>> I am trying to perform a Kolmogorov–Smirnov test to assess the
>> difference between a distribution and samples drawn proportionally to
>> size of different sizes. I managed to compute the Kolmogorov–Smirnov
>> distance but I am lost with the p-value. I have looked into the
>> ks.test function unsuccessfully. Can anyone help me with computing
>> p-values for a two-tailed test?
>>
>> Below a simplified version of my code.
>>
>> Thanks in advance.
>> Gianluca
>>
>>
>> library(spatstat)
>>
>> #reference distribution
>> d_1 <- sort(rpois(1000, 500))
>> p_1 <- d_1/sum(d_1)
>> m_1 <- data.frame(d_1, p_1)
>>
>> #data frame to store the values of the siumation
>> d_stat <- data.frame(1:1000, NA, NA)
>> names(d_stat) <- c("sample_size", "ks_distance", "p_value")
>>
>> #simulation
>> for (i in 1:1000) {
>>    #sample from the reference distribution
>>    m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>    m_2 <-m_2[order(m_2\$d_1),]
>>    d_2 <- m_2\$d_1
>>    p_2 <- m_2\$p_1
>>
>>    #weighted ecdf for the reference distribution and the sample
>>    f_d_1 <- ewcdf(d_1, normalise=F)
>>    f_d_2 <- ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>
>>    #kolmogorov-smirnov distance
>>    d_stat[i,2] <- max(abs(f_d_1(d_2) - f_d_2(d_2)))
>> }
>>
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

```