[R] Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

Rui Barradas ruipbarradas at sapo.pt
Fri Oct 5 14:53:38 CEST 2012


Hello,

Try length.out = max(length(loga), length(logb))
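As a minimal sketch of that change (the two short samples below are made-up values of different lengths, standing in for your own data):

```r
# Hypothetical samples of different lengths, for illustration only
loga <- log10(c(0, 70, 50, 100, 650, 1300, 6900) + 1)
logb <- log10(c(0, 10, 30, 440, 1000) + 1)

f.a <- ecdf(loga)
f.b <- ecdf(logb)

# Size the grid by the larger of the two samples
x <- seq(min(loga, logb), max(loga, logb),
         length.out = max(length(loga), length(logb)))

d  <- abs(f.a(x) - f.b(x))  # gap between the two ecdfs on the grid
x0 <- x[which.max(d)]       # first grid point with the largest gap
y0 <- f.a(x0)
y1 <- f.b(x0)
```

x0, y0 and y1 can then be passed to points() and segments() as in the earlier code.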

Note also that all of the previous code and the line above assume that 
we are interested in the max distance, whereas the KS statistic computes 
the supremum of the distance. If it's a two-sample test then their 
values are almost surely the same, but not if it's a one-sample test.
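For the two-sample case, one way to avoid depending on the grid at all is to evaluate both ecdfs at the pooled observations, since both are step functions that only jump at observed values. The sketch below uses simulated data (the rexp() samples and the seed are my assumptions, not the data from the thread) and compares the result with the statistic reported by ks.test():

```r
set.seed(1)  # simulated, hypothetical data of different lengths
loga <- log10(rexp(29, rate = 1/1000) + 1)
logb <- log10(rexp(20, rate = 1/300) + 1)

f.a <- ecdf(loga)
f.b <- ecdf(logb)

# Both ecdfs are right-continuous step functions that jump only at
# observed values, so the largest gap is attained at a pooled observation
z  <- sort(unique(c(loga, logb)))
d  <- abs(f.a(z) - f.b(z))
D  <- max(d)
x0 <- z[which.max(d)]  # location of the largest gap

# Compare with the two-sample KS statistic
ks <- ks.test(loga, logb)
all.equal(D, unname(ks$statistic))
```

An evenly spaced grid can only give a value less than or equal to D, because it may skip the interval where the gap is largest.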

Hope this helps,

Rui Barradas
On 05-10-2012 12:15, user1234 wrote:
> Rui,
>
> Your response nearly answered a similar question of mine except that I also
> have ecdfs of different lengths.
>
> Do you know how I can adjust  x <- seq(min(loga, logb), max(loga, logb),
> length.out=length(loga))
> to account for this?  It must be in the length.out argument but I'm unsure
> how to proceed.
>
> Any advice is much appreciated.
>
> -L
>
>
> Rui Barradas wrote
>> Hello,
>>
>> Try the following.
>> (I've changed the color of the first ecdf.)
>>
>>
>> loga <- log10(a+1) # do this
>> logb <- log10(b+1) # only once
>>
>> f.a <- ecdf(loga)
>> f.b <- ecdf(logb)
>> # (2) max distance D
>>
>> x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
>> x0 <- x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )]
>> y0 <- f.a(x0)
>> y1 <- f.b(x0)
>>
>> plot(f.a, verticals=TRUE, do.points=FALSE, col="blue")
>> plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
>> ## alternative, use standard R plot of ecdf
>> #plot(f.a, col="blue")
>> #lines(f.b, col="green")
>>
>> points(c(x0, x0), c(y0, y1), pch=16, col="red")
>> segments(x0, y0, x0, y1, col="red", lty="dotted")
>> ## alternative, down to x axis
>> #segments(x0, 0, x0, y1, col="red", lty="dotted")
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>> maxbre wrote
>>> Hi all,
>>>
>>> given this example
>>>
>>> #start
>>>
>>> a<-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
>>>     760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
>>> length(a)
>>>
>>> b<-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
>>>       3220,490,20790,290,740,5350,940,3910,0,640,850,260)
>>> length(b)
>>>
>>> out<-ks.test(log10(a+1),log10(b+1))
>>>
>>> # max distance D
>>> out$statistic
>>>
>>> f.a<-ecdf(log10(a+1))
>>> f.b<-ecdf(log10(b+1))
>>>
>>> plot(f.a, verticals=TRUE, do.points=FALSE, col="red")
>>> plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
>>>
>>> # knots (x) and ecdf values (y) of ecdf a
>>> x.a<-get("x", environment(f.a))
>>> y.a<-get("y", environment(f.a))
>>>
>>> # knots (x) and ecdf values (y) of ecdf b
>>> x.b<-get("x", environment(f.b))
>>> y.b<-get("y", environment(f.b))
>>>
>>>
>>> #end
>>>
>>> I want to plot the max distance between the two ecdf curves as in the
>>> above given chart
>>>
>>> Is that possible and how?
>>>
>>>
>>> Thanks for your help
>>>
>>> PS: this is an amended version of a previous thread (which got no
>>> reply) that I’ve deleted from the Nabble repository because I realised it
>>> was not clear enough (now I hope it’s a little better, sorry for that)
>
>
>
>



