[R] Problems with ks.test()
Greg Snow
Greg.Snow at imail.org
Sat Jul 30 20:19:37 CEST 2011
What makes you think that the p-value of 1 is more accurate than the p-value of 0? The K-S test will show significance for very small differences in distributions when the sample size is big enough.
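The sample-size point can be illustrated with a small simulation. This is only a sketch under assumed values: the shift of 0.02, the two sample sizes, and the seed are arbitrary choices for illustration and are not taken from the original data.

```r
# Sketch: a tiny departure from the null distribution is usually not
# detected at n = 100, but is overwhelmingly "significant" at n = 1e6.
set.seed(1)        # arbitrary seed, for reproducibility only
shift <- 0.02      # assumed small departure from N(0, 1)

small <- rnorm(100, mean = shift)
large <- rnorm(1e6, mean = shift)

# One-sample K-S tests against the standard normal CDF
p_small <- ks.test(small, "pnorm")$p.value
p_large <- ks.test(large, "pnorm")$p.value

print(c(n_100 = p_small, n_1e6 = p_large))
```

A very small p-value here says only that the data are detectably different from the reference distribution, not that the difference matters in practice.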
Also, it is not clear that you are using it correctly. Generally you would just give the raw data and the CDF to the function; there is no need to bin the data or work with midpoints.
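A rough sketch of the difference, using the vectors posted below. The first part passes the two sets of bin heights to ks.test(), which appears to be what the original call did: the test then compares the distribution of the heights, so D depends mainly on how many near-zero expected heights get rounded to exactly 0. The second part shows the intended usage with raw observations and a CDF. Note the assumptions: pgumbel() and pad() are helpers defined here (they are not base R functions), the raw data are simulated because the originals were not posted, and mu and beta are rounded from the posted estimates.

```r
# Bin heights as posted (100 bins; all remaining bins are zero).
pad <- function(v, n = 100) c(v, rep(0, n - length(v)))   # hypothetical helper
y1 <- pad(c(0, 0, 0, 0, 0, 24, 74, 98, 133, 147, 134, 120, 89, 69, 46, 31, 16,
            7, 7, 3, 2))
y2 <- pad(c(0.000, 0.000, 0.000, 0.006, 0.622, 10.094, 49.271, 112.776, 160.174,
            168.419, 146.527, 113.137, 81.026, 55.344, 36.690, 23.870, 15.347,
            9.793, 6.220, 3.939, 2.490, 1.572, 0.992, 0.625, 0.394, 0.248,
            0.157, 0.099, 0.062, 0.039, 0.025, 0.016, 0.010, 0.006, 0.004,
            0.002, 0.002, 0.001, 0.001))

# Part 1: two-sample test on the bin heights (not on the data).
# Ties trigger a warning in some R versions, hence suppressWarnings().
d_round0 <- unname(suppressWarnings(ks.test(y1, round(y2, 0)))$statistic)
d_round3 <- unname(suppressWarnings(ks.test(y1, y2))$statistic)
print(c(rounded = d_round0, unrounded = d_round3))  # post reports 0.04 and 0.2

# Part 2: one-sample test of raw observations against the Gumbel CDF.
pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))  # defined here

set.seed(42)                              # arbitrary seed
mu   <- 0.0920                            # rounded from the posted "Mue"
beta <- 0.0216                            # rounded from the posted "Beta"
x <- mu - beta * log(-log(runif(1000)))   # simulated stand-in for the raw data

res <- ks.test(x, pgumbel, mu = mu, beta = beta)
print(res)
```

One caveat: when mu and beta are estimated from the same data that are being tested, the p-value from ks.test() is only approximate, since the test assumes the parameters were specified in advance.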
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jochen1980
Sent: Friday, July 29, 2011 3:08 AM
To: r-help at r-project.org
Subject: [R] Problems with ks.test()
Hi,
I have two vectors of data points and want to run ks.test() on them. If you
plot both vectors you will see that they match quite well. Here is a picture:
http://www.jochen-bauer.net/downloads/kstest-r-help-list-plot.png
As you can see, there is one histogram, and the Gumbel density function is
plotted on top of it. I took the bin midpoints and the bin heights for
vector1, and computed the values of the theoretical distribution at all bin
midpoints as vector2. I pass these two vectors to ks.test(). Are those the
right vectors if I want to decide afterwards whether my experimental data is
Gumbel-distributed?
Surprisingly, the p-value changes tremendously if I compute more digits from
my theoretical formula. If I round to 0 digits, p is 1; if I round to 4
digits, p drops to nearly 0 (0.0366). How can this happen? I thought more
digits would give more accurate results.
XXXX Case 0 digits: XXXXXXXXXXXXXXXXXXXXXXXXXXX
 [1]   0   0   0   0   0  24  74  98 133 147 134 120  89  69  46  31  16   7
[19]   7   3   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[37]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[55]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[73]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[91]   0   0   0   0   0   0   0   0   0   0
 [1]   0   0   0   0   1  10  49 113 160 168 147 113  81  55  37  24  15  10
[19]   6   4   2   2   1   1   0   0   0   0   0   0   0   0   0   0   0   0
[37]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[55]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[73]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[91]   0   0   0   0   0   0   0   0   0   0
[1] "Results"
[1] "Analysis of the input data"
[1] "Mean: 0.104537195"
[1] "Std. dev.: 0.0277657985898433"
[1] "Parameter estimation for the data, assuming a Gumbel distribution"
[1] "Mue: 0.0920411082987717"
[1] "Beta: 0.0216489043196013"
[1] "KS test -> 1000 values, 100 bins, x: bin midpoints, y1, y2 = histogram heights"
[1] "KST D: 0.04"
[1] "KST P: 1"
XXX Case 4 digits: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 [1]   0   0   0   0   0  24  74  98 133 147 134 120  89  69  46  31  16   7
[19]   7   3   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[37]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[55]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[73]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[91]   0   0   0   0   0   0   0   0   0   0
  [1]   0.000   0.000   0.000   0.006   0.622  10.094  49.271 112.776 160.174
 [10] 168.419 146.527 113.137  81.026  55.344  36.690  23.870  15.347   9.793
 [19]   6.220   3.939   2.490   1.572   0.992   0.625   0.394   0.248   0.157
 [28]   0.099   0.062   0.039   0.025   0.016   0.010   0.006   0.004   0.002
 [37]   0.002   0.001   0.001   0.000   0.000   0.000   0.000   0.000   0.000
 [46]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [55]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [64]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [73]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [82]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [91]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
[100]   0.000
[1] "Results"
[1] "Analysis of the input data"
[1] "Mean: 0.104537195"
[1] "Std. dev.: 0.0277657985898433"
[1] "Parameter estimation for the data, assuming a Gumbel distribution"
[1] "Mue: 0.0920411082987717"
[1] "Beta: 0.0216489043196013"
[1] "KS test -> 1000 values, 100 bins, x: bin midpoints, y1, y2 = histogram heights"
[1] "KST D: 0.2"
[1] "KST P: 0.0366"
Thanks in advance for some help.
Jochen