[R] Unexpected behavior in friedman.test and ks.test
atconsta-rhelp at yahoo.com
Tue Sep 8 15:33:25 CEST 2009
I should start by saying that I am new to R, so I may be missing something crucial here. It seems to me that the results of friedman.test and ks.test are "wrong". Obviously, the first thing that crossed my mind was "it can't be; this is a package used by so many people that someone would have noticed", but I can't figure out what the explanation might be.
Problem: let's start with friedman.test. I have a lot of data to analyze and I grew sick and tired of clicking and selecting in Excel (for which we have purchased a statistics add-in; please don't flame me for using Excel for stats!), so I wanted to automate the analysis in R, and found that the results differ from Excel's. Example:
Take the data from example(friedman.test) (Hollander & Wolfe (1973), p. 140ff.). I ran the example in R and got:
Friedman rank sum test
data: RoundingTimes
Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805
The same data in Excel, using WinSTAT for Excel (Fitch software), gives: Friedman chi-squared = 10.6364, df = 2, p-value = 0.004902
Puzzled, I entered the data in the calculator from Vassar (http://faculty.vassar.edu/lowry/fried3.html) and got exactly the same values as in Excel (and, again, different from R). Admittedly, the differences are not large, and both p-values fall below the 0.05 threshold, but still.
So, question 1 would be "why is R different from both Excel and Vassar?"
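One thing that may be worth checking here, offered as a sketch rather than a confirmed diagnosis: the example data contain tied values within some rows, and the two statistics quoted above differ by exactly the usual tie-correction factor. The R code below recomputes the Friedman statistic by hand, with and without that correction. The data matrix is retyped from example(friedman.test); the helper names (r, stat_plain, stat_tied) are made up for this illustration, and the idea that WinSTAT and the Vassar calculator omit the correction is an assumption that merely fits the numbers.

```r
# Data retyped from example(friedman.test) (Hollander & Wolfe 1973, p. 140ff.)
RoundingTimes <- matrix(c(
  5.40, 5.50, 5.55,  5.85, 5.70, 5.75,  5.20, 5.60, 5.50,
  5.55, 5.50, 5.40,  5.90, 5.85, 5.70,  5.45, 5.55, 5.60,
  5.40, 5.40, 5.35,  5.45, 5.50, 5.35,  5.25, 5.15, 5.00,
  5.85, 5.80, 5.70,  5.25, 5.20, 5.10,  5.65, 5.55, 5.45,
  5.60, 5.35, 5.45,  5.05, 5.00, 4.95,  5.50, 5.50, 5.40,
  5.45, 5.55, 5.50,  5.55, 5.55, 5.35,  5.45, 5.50, 5.55,
  5.50, 5.45, 5.25,  5.65, 5.60, 5.40,  5.70, 5.65, 5.55,
  6.30, 6.30, 6.25),
  nrow = 22, byrow = TRUE,
  dimnames = list(1:22, c("Round Out", "Narrow Angle", "Wide Angle")))

# Rank the three methods within each player (ties get average ranks)
r <- t(apply(RoundingTimes, 1, rank))
n <- nrow(r)   # number of blocks (players)
k <- ncol(r)   # number of treatments (methods)

# Textbook statistic without any tie correction
stat_plain <- 12 / (n * k * (k + 1)) * sum(colSums(r)^2) - 3 * n * (k + 1)

# Tie correction: sum of (t^3 - t) over every group of tied ranks in every row
tie_sum <- sum(apply(r, 1, function(x) { t <- table(x); sum(t^3 - t) }))
stat_tied <- stat_plain / (1 - tie_sum / (n * k * (k^2 - 1)))

# p-values from the chi-squared approximation with k - 1 degrees of freedom
p_plain <- pchisq(stat_plain, df = k - 1, lower.tail = FALSE)
p_tied  <- pchisq(stat_tied,  df = k - 1, lower.tail = FALSE)

print(round(c(plain = stat_plain, tied = stat_tied), 4))
print(signif(c(plain = p_plain, tied = p_tied), 4))
```

If the hand-computed uncorrected value agrees with the Excel and Vassar figure and the corrected value agrees with friedman.test, then the two programs are probably computing the same test and differ only in how they treat ties.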
Now to the Kolmogorov–Smirnov test, from which my ordeal actually started: the results from ks.test are wildly different from the ones I got with the Excel add-in. Basically, I have 32 sets of observations (patients) for 100 independent variables (different blood analyses). The question was whether the data are normally distributed for each of the analyses and, hence, whether I can apply a parametric test or not.
Once I had loaded the data in a dataframe (and it looks as expected), I ran:
ks.test(myData$f1_A, pnorm)
ks.test(myData$f8_A, pnorm)
They give p-values of < 2.2e-16 (with ties) and 8.882e-16. The Excel add-in gives p-values of 0.0074491 and 0.2730477, respectively.
Here the difference is serious: R reports both f1 and f8 as highly significantly non-normal, whereas the add-in reports one as non-normal and one as normal. I first thought that the difference might arise from different probability distributions (but what else could it be, if not pnorm?). Then I ran the Friedman test, only to find similar discrepancies.
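A possible explanation worth checking, stated as an assumption since the add-in's exact procedure is not known here: ks.test(x, pnorm) with no further arguments compares the sample with the standard normal N(0, 1), whereas a "normality test" in an add-in may compare it with a normal distribution whose mean and standard deviation are estimated from the sample. The sketch below uses simulated stand-in data (the vector x is hypothetical and replaces myData$f1_A, which is not available) and runs both variants. Note that when the parameters are estimated from the same data, the KS p-value is only approximate and tends to be too large, so shapiro.test is included as a commonly used alternative.

```r
set.seed(1)

# Hypothetical stand-in for one blood analysis: 32 patients, clearly not N(0, 1)
x <- rnorm(32, mean = 100, sd = 15)

# Default call: tests H0 "x comes from the STANDARD normal (mean 0, sd 1)"
ks_std <- ks.test(x, "pnorm")

# Extra arguments are passed on to pnorm: tests against a normal
# with the sample's own mean and sd (p-value is then only approximate)
ks_fit <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# Shapiro-Wilk test of normality, which needs no parameters to be specified
sw <- shapiro.test(x)

print(c(ks_standard = ks_std$p.value,
        ks_fitted   = ks_fit$p.value,
        shapiro     = sw$p.value))
```

With real data the same pattern would be ks.test(myData$f1_A, "pnorm", mean = mean(myData$f1_A), sd = sd(myData$f1_A)); whether that reproduces the add-in's p-values depends on what the add-in actually does.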
I'd really appreciate some input on this: what's wrong and how should I decide whom to trust?
Many thanks in advance,
Alex