[R] Unexpected behavior in friedman.test and ks.test

Tue Sep 8 17:26:52 CEST 2009

Alex,

This is mainly speculation, since I can check neither the Excel add-in nor the Vassar calculator, but I'll give it a try.

For the Friedman test: R's result coincides with the one reported by Hollander & Wolfe, which I'd take as a point in favor of R. In any case, my guess is that ties are handled differently (R uses average ranks), but you'd have to check the documentation of WinSTAT and Vassar. If it is not documented, compute the test statistic "manually" under each way of handling ties and see which value each program reproduces.
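The "manual" check could look roughly like the sketch below. It assumes the data are the Hollander & Wolfe rounding times as shipped in example(friedman.test) (retyped here, so please verify them against your installation), and it assumes the relevant difference is whether a tie correction is applied to the denominator of the statistic. The variable names are my own.

```r
# Hollander & Wolfe (1973) rounding times, as in example(friedman.test).
# Retyped so the snippet is self-contained; check against your own copy.
RoundingTimes <- matrix(c(
  5.40, 5.50, 5.55,   5.85, 5.70, 5.75,   5.20, 5.60, 5.50,
  5.55, 5.50, 5.40,   5.90, 5.85, 5.70,   5.45, 5.55, 5.60,
  5.40, 5.40, 5.35,   5.45, 5.50, 5.35,   5.25, 5.15, 5.00,
  5.85, 5.80, 5.70,   5.25, 5.20, 5.10,   5.65, 5.55, 5.45,
  5.60, 5.35, 5.45,   5.05, 5.00, 4.95,   5.50, 5.50, 5.40,
  5.45, 5.55, 5.50,   5.55, 5.55, 5.35,   5.45, 5.50, 5.55,
  5.50, 5.45, 5.25,   5.65, 5.60, 5.40,   5.70, 5.65, 5.55,
  6.30, 6.30, 6.25), nrow = 22, byrow = TRUE)

n <- nrow(RoundingTimes)   # blocks (players)
k <- ncol(RoundingTimes)   # treatments (rounding methods)

# Average ranks within each block, then rank sums per treatment
r <- t(apply(RoundingTimes, 1, rank))
rank_sums <- colSums(r)
numerator <- 12 * sum((rank_sums - n * (k + 1) / 2)^2)

# Sizes of the groups of tied values within each block
tie_sizes <- unlist(lapply(seq_len(n), function(i) table(RoundingTimes[i, ])))
tie_term <- sum(tie_sizes^3 - tie_sizes) / (k - 1)

uncorrected <- numerator / (n * k * (k + 1))             # no tie correction
corrected   <- numerator / (n * k * (k + 1) - tie_term)  # with tie correction

print(c(uncorrected = uncorrected, corrected = corrected))
print(friedman.test(RoundingTimes)$statistic)
```

With these data the uncorrected value comes out at about 10.6364 and the corrected one at about 11.1429, the latter matching friedman.test. That would be consistent with WinSTAT and Vassar omitting the tie correction, but it does not prove it, so their documentation remains the place to confirm.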

For ks.test: see ?ks.test for the meaning of the "exact" argument of this function. I'd assume that Excel gives you the asymptotic p-value only, while R will by default return an exact one for fewer than 100 observations without ties (you have 32). From the same help page: "Otherwise, asymptotic distributions are used whose approximations may be inaccurate in small samples". You could check using something like ks.test(myData$f1_A, pnorm, exact=FALSE). If that doesn't resolve the issue, do the KS test (semi-)manually, which should not be that difficult (even in Excel, if need be), and compare the D value with the ones obtained from R and Excel, respectively.
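A minimal sketch of the (semi-)manual check, for what it is worth. I don't have myData, so x below is simulated and purely a stand-in for something like myData$f1_A; the seed and variable names are arbitrary.

```r
set.seed(1)
x <- rnorm(32)          # hypothetical stand-in for myData$f1_A (no ties)

n  <- length(x)
xs <- sort(x)
Fx <- pnorm(xs)         # reference CDF: standard normal (mean 0, sd 1)

# Two-sided KS statistic: largest gap between the empirical CDF and the
# reference CDF, checked just after and just before each observation
D_manual <- max(pmax(seq_len(n) / n - Fx, Fx - (seq_len(n) - 1) / n))

res_exact <- ks.test(x, pnorm, exact = TRUE)    # exact p-value
res_asym  <- ks.test(x, pnorm, exact = FALSE)   # asymptotic p-value

print(c(D_manual = D_manual,
        D_exact  = unname(res_exact$statistic),
        D_asym   = unname(res_asym$statistic)))
print(c(p_exact = res_exact$p.value, p_asym = res_asym$p.value))
```

D should be identical in all three cases; only the p-values can differ between the exact and the asymptotic calculation. If the D reported by the add-in is different from R's, the exact argument cannot be the explanation. In that case note that ks.test(x, pnorm) compares against the standard normal unless mean and sd are passed as further arguments, so it may be worth checking which reference distribution the add-in assumes.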

HTH, Michael

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of 
> atconsta-rhelp at yahoo.com
> Sent: Tuesday, 8 September 2009 15:33
> To: R-help at r-project.org
> Subject: [R] Unexpected behavior in friedman.test and ks.test
> 
> I have to start by saying that I am new to R, so I might miss 
> something crucial here. It seems to me that the results of 
> friedman.test and ks.test are "wrong". Now, obviously, the 
> first thing which crossed my mind was "it can't be, this is a 
> package used by so many, someone should have observed", but I 
> can't figure out what it might be.
> 
> Problem: let's start with friedman.test. I have a lot of data 
> to analyze and I grew sick and tired of clicking and 
> selecting in Excel (for which we have a statistics Add-In 
> purchased, don't start to flame me on using Excel for stats, 
> please!); so I wanted to automate the analysis in R and 
> found that the results differ from Excel. Example: take the 
> data from example(friedman.test)  (Hollander & Wolfe (1973), 
> p. 140ff.). I ran the example in R and got:
> 
>         Friedman rank sum test
> data:  RoundingTimes
> Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805
> 
> Same data, in Excel, using the WinSTAT for Excel (Fitch 
> software), gives: Friedman chi-squared = 10.6364, df = 2, 
> p-value =0.004902
> 
> Puzzled, I entered the data in the calculator from Vassar 
> (http://faculty.vassar.edu/lowry/fried3.html ) and got 
> exactly the same values as in Excel (and, again, different 
> from R). Admittedly, the differences are not large, and both 
> fall below the 0.05 threshold, but, still.
> 
> So, question 1 would be "why is R different from both Excel 
> and Vassar?"
> 
> 
> Now to the Kolmogorov-Smirnov test, from which my ordeal 
> actually started: the results from ks.test are wildly 
> different from the ones I have got with the Excel add-in. 
> Basically, I have 32 sets of observations (patients) for 100 
> independent variables (different blood analyses). Question 
> was whether the data is normally distributed for each of the 
> analyses and, hence, whether I can apply a parametric test or not.
> Once I had loaded the data in a dataframe (and it looks as 
> expected), I ran:
> ks.test(myData$f1_A, pnorm)
> ks.test(myData$f8_A, pnorm)
> 
> They give p-values of < 2.2e-16 (with ties) and 8.882e-16. 
> The Excel Add-In gives p-values of 
>  
> 0.0074491 and, respectively, 0.2730477
> 
> Here the difference is serious, like between highly 
> significant non-normal for both f1 and f8 (R), or one 
> non-normal and one normal (the Add-in). I first thought that 
> the difference might arise from different probability 
> distributions (but what else, if not pnorm). Then I ran the 
> Friedman test, only to find similar discrepancies.
> 
> I'd really appreciate some input on this: what's wrong and 
> how should I decide whom to trust?
> 
> Many thanks in advance,
> 
> Alex