[R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test

Wed May 4 14:25:35 CEST 2011

On May 4, 2011, at 11:03 , JP wrote:

> On 3 May 2011 20:50, peter dalgaard <pdalgd at gmail.com> wrote:
>> 
>> On Apr 28, 2011, at 15:18 , JP wrote:
>> 
>>> 
>>> 
>>> I have found that when doing a Wilcoxon signed-rank test you should report:
>>> 
>>> - The median value (and not the mean or sd, presumably because of the
>>> underlying, potentially non-normal distribution)
>>> - The Z score (or value)
>>> - r
>>> - p value
>>> 
>> 
>> ...printed on 40g/m^2 acid free paper with a pencil of 3B softness?
>> 
>> Seriously, with nonparametrics, the p value is the only thing of real interest, the other stuff is just attempting to check on authors doing their calculations properly. The median difference is of some interest, but it is not actually what is being tested, and in heavily tied data, it could even be zero with a highly significant p-value. The Z score can in principle be extracted from the p value (qnorm(p/2), basically) but it's obviously unstable in the extreme cases. What is r? The correlation? Pearson, not Spearman?
>> 
> 
> Thanks for this Peter - a couple of more questions:
> 
> a <- rnorm(500)
> b <- runif(500, min=0, max=1)
> x <- wilcox.test(a, b, alternative="two.sided", exact=T, paired=T)
> x$statistic
> 
>    V
> 31835
> 
> What is V? (Is that the value Z of the test statistic?)

No. It's the sum of the positive ranks:

        r <- rank(abs(x))
        STATISTIC <- sum(r[x > 0])
        names(STATISTIC) <- "V"

(where x is actually x-y in the paired case)
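
A minimal sketch of that definition, checked against what wilcox.test() itself reports. The seed and the names d, V_manual and V_test are illustrative assumptions, not part of the original example, so the value of V will differ from the 31835 quoted above.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
a <- rnorm(500)
b <- runif(500, min = 0, max = 1)

d <- a - b                         # paired differences (the "x" in the snippet above)
d <- d[d != 0]                     # zero differences are dropped (none expected here)
r <- rank(abs(d))                  # ranks of the absolute differences
V_manual <- sum(r[d > 0])          # sum of the ranks of the positive differences

# The statistic reported by wilcox.test() for the same paired data
V_test <- unname(wilcox.test(a, b, paired = TRUE)$statistic)

print(c(manual = V_manual, wilcox.test = V_test))
```

The two numbers should agree, since V does not depend on whether the p value is computed exactly or from the normal approximation.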

Subtract the expected value of V (sum(1:500)/2 == 62625 in your case) and divide by the standard deviation (sqrt(500*501*1001/24) == 3232.327), and you get Z = -9.53. The slight discrepancy from the -9.81 that you get from qnorm(p/2) is likely due to your use of exact=T (so your p value is not actually computed from Z).
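
The same arithmetic as a short R sketch, using the n and V from your output. The variable names are illustrative, and the formulas are the usual no-ties null mean and standard deviation of V; no continuity correction is applied.

```r
n <- 500                                   # number of pairs (no zero differences assumed)
V <- 31835                                 # statistic reported by wilcox.test() above

mu_V <- n * (n + 1) / 4                    # null mean of V, same as sum(1:n)/2
sd_V <- sqrt(n * (n + 1) * (2 * n + 1) / 24)  # null standard deviation, no ties assumed

z <- (V - mu_V) / sd_V                     # standardized statistic
print(c(mean = mu_V, sd = sd_V, z = z))
```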

> 
> z.score <- qnorm(x$p.value/2)
> z.score
> [1] -9.805352
> 
> But what does this zscore show in practice?

That your test statistic is approx. 10 standard deviations away from its mean, if the null hypothesis were to be true.
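
Note that a z score obtained this way is only the two-sided p value mapped back through the normal quantile function. A small sketch of that round trip, and of why it is unstable in extreme cases: once the p value underflows to zero in double precision, no finite z can be recovered. The value -40 is an arbitrary illustration, not taken from your data.

```r
z_reported <- -9.805352            # the z.score quoted above

p <- 2 * pnorm(z_reported)         # two-sided p value implied by that z
z_back <- qnorm(p / 2)             # mapping the p value back recovers z
print(c(p = p, z_back = z_back))

# An extreme case: the tail probability underflows to exactly 0 ...
p_tiny <- 2 * pnorm(-40)
# ... so the back-transformation can only return -Inf
z_lost <- qnorm(p_tiny / 2)
print(c(p_tiny = p_tiny, z_lost = z_lost))
```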

> 
> The d.f. are suggested to be reported here:
> http://staff.bath.ac.uk/pssiw/stats2/page2/page3/page3.html
> 

Some software replaces the asymptotic normal distribution of the rank sums with the t-distribution with the same df as would be used in an ordinary t test. However, since there is no such thing as an independent variance estimate in the Wilcoxon test, it is hard to see how that should be an improvement. I put it down to "coding by non-statistician".

> And r is mentioned here
> http://huberb.people.cofc.edu/Guide/Reporting_Statistics%20in%20Psychology.pdfs
> 
> 

Aha, so it's supposed to be the effect size. On the referenced site they suggest using r = Z/sqrt(N). (They even do so for the independent samples version, which looks wrong to me.)
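
For what it is worth, the suggested formula is a one-liner. In this sketch N is assumed to be the number of pairs (500); the guide may intend a different N, such as the total number of observations, so treat the result as illustrative only.

```r
z <- -9.805352                     # z score obtained from the p value above
N <- 500                           # assumption: N is the number of pairs

r_effect <- z / sqrt(N)            # effect size as r = Z/sqrt(N)
print(round(r_effect, 3))
```

The sign only reflects the direction of the difference; abs(r_effect) is presumably what would be reported as the magnitude.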

> 
>>> My questions are:
>>> 
>>> - Are the above enough/correct values to report (some places even
>>> quote W and df) ?
>> 
>> df is silly, and/or blatantly wrong...
>> 
>>>  What else would you suggest?
>>> - How do I calculate the Z score and r for the above example?
>>> - How do I get each statistic from the pairwise.wilcox.test call?
>>> 
>>> Many Thanks
>>> JP
>>> 
>> 
>> --
>> Peter Dalgaard
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>> 
>> 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com