[R] Wilcoxon Test and Mean Ratios

Thu Sep 20 20:09:53 CEST 2012

On Sep 20, 2012, at 02:43 , Thomas Lumley wrote:

> On Thu, Sep 20, 2012 at 5:46 AM, Mohamed Radhouane Aniba
> <aradwen at gmail.com> wrote:
>> Hello All,
>> 
>> I am writing to ask your opinion on how to interpret this case. I have two vectors "a" and "b" that I am trying to compare.
>> 
>> The Wilcoxon test gives me a p-value of 5.139217e-303 for a versus b with the alternative "greater". If I summarize each of them, I get the following:
>> 
>>> summary(a)
>>     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>> 0.0000000 0.0001411 0.0002381 0.0002671 0.0003623 0.0012910
>>> summary(c)
>>     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>> 0.0000000 0.0000000 0.0000000 0.0004947 0.0002972 1.0000000
>> 
>> The mean ratio is then around 0.5399031, which naively goes in the opposite direction to the Wilcoxon test (I was expecting to find a ratio >> 1).
>> 
> 
> There's nothing conceptually strange about the Wilcoxon test showing a
> difference in the opposite direction to the difference in means.  It's
> probably easiest to think about this in terms of the Mann-Whitney
> version of the same test, which is based on the proportion of pairs of
> one observation from each group where the `a' observation is higher.
> Your 'c' vector has a lot more zeros, so a randomly chosen observation
> from 'c' is likely to be smaller than one from 'a', but the non-zero
> observations seem to be larger, so the mean of 'c' is higher.
> 
> The Wilcoxon test probably isn't very useful in a setting like this,
> since its results really make sense only under 'stochastic ordering',
> where the shift is in the same direction across the whole
> distribution.
> 
>  -thomas
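
Thomas's point can be sketched with made-up numbers. The vectors below are purely illustrative (they are not the original poster's data, and the names a and b are just placeholders): b is mostly zeros with a few comparatively large values, a is small but always positive. Assuming that setup, the mean ratio comes out below 1 while the one-sided test still strongly favours a:

```r
# Made-up data, NOT the original poster's vectors.
a <- rep(c(1, 2, 3), each = 100) * 1e-4   # 300 small but strictly positive values
b <- c(rep(0, 250), rep(0.01, 50))        # 250 zeros plus 50 comparatively large values

mean_ratio <- mean(a) / mean(b)           # below 1: b has the larger mean

# Proportion of (a, b) pairs in which the 'a' observation is the larger one,
# and the reverse; this is what the Mann-Whitney form of the test looks at.
p_a_gt_b <- mean(outer(a, b, ">"))
p_b_gt_a <- mean(outer(b, a, ">"))

res <- wilcox.test(a, b, alternative = "greater")

print(c(mean_ratio = mean_ratio, p_a_gt_b = p_a_gt_b, p_b_gt_a = p_b_gt_a))
print(res$p.value)
```

The zeros in b lose every pairwise comparison against a, so the pairwise proportion is dominated by them, whereas the mean of b is dominated by its few large values.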

I was sure I had seen a definition where X was "larger than" Y if P(X>Y) > P(Y>X), but that's obviously not the normal definition. Anyway, it is worth emphasizing that that is what the Wilcoxon test tests for, not whether the means differ, nor whether the medians do. As a counterexample to the latter, try

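x <- rep(0:1, c(60,40))   # 60 zeros, 40 ones
y <- rep(0:1, c(80,20))   # 80 zeros, 20 ones
wilcox.test(x,y)          # reports a significant difference
median(x)                 # 0
median(y)                 # 0, same as median(x)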

(and the "location shift" reference in wilcox.test output is a bit of a red herring.)
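
To make the P(X>Y) versus P(Y>X) reading concrete, here is a small sketch that counts the pairs directly for the same x and y. The helper names (p_x_gt_y and so on) are mine, introduced only for illustration:

```r
x <- rep(0:1, c(60, 40))
y <- rep(0:1, c(80, 20))

# Compare every x with every y (100 * 100 = 10000 pairs).
p_x_gt_y <- mean(outer(x, y, ">"))    # 0.32: x = 1 paired with y = 0
p_y_gt_x <- mean(outer(y, x, ">"))    # 0.12: y = 1 paired with x = 0
p_tie    <- mean(outer(x, y, "=="))   # the remaining pairs are ties

# The W statistic reported by wilcox.test() is the number of pairs with
# x > y, with each tied pair counted as one half.
w_by_hand <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))
w_from_r  <- unname(wilcox.test(x, y)$statistic)

print(c(p_x_gt_y = p_x_gt_y, p_y_gt_x = p_y_gt_x, p_tie = p_tie))
print(c(w_by_hand = w_by_hand, w_from_r = w_from_r))
```

So the test picks up that x beats y in more pairs than the other way around, even though both medians are 0.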

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com