[R] effect sizes for Wilcoxon tests

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Nov 16 13:12:35 CET 2005


torsten at hothorn.de writes:

> On Wed, 16 Nov 2005, Peter Dalgaard wrote:
> 
> > Torsten Hothorn <Torsten.Hothorn at rzmail.uni-erlangen.de> writes:
> > [snip]
> >
> > > > > However, how do I get Z from a Wilcoxon test in R?
> > > >
> > > > wtest <- wilcox.test(y~group,data=d, alternative="greater")
> > > > qnorm(wtest$p.value)
> > > >
> > >
> > > or
> > >
> > > library("coin")
> > > statistic(wilcox_test(y ~ group, data = d, ...), type = "standardized")
> > >
> > > where the variance `estimator' takes care of tied observations.
> >
> > Doesn't it do that in the same way as inside wilcox.test(...,exact=FALSE)?
> >
> 
> My understanding was that `wilcox.test' implements the unconditional version
> (with unconditional variance estimator and some `adjustment' for ties) and
> `wilcox_test' implements the conditional version of the test (of course both
> coincide when there are no ties).
> 
> However, some quick experiments suggest that the standardized statistic is
> the same for both versions (with correct = FALSE) for tied observations.
> One needs to check if the expectation and variance formulae in
> `wilcox.test' are equivalent with the conditional versions used in
> `wilcox_test' (in contrast to my initial opinion).


I think you'll find that they are the same. There isn't really an
unconditional variance formula in the presence of ties - I don't think
you can do that without knowing what the point masses are in the
underlying distribution. The question is only whether the
tie-corrected statistic is an asymptotic approximation or an exact
formula for the variance. I believe it is the latter.
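
As a quick numeric check of that claim, one can enumerate the conditional
(permutation) distribution of the rank sum over all equally likely group
assignments and compare its exact variance to the tie-corrected formula used
in the normal approximation. A small sketch (in Python for illustration,
with exact rational arithmetic; the function names are made up):

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

def midranks(x):
    # assign each observation the average of the 1-based positions
    # its tie group occupies in the sorted sample
    s = sorted(x)
    avg = {}
    for v, d in Counter(s).items():
        first = s.index(v) + 1          # first position of the tie group
        avg[v] = Fraction(2 * first + d - 1, 2)
    return [avg[v] for v in x]

def exact_var_ranksum(x, m):
    # variance of the rank sum of a group of size m over all C(N, m)
    # equally likely group assignments (the conditional distribution)
    r = midranks(x)
    N = len(x)
    sums = [sum(r[i] for i in c) for c in combinations(range(N), m)]
    mu = sum(sums) / len(sums)
    return sum((s - mu) ** 2 for s in sums) / len(sums)

def tie_corrected_var(x, m):
    # the tie-corrected formula behind the normal approximation:
    # m*n*(N+1)/12 - m*n*sum(d^3 - d) / (12*N*(N-1))
    N, n = len(x), len(x) - m
    corr = sum(d ** 3 - d for d in Counter(x).values())
    return Fraction(m * n * (N + 1), 12) - Fraction(m * n * corr, 12 * N * (N - 1))
```

For tied samples the two agree exactly (not just asymptotically), e.g.
`exact_var_ranksum([1, 1, 2], 1) == tie_corrected_var([1, 1, 2], 1)`.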

What you need to calculate is the expectation and variance of the
(possibly tied) rank of a particular observation, given the sets of
tied observations. In principle you also need the covariance between
two of them, but that is easily seen to equal -1/(N-1) times the
variance: the covariances are all equal, and the rows/columns of the
covariance matrix sum to zero.
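
This is the usual finite-population fact, and it is easy to confirm
numerically for a fixed midrank vector. A sketch (Python, exact arithmetic;
illustrative names):

```python
from fractions import Fraction

def pop_stats(r):
    # mean, variance and distinct-pair covariance for a fixed
    # midrank vector r (each observation equally likely)
    N = len(r)
    r = [Fraction(v) for v in r]
    mu = sum(r) / N
    var = sum((v - mu) ** 2 for v in r) / N
    cross = sum(r[i] * r[j] for i in range(N) for j in range(N) if i != j)
    cov = cross / (N * (N - 1)) - mu ** 2
    return mu, var, cov
```

For any midrank vector, `cov == -var / (N - 1)`, tied or not.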

The expectation is a no-brainer: tie-breaking preserves the sum of the
ranks, so the average rank is left unchanged by ties.

The fun bit is trying to come up with an elegant argument why the
correction term for the variances, involving sum(NTIES.CI^3 -
NTIES.CI) is exact. I think you can do it by saying that breaking a
set of tied ranks randomly corresponds to adding a term which has a
variance related to that of a random number between 1 and d, which
happens with probability d/N. Notice that sum((1:d)^2) - sum(1:d)
is (d^3-d)/3.
After breaking the ties at random, you should end up with the untied
situation, so you get the tied variance by subtracting the variance of
the tie-breaking terms.
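
The exactness can also be verified directly: the population variance of the
midranks is the untied value (N^2-1)/12 minus the correction
sum(d^3-d)/(12*N), with d running over the tie-group sizes. A sketch
(Python, exact arithmetic; illustrative names):

```python
from collections import Counter
from fractions import Fraction

def midrank_variance(x):
    # population variance of the midranks of x
    N = len(x)
    s = sorted(x)
    avg = {}
    # midrank of a value = average of the 1-based positions it occupies
    for v, d in Counter(s).items():
        first = s.index(v) + 1
        avg[v] = Fraction(2 * first + d - 1, 2)
    r = [avg[v] for v in x]
    mu = sum(r) / N          # always (N+1)/2: ties preserve the rank sum
    return sum((ri - mu) ** 2 for ri in r) / N

def corrected_variance(x):
    # untied variance (N^2-1)/12 minus the tie correction sum(d^3-d)/(12N)
    N = len(x)
    corr = sum(d ** 3 - d for d in Counter(x).values())
    return Fraction(N * N - 1, 12) - Fraction(corr, 12 * N)
```

The two agree exactly for any pattern of ties, and the building block
`sum((1:d)^2) - sum(1:d) == (d^3-d)/3` holds for every tie-group size d.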

Tying up the loose ends is left as an exercise....


> Best,
> 
> Torsten
> 
> > Just wondering.
> >
> >         -p
> >
> > --
> >    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
> >
> 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

More information about the R-help mailing list