[R] Difference Between R: wilcox.test and STATA: signrank
David Winsemius
dwinsemius at comcast.net
Mon Aug 9 16:26:35 CEST 2010
On Aug 9, 2010, at 9:52 AM, peter dalgaard wrote:
>
> On Aug 9, 2010, at 3:03 PM, Alain Guillet wrote:
>
>> Hi,
>>
>> Look at the output of the test made in R and you can see it is a
>> Wilcoxon rank sum test and not a Wilcoxon signed rank test.
>
> It might be helpful to add that paired=TRUE is needed in the call to
> get the signed-rank test.
>
>> If there are ties, I know I prefer wilcox.exact from the
>> exactRankTests.
>>
>
> (Not that much of an issue in larger sample sizes, I'd say. Even
> with binary data, the normal approximation works reasonably well
> under the usual assumptions of expected counts > 5, since the tie-
> adjustment for the variance is exact for the distribution of the
> ranks. The continuity correction doesn't quite work though. Anyways,
> wilcox.exact is of course a nice thing to have.)

The OP's data:

> table(xvals=dat$x, yvals=dat$y)
      yvals
xvals   0 0.25 0.5 1 1.1 1.5 2 3 3.5 5 5.5 6 8
  0    35    0   0 1   0   1 2 1   0 0   0 0 0
  0.5   2    1   1 0   0   0 0 0   0 0   0 0 0
  0.75  0    0   1 0   0   0 0 0   0 0   0 0 0
  1     7    0   1 3   0   0 1 0   1 0   0 0 0
  1.1   0    0   0 0   1   0 0 0   0 0   0 0 0
  1.5   1    1   0 4   0   2 0 0   0 0   0 0 0
  2     3    0   0 6   0   2 4 2   1 0   0 0 0
  2.1   0    0   1 0   0   0 0 0   0 0   0 0 0
  2.5   0    0   0 0   0   1 0 0   0 2   0 0 0
  3     2    0   0 0   0   0 5 3   1 1   0 0 0
  3.3   1    0   0 0   0   0 0 0   0 0   0 0 0
  3.33  0    0   0 1   0   0 0 0   0 0   0 0 0
  3.5   0    0   0 1   0   1 0 0   0 1   1 0 0
  5     0    0   0 0   0   0 0 0   0 2   0 1 1
  10    0    0   0 0   0   0 0 0   0 1   0 0 0

Adding paired=TRUE to the wilcox.test call gives the signed rank test,
although that is not likely to satisfy the OP since she seems to be
expecting a higher degree of congruence with Stata.
The wilcox.test and wilcox.exact give results that only differ at the
4th decimal place.

> wilcox.test(dat$x, dat$y, paired=TRUE)

        Wilcoxon signed rank test with continuity correction

data:  dat$x and dat$y
V = 1181, p-value = 0.08872
alternative hypothesis: true location shift is not equal to 0

> wilcox.exact(dat$x, dat$y, paired=TRUE)

        Asymptotic Wilcoxon signed rank test

data:  dat$x and dat$y
V = 1181, p-value = 0.08805
alternative hypothesis: true mu is not equal to 0

The Stata output indicates some sort of adjustment for zeros.
wilcox.test basically throws out the zeros (that is, the zero
differences) before ranking, so the two algorithms may differ. Her data
has 51 zero differences and 61 non-zero differences.
> sum(dat$x==dat$y)
[1] 51
> sum(dat$x!=dat$y)
[1] 61
Wait a minute; the Stata report said she had 49 zeros and only 108
records.
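
For what it's worth, the variance lines in the Stata report can be
reproduced from those two counts alone. This is only a rough check: it
assumes the usual signed-rank variance n(n+1)(2n+1)/24 and a zero
adjustment of the same form computed on the number of zero differences.
The tie adjustment is copied from the report as-is, since it cannot be
recomputed without the data she actually used.

```r
# Counts and rank sums copied from the Stata report quoted below
n        <- 108       # all observations
n_zero   <- 49        # zero differences
t_pos    <- 3101      # sum of ranks for the positive differences
adj_ties <- -282.38   # copied as-is; needs the raw data to recompute

# Assumed formulas (they reproduce the reported figures)
unadjusted <- n * (n + 1) * (2 * n + 1) / 24
adj_zeros  <- -n_zero * (n_zero + 1) * (2 * n_zero + 1) / 24
adjusted   <- unadjusted + adj_ties + adj_zeros

# Expected rank sum for each sign once the zeros' ranks are set aside
expected <- (n * (n + 1) / 2 - n_zero * (n_zero + 1) / 2) / 2

z <- (t_pos - expected) / sqrt(adjusted)
p <- 2 * pnorm(-abs(z))

round(c(unadjusted = unadjusted, adj_zeros = adj_zeros,
        adjusted = adjusted, expected = expected, z = z, p = p), 4)
```

That gives z of about 2.486 and a two-sided p of about 0.0129, the same
as the report. So Stata appears to keep the zeros in the ranking and
shrink the variance, rather than dropping them.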
Different data. Different results. I suppose it could be my editing
errors. Taking out all the extraneous html junk and restoring missing
delimiters was kind of a pain.
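
To make the R side of the comparison concrete, here is a minimal sketch
of what I understand wilcox.test to do in the paired case with the
normal approximation: drop the zero differences, rank what is left, and
apply a tie correction and a continuity correction. The x and y vectors
are made up for illustration and are not the OP's data.

```r
# Hypothetical paired data: two zero differences, some tied |differences|
x <- c(0, 1, 2, 3, 0.5, 2, 5, 1.5)
y <- c(0, 0, 2, 1, 1.0, 3, 2, 1.0)

d <- x - y
d <- d[d != 0]        # zero differences are dropped before ranking
r <- rank(abs(d))     # midranks for tied absolute differences
V <- sum(r[d > 0])    # the statistic reported as V

n     <- length(d)    # only the non-zero differences count
ties  <- table(r)
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
z0    <- V - n * (n + 1) / 4
z     <- (z0 - sign(z0) * 0.5) / sigma   # continuity correction
p     <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# Compare with the built-in test, forcing the normal approximation
res <- wilcox.test(x, y, paired = TRUE, exact = FALSE, correct = TRUE)
c(V_by_hand = V, V_wilcox = unname(res$statistic),
  p_by_hand = p, p_wilcox = res$p.value)
```

Note that n here is the number of non-zero differences only, which is
where this calculation parts ways with a variance that is computed on
all records and then adjusted for zeros.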

Capasia: don't use Google Sheets to transmit data. Instead, use dput()
on the datablatt object and just post the output.
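
A small sketch of that round trip, using a made-up data frame as a
stand-in for datablatt (the values are hypothetical):

```r
# Hypothetical stand-in for the OP's datablatt object
datablatt <- data.frame(x = c(0, 1, 2.5, 3), y = c(0, 0.5, 3, 3))

dput(datablatt)   # paste this printed text into the message

# A reader rebuilds the object by evaluating the pasted text
pasted   <- paste(capture.output(dput(datablatt)), collapse = "\n")
restored <- eval(parse(text = pasted))
all.equal(restored, datablatt)
```

Readers then work from exactly the same object, with no HTML cleanup
and no guessing at delimiters.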
--
David.
>
>
>> Alain
>>
>> On 09-Aug-10 12:43, Capasia wrote:
>>> This is my first post to the mailing list and I guess it's a pretty
>>> stupid question, but I can't figure it out. I hope this is the right
>>> forum for this kind of question.
>>>
>>> Before I started using R I was using STATA to run a Wilcoxon
>>> signed-rank test on two variables. See data below:
>>>
>>> https://spreadsheets.google.com/pub?key=0ApodAA2GAEP_dDZkdzZHSFBqX1JHOWJBX1dMQUZCVkE&hl=en&output=html
>>>
>>> STATA Output:
>>> . signrank x=y
>>>
>>> Wilcoxon signed-rank test
>>>
>>>         sign |      obs   sum ranks    expected
>>> -------------+---------------------------------
>>>     positive |       41        3101      2330.5
>>>     negative |       18        1560      2330.5
>>>         zero |       49        1225        1225
>>> -------------+---------------------------------
>>>          all |      108        5886        5886
>>>
>>> unadjusted variance   106438.50
>>> adjustment for ties     -282.38
>>> adjustment for zeros  -10106.25
>>>                      ----------
>>> adjusted variance      96049.88
>>>
>>> Ho: transfer_2_a = transfer_2_b
>>> z = 2.486
>>> Prob> |z| = *0.0129*
>>>
>>> When running a Wilcoxon signed-rank test
>>>
>>>
>>> > wilcox.test(datablatt$x, datablatt$y)
>>> Wilcoxon rank sum test with continuity correction
>>>
>>> data: datablatt$x and datablatt$y
>>> W = 7059.5, p-value = *0.09197*
>>> alternative hypothesis: true location shift is not equal to 0
>>>
>>> As you can see, the p-values are different (one rejects H0 and the
>>> other does not). I checked whether the STATA test might not be
>>> paired, but that doesn't seem to be the problem.
>>>
>>> I'm dumbfounded as to what could lead to such a difference. I couldn't
>>> find any settings I have missed, but I guess I'm somehow using the
>>> function in the wrong way...
>>> Any ideas?
>>> Thanks a lot in advance!
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Alain Guillet
>> Statistician and Computer Scientist
>>
>> SMCS - IMMAQ - Université catholique de Louvain
>> Bureau c.316
>> Voie du Roman Pays, 20
>> B-1348 Louvain-la-Neuve
>> Belgium
>>
>> tel: +32 10 47 30 50
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
David Winsemius, MD
West Hartford, CT