[R] Difference Between R: wilcox.test and STATA: signrank

Mon Aug 9 16:26:35 CEST 2010

On Aug 9, 2010, at 9:52 AM, peter dalgaard wrote:

>
> On Aug 9, 2010, at 3:03 PM, Alain Guillet wrote:
>
>> Hi,
>>
>> Look at the output of the test made in R and you can see it is a  
>> Wilcoxon rank sum test and not a Wilcoxon signed rank test.
>
> It might be helpful to add that paired=TRUE is needed in the call to  
> get the signed-rank test.
>
>> If there are ties, I prefer wilcox.exact from the exactRankTests
>> package.
>>
>
> (Not that much of an issue in larger sample sizes, I'd say. Even  
> with binary data, the normal approximation works reasonably well  
> under the usual assumptions of expected counts > 5, since the tie- 
> adjustment for the variance is exact for the distribution of the  
> ranks. The continuity correction doesn't quite work though. Anyways,  
> wilcox.exact is of course a nice thing to have.)

The OP's data:

 > table(xvals=dat$x, yvals=dat$y)
       yvals
xvals   0 0.25 0.5  1 1.1 1.5  2  3 3.5  5 5.5  6  8
   0    35    0   0  1   0   1  2  1   0  0   0  0  0
   0.5   2    1   1  0   0   0  0  0   0  0   0  0  0
   0.75  0    0   1  0   0   0  0  0   0  0   0  0  0
   1     7    0   1  3   0   0  1  0   1  0   0  0  0
   1.1   0    0   0  0   1   0  0  0   0  0   0  0  0
   1.5   1    1   0  4   0   2  0  0   0  0   0  0  0
   2     3    0   0  6   0   2  4  2   1  0   0  0  0
   2.1   0    0   1  0   0   0  0  0   0  0   0  0  0
   2.5   0    0   0  0   0   1  0  0   0  2   0  0  0
   3     2    0   0  0   0   0  5  3   1  1   0  0  0
   3.3   1    0   0  0   0   0  0  0   0  0   0  0  0
   3.33  0    0   0  1   0   0  0  0   0  0   0  0  0
   3.5   0    0   0  1   0   1  0  0   0  1   1  0  0
   5     0    0   0  0   0   0  0  0   0  2   0  1  1
   10    0    0   0  0   0   0  0  0   0  1   0  0  0

Adding paired=TRUE to the wilcox.test call gives the signed rank test,
although that is not likely to satisfy the OP since she seems to be
expecting a higher degree of congruence with Stata.

wilcox.test and wilcox.exact give p-values that differ only in the
fourth decimal place.

 > wilcox.test(dat$x, dat$y, paired=TRUE)

	Wilcoxon signed rank test with continuity correction

data:  dat$x and dat$y
V = 1181, p-value = 0.08872
alternative hypothesis: true location shift is not equal to 0

 > library(exactRankTests)
 > wilcox.exact(dat$x, dat$y, paired=TRUE)

	Asymptotic Wilcoxon signed rank test

data:  dat$x and dat$y
V = 1181, p-value = 0.08805
alternative hypothesis: true mu is not equal to 0

The Stata output indicates some sort of adjustment for zeros. The  
wilcox.test basically throws out the zeros (presumably the zero  
differences), so there may be a difference in the algorithm. Her data  
has 51 zero differences and 61 non-zero differences.

 > sum(dat$x==dat$y)
[1] 51
 > sum(dat$x!=dat$y)
[1] 61

Wait a minute; the Stata report said she had 49 zeros and only 108  
records.

Different data. Different results. I suppose it could be my editing  
errors. Taking out all the extraneous html junk and restoring missing  
delimiters was kind of a pain.
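
For what it is worth, the Stata figures quoted below can be reproduced from the reported counts alone, which shows where the zeros enter. The following is a minimal sketch, assuming the usual normal approximation in which zero differences are kept when ranking and then subtracted out of the expectation and variance. The variable names are mine, and the tie adjustment is copied from the Stata output rather than recomputed, since that would need the raw data.

```r
# Counts and rank sum copied from the Stata signrank table quoted below
n       <- 108      # all observations, zero differences included
n_zero  <- 49       # zero differences
t_pos   <- 3101     # sum of ranks for the positive differences
tie_adj <- -282.38  # Stata's reported tie adjustment (taken as given)

# Zeros occupy the lowest ranks 1..n_zero; the remaining rank total is
# split evenly between positive and negative signs under the null.
expected <- (n * (n + 1) / 2 - n_zero * (n_zero + 1) / 2) / 2

# Variance of the signed-rank statistic, then the correction for zeros
var_unadj <- n * (n + 1) * (2 * n + 1) / 24
zero_adj  <- -n_zero * (n_zero + 1) * (2 * n_zero + 1) / 24
var_adj   <- var_unadj + tie_adj + zero_adj

# Normal approximation, no continuity correction
z <- (t_pos - expected) / sqrt(var_adj)
p <- 2 * pnorm(-abs(z))

print(c(expected = expected, var_unadj = var_unadj,
        zero_adj = zero_adj, var_adj = var_adj, z = z, p = p))
```

This gives z of about 2.486 and a two-sided p of about 0.0129, matching the Stata report. wilcox.test instead drops the zero differences before ranking, so the two programs need not agree exactly even on identical data.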

Capasia: Don't use Google Sheets to transmit data. Instead, call dput()
on the datablatt object and post the output.
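
A minimal sketch of that workflow, using a made-up three-row data frame as a stand-in for the real datablatt (the values here are hypothetical, not the OP's data):

```r
# Hypothetical stand-in for the real datablatt object
datablatt <- data.frame(x = c(0, 0.5, 1), y = c(0, 0.25, 1.5))

# dput() prints a plain-text R expression that recreates the object;
# paste that text into the email body.
dput(datablatt)

# A reader can rebuild the object by evaluating the pasted text.
# Here the round trip is simulated by capturing and re-parsing the output.
pasted  <- paste(capture.output(dput(datablatt)), collapse = "")
rebuilt <- eval(parse(text = pasted))
```

Because the output is ordinary R code, it survives plain-text mail without the delimiter problems that come with copying from a spreadsheet page.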

-- 
David.
>
>
>> Alain
>>
>> On 09-Aug-10 12:43, Capasia wrote:
>>> This is my first post to the mailing list and I guess it's a pretty
>>> stupid question, but I can't figure it out. I hope this is the right
>>> forum for this kind of question.
>>>
>>> Before I started using R I was using STATA to run a Wilcoxon  
>>> signed-rank
>>> test on two variables. See data below:
>>>
>>> https://spreadsheets.google.com/pub?key=0ApodAA2GAEP_dDZkdzZHSFBqX1JHOWJBX1dMQUZCVkE&hl=en&output=html 
>>>
>>> STATA Output:
>>> . signrank x=y
>>>
>>> Wilcoxon signed-rank test
>>>
>>>       sign |      obs   sum ranks    expected
>>> -------------+---------------------------------
>>>   positive |       41        3101      2330.5
>>>   negative |       18        1560      2330.5
>>>       zero |       49        1225        1225
>>> -------------+---------------------------------
>>>        all |      108        5886        5886
>>>
>>> unadjusted variance   106438.50
>>> adjustment for ties     -282.38
>>> adjustment for zeros  -10106.25
>>>                    ----------
>>> adjusted variance      96049.88
>>>
>>> Ho: transfer_2_a = transfer_2_b
>>>            z =   2.486
>>>   Prob>  |z| =   *0.0129*
>>>
>>> When running a Wilcoxon signed-rank test in R, I get:
>>>
>>>
>>>> wilcox.test(datablatt$x, datablatt$y)
>>> Wilcoxon rank sum test with continuity correction
>>>
>>> data:  datablatt$x and datablatt$y
>>> W = 7059.5, p-value = *0.09197*
>>> alternative hypothesis: true location shift is not equal to 0
>>>
>>> As you can see, the p-values are different (one rejects H0 and the
>>> other does not). I tested whether it could be that the STATA one
>>> isn't paired, but this doesn't seem to be the problem.
>>>
>>> I'm dumbfounded as to what could lead to such a difference. I couldn't
>>> find any settings I have missed, but somehow I guess I'm using the
>>> function in the wrong way...
>>> Any ideas?
>>> Thanks a lot in advance!
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> -- 
>> Alain Guillet
>> Statistician and Computer Scientist
>>
>> SMCS - IMMAQ - Université catholique de Louvain
>> Bureau c.316
>> Voie du Roman Pays, 20
>> B-1348 Louvain-la-Neuve
>> Belgium
>>
>> tel: +32 10 47 30 50
>>
>
> -- 
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

David Winsemius, MD
West Hartford, CT