[R] help comparing two median with R

Tue Apr 17 17:04:41 CEST 2007

Thomas Lumley wrote:
> On Tue, 17 Apr 2007, Robert McFadden wrote:
> 
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jim Lemon
>>> Sent: Tuesday, April 17, 2007 12:37 PM
>>> To: Pedro A Reche
>>> Cc: r-help at stat.math.ethz.ch
>>> Subject: Re: [R] help comparing two median with R
>>>
>>> Pedro A Reche wrote:
>>>> Dear R users,
>>>> I am new to R and  I would like to ask your help with the following
>>>> topic. I have three sets of numeral data, 2 sets are paired and a
>>>> third is independent of the other two. For each of these sets I have
>>>> obtained their basic statistics (mean, median, stdv, range ...).
>>>> Now I want to compare if these sets differ. I could compare
>>> the mean
>>>> doing a basic T test . However, I was looking for a test to compare
>>>> the medians using R.   If that is possible I would love to
>>> hear the
>>>> specifics.
>>> Hi Pedro,
>>> You can use the Mann-Whitney test ("wilcox" with two
>>> samples), but you would have to check that the second and
>>> third moments of the variable distributions were the same, I think.
>>>
>>> Jim
>> Use the Mann-Whitney U test, but remember two assumptions:
>> 1. samples come from a continuous distribution (there are no tied
>> observations)
>> 2. distributions are identical in shape. It's very similar to the t-test, but
>> the Mann-Whitney U test is not as affected by violation of the homogeneity of
>> variance assumption as the t-test is.
>>
> 
> This turns out not to be quite correct.
> 
> If the two distributions differ only by a location shift then the 
> hypothesis that the shift is zero is equivalent to the medians being the 
> same (or the means, or the 3.14159th percentile), and the Mann-Whitney U 
> test will test this hypothesis. Otherwise the Mann-Whitney U test does not 
> test for equal medians.
> 
> The assumption that the distributions are continuous is for convenience -- 
> it makes the distribution of the test statistic easier to calculate and 
> otherwise R uses an approximation.  The assumption of a location shift is 
> critical -- otherwise it is easy to construct three data sets x,y,z so 
> that the Mann-Whitney U test thinks x is larger than y, y is larger than z 
> and z is larger than x (Google for Efron Dice). That is, the Mann-Whitney 
> U test cannot be a test for any location statistic.
> 
> There actually is an exact test for the median that does not assume a 
> location shift:  dichotomize your data at the pooled median to get a 2x2 
> table of above/below median by group, and do Fisher's exact test on the 
> table.  This is almost never useful (because it doesn't come with an 
> interval estimate), but is interesting because it (and the generalizations 
> to other quantiles) is the only exactly distribution-free location test 
> that does not have the 'non-transitivity' problem of the Mann-Whitney U 
> test.  I believe this median test is attributed to Mood, but I have not 
> seen the primary source.
> 
>  	-thomas
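
The non-transitivity Thomas mentions can be reproduced in a few lines. This is only a sketch: the three dice below are a commonly used non-transitive set in the spirit of Efron's dice, and the face values, the replication factor n_rep, and the helper names p_greater and one_sided_p are my own choices for illustration, not anything from the messages above.

```r
# Three non-transitive dice (face values are an assumption chosen for
# illustration; Efron's original set uses four dice).
a <- c(2, 2, 4, 4, 9, 9)
b <- c(1, 1, 6, 6, 8, 8)
d <- c(3, 3, 5, 5, 7, 7)

# Proportion of (u, v) pairs with u > v. No value appears on two
# different dice, so there are no cross-sample ties to worry about.
p_greater <- function(u, v) mean(outer(u, v, ">"))

# Repeat each face many times to mimic large samples from each die.
n_rep <- 100
x <- rep(a, n_rep)
y <- rep(b, n_rep)
z <- rep(d, n_rep)

# One-sided Mann-Whitney test of "u tends to be larger than v".
# exact = FALSE requests the normal approximation, since the data have ties.
one_sided_p <- function(u, v) {
  wilcox.test(u, v, alternative = "greater", exact = FALSE)$p.value
}

p_xy <- one_sided_p(x, y)
p_yz <- one_sided_p(y, z)
p_zx <- one_sided_p(z, x)

# Each die beats the next one in the cycle in 5/9 of the pairings,
# and all three one-sided tests point in the "greater" direction.
print(c(a_beats_b = p_greater(a, b),
        b_beats_d = p_greater(b, d),
        d_beats_a = p_greater(d, a)))
print(c(x_gt_y = p_xy, y_gt_z = p_yz, z_gt_x = p_zx))
```

All three p-values should be small here, which is the cycle x > y > z > x that no location statistic could produce.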

The Mood test is so inefficient that its use is no longer recommended:

@Article{fri00sho,
   author =               {Freidlin, Boris and Gastwirth, Joseph L.},
   title =                {Should the median test be retired from general use?},
   journal =              {American Statistician},
   year =                 2000,
   volume =               54,
   number =               3,
   pages =                {161-164},
   annote =               {inefficiency of Mood median test}
}
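
For reference, the median test Thomas describes (dichotomize at the pooled median, then Fisher's exact test on the 2x2 table) might be sketched in R roughly as follows. The function name median_test and the simulated data are hypothetical, and counting values equal to the pooled median as "not above" is my own assumption; other conventions for handling such ties exist.

```r
# Sketch of the median test: dichotomize at the pooled median and apply
# Fisher's exact test to the resulting 2x2 table.
# Assumption: observations equal to the pooled median count as "not above".
median_test <- function(x, y) {
  values <- c(x, y)
  pooled_median <- median(values)
  group <- factor(rep(c("x", "y"), times = c(length(x), length(y))),
                  levels = c("x", "y"))
  # Fix both levels so the table is always 2x2, even if one cell is empty.
  above <- factor(values > pooled_median, levels = c(FALSE, TRUE))
  fisher.test(table(group, above))
}

# Made-up data purely for illustration.
set.seed(1)
x <- rexp(30)
y <- rexp(30) + 0.5

res <- median_test(x, y)
print(res$p.value)
```

As noted above, this yields a p-value but no useful interval estimate for the difference in medians.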

The points that Thomas and Brian have made are certainly correct, if one 
is truly interested in testing for differences in medians or means.  But 
the Wilcoxon test provides a valid test of x > y more generally.  The 
test is consonant with the Hodges-Lehmann estimator: the median of all 
possible differences between an X and a Y.
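
As a small illustration of that estimator, here is a sketch on simulated data (the samples and variable names are invented for the example). With conf.int = TRUE, wilcox.test reports a "difference in location" estimate, which for small samples without ties should agree with the median of all pairwise differences computed directly.

```r
# Made-up continuous samples (small, no ties), so wilcox.test can use
# its exact method.
set.seed(42)
x <- rnorm(12, mean = 1)
y <- rnorm(15, mean = 0)

# Hodges-Lehmann estimate computed directly: the median of all
# pairwise differences x_i - y_j.
hl_direct <- median(outer(x, y, "-"))

# The same quantity as reported by wilcox.test, with a confidence interval.
wt <- wilcox.test(x, y, conf.int = TRUE)
hl_wilcox <- unname(wt$estimate)

print(c(direct = hl_direct, wilcox = hl_wilcox))
print(wt$conf.int)
```

The interval returned in wt$conf.int is for this shift-type quantity, not for the difference of the two sample medians.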

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University