# [R] help comparing two median with R

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Apr 17 17:04:41 CEST 2007

```
Thomas Lumley wrote:
> On Tue, 17 Apr 2007, Robert McFadden wrote:
>
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jim Lemon
>>> Sent: Tuesday, April 17, 2007 12:37 PM
>>> To: Pedro A Reche
>>> Cc: r-help at stat.math.ethz.ch
>>> Subject: Re: [R] help comparing two median with R
>>>
>>> Pedro A Reche wrote:
>>>> Dear R users,
>>>> I am new to R and would like to ask for your help with the following
>>>> topic. I have three sets of numerical data: two sets are paired and the
>>>> third is independent of the other two. For each of these sets I have
>>>> obtained basic statistics (mean, median, standard deviation, range ...).
>>>> Now I want to test whether these sets differ. I could compare the means
>>>> with a basic t-test. However, I am looking for a test to compare
>>>> the medians using R. If that is possible I would love to hear the
>>>> specifics.
>>> Hi Pedro,
>>> You can use the Mann-Whitney test (wilcox.test with two
>>> samples), but you would have to check that the second and
>>> third moments of the variable distributions were the same, I think.
>>>
>>> Jim
>> Use the Mann-Whitney U test, but remember its two assumptions:
>> 1. the samples come from a continuous distribution (there are no tied
>> observations)
>> 2. the distributions are identical in shape. It is very similar to the
>> t-test, but the Mann-Whitney U test is not as affected by violation of
>> the homogeneity of variance assumption as the t-test is.
>>
>
> This turns out not to be quite correct.
>
> If the two distributions differ only by a location shift then the
> hypothesis that the shift is zero is equivalent to the medians being the
> same (or the means, or the 3.14159th percentile), and the Mann-Whitney U
> test will test this hypothesis. Otherwise the Mann-Whitney U test does not
> test for equal medians.
>
> The assumption that the distributions are continuous is for convenience --
> it makes the distribution of the test statistic easier to calculate and
> otherwise R uses an approximation.  The assumption of a location shift is
> critical -- otherwise it is easy to construct three data sets x,y,z so
> that the Mann-Whitney U test thinks x is larger than y, y is larger than z
> and z is larger than x (Google for Efron Dice). That is, the Mann-Whitney
> U test cannot be a test for any location statistic.
>
> There actually is an exact test for the median that does not assume a
> location shift:  dichotomize your data at the pooled median to get a 2x2
> table of above/below median by group, and do Fisher's exact test on the
> table.  This is almost never useful (because it doesn't come with an
> interval estimate), but is interesting because it (and the generalizations
> to other quantiles) is the only exactly distribution-free location test
> that does not have the 'non-transitivity' problem of the Mann-Whitney U
> test.  I believe this median test is attributed to Mood, but I have not
> seen the primary source.
>
>  	-thomas

The Mood test is so inefficient that its use is no longer recommended:

@Article{fri00sho,
  author  = {Freidlin, Boris and Gastwirth, Joseph L.},
  title   = {Should the median test be retired from general use?},
  journal = {The American Statistician},
  year    = 2000,
  volume  = 54,
  number  = 3,
  pages   = {161--164},
  annote  = {inefficiency of Mood median test}
}

The points that Thomas and Brian have made are certainly correct, if one
is truly interested in testing for differences in medians or means.  But
the Wilcoxon test provides a valid test of x > y more generally.  The
test is consonant with the Hodges-Lehmann estimator: the median of all
possible differences between an X and a Y.

Frank

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
Department of Biostatistics   Vanderbilt University

```
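
Three ideas from the thread can be sketched in code: the pairwise probability P(X > Y) that the Mann-Whitney/Wilcoxon statistic is built on (illustrated with Efron's dice), the Hodges-Lehmann estimator Frank describes, and the median test Thomas describes. The sketch below is in Python with only the standard library, not R. The function names `prob_greater`, `hodges_lehmann`, and `median_test` are made up for illustration, and the dice faces are the commonly quoted Efron set (four dice, where Thomas mentions three data sets). Counting values equal to the pooled median as "not above" is an assumption, since the thread does not say how to handle ties. In R itself, `wilcox.test` and `fisher.test` are the usual starting points.

```python
from fractions import Fraction
from itertools import product
from math import comb
from statistics import median


def prob_greater(x, y):
    """Exact P(X > Y) for one value drawn uniformly from x and one from y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j."""
    return median([a - b for a, b in product(x, y)])


def median_test(x, y):
    """Two-sided Fisher exact p-value for the 2x2 table obtained by
    splitting both samples at the pooled median.

    Assumption: values equal to the pooled median count as "not above".
    """
    pooled_median = median(list(x) + list(y))
    above_x = sum(1 for v in x if v > pooled_median)
    above_y = sum(1 for v in y if v > pooled_median)
    n_x, n_y, n_above = len(x), len(y), above_x + above_y

    def table_prob(i):
        # Hypergeometric probability that i of the n_above values come from x.
        return Fraction(comb(n_x, i) * comb(n_y, n_above - i),
                        comb(n_x + n_y, n_above))

    lowest, highest = max(0, n_above - n_y), min(n_x, n_above)
    observed = table_prob(above_x)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(table_prob(i) for i in range(lowest, highest + 1)
               if table_prob(i) <= observed)


# Efron's dice: each die tends to beat the next in the cycle A, B, C, D, A.
A = [4, 4, 4, 4, 0, 0]
B = [3, 3, 3, 3, 3, 3]
C = [6, 6, 2, 2, 2, 2]
D = [5, 5, 5, 1, 1, 1]

for label, first, second in [("A>B", A, B), ("B>C", B, C),
                             ("C>D", C, D), ("D>A", D, A)]:
    print(label, prob_greater(first, second))
```

Each of the four probabilities is 2/3, so a comparison based on P(X > Y) ranks the dice in a cycle. This is the non-transitivity Thomas describes. `median_test` returns an exact `Fraction`; wrap it in `float()` for a decimal p-value.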