[R] help comparing two median with R
Thomas Lumley
tlumley at u.washington.edu
Tue Apr 17 16:48:07 CEST 2007
On Tue, 17 Apr 2007, Robert McFadden wrote:
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jim Lemon
>> Sent: Tuesday, April 17, 2007 12:37 PM
>> To: Pedro A Reche
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] help comparing two median with R
>>
>> Pedro A Reche wrote:
>>> Dear R users,
>>> I am new to R and I would like to ask your help with the following
>>> topic. I have three sets of numerical data: two sets are paired and a
>>> third is independent of the other two. For each of these sets I have
>>> obtained their basic statistics (mean, median, standard deviation,
>>> range ...). Now I want to test whether these sets differ. I could
>>> compare the means with a basic t-test. However, I was looking for a
>>> test to compare the medians using R. If that is possible I would love
>>> to hear the specifics.
>>
>> Hi Pedro,
>> You can use the Mann-Whitney test ("wilcox.test" with two
>> samples), but you would have to check that the second and
>> third moments of the variable distributions were the same, I think.
>>
>> Jim
> Use the Mann-Whitney U test, but remember its two assumptions:
> 1. the samples come from a continuous distribution (there are no tied
> observations)
> 2. the distributions are identical in shape. It's very similar to the
> t-test, but the Mann-Whitney U test is not as affected by violation of
> the homogeneity of variance assumption as the t-test is.
>
This turns out not to be quite correct.
If the two distributions differ only by a location shift then the
hypothesis that the shift is zero is equivalent to the medians being the
same (or the means, or the 3.14159th percentile), and the Mann-Whitney U
test will test this hypothesis. Otherwise the Mann-Whitney U test does not
test for equal medians.
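
As a small illustration of the pure location-shift case, here is a sketch
in which y is constructed as x plus a constant. The data values and the
shift of 2 are made up for the example. Under such a shift the medians and
the means differ by the same amount, and the "difference in location" that
wilcox.test reports with conf.int = TRUE (the median of all pairwise
differences) recovers it.

```r
# Made-up data: y is x shifted by exactly 2, so the two samples differ
# only in location.
x <- c(1.1, 2.3, 3.7, 4.2, 5.9)
y <- x + 2

# Under a pure shift, every location statistic moves by the same amount.
median_diff <- median(y) - median(x)
mean_diff   <- mean(y) - mean(x)

# conf.int = TRUE asks wilcox.test for an estimate of the shift
# (the median of all pairwise differences y[i] - x[j]) and an interval.
fit <- wilcox.test(y, x, conf.int = TRUE)
shift_estimate <- unname(fit$estimate)

print(c(median = median_diff, mean = mean_diff, wilcox = shift_estimate))
```

When the shapes differ, these three quantities no longer need to agree,
which is the situation described above.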
The assumption that the distributions are continuous is for convenience --
it makes the exact distribution of the test statistic easier to calculate;
when there are ties, R uses a normal approximation instead. The assumption
of a location shift is critical -- without it, it is easy to construct
three data sets x, y, z so that the Mann-Whitney U test thinks x is larger
than y, y is larger than z, and z is larger than x (search for "Efron's
dice"). That is, the Mann-Whitney U test cannot be a test for any location
statistic.
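
The non-transitivity can be checked directly. The sketch below takes
three of Efron's dice as the data sets x, y, z (the assignment of dice to
names is my choice for illustration) and computes, for each pair, the
proportion of cross-sample pairs in which the first value exceeds the
second. With no ties between samples, that proportion is the
Mann-Whitney U statistic divided by the number of pairs.

```r
# Three of Efron's dice, each treated as a sample of six observations.
x <- c(4, 4, 4, 4, 0, 0)
y <- c(3, 3, 3, 3, 3, 3)
z <- c(6, 6, 2, 2, 2, 2)

# Proportion of pairs (a[i], b[j]) with a[i] > b[j].
p_greater <- function(a, b) mean(outer(a, b, ">"))

p_xy <- p_greater(x, y)  # x tends to beat y
p_yz <- p_greater(y, z)  # y tends to beat z
p_zx <- p_greater(z, x)  # z tends to beat x

# The same counts, as reported by wilcox.test (exact = FALSE because the
# within-sample ties rule out the exact p-value).
w_xy <- unname(wilcox.test(x, y, exact = FALSE)$statistic)

print(c(x_over_y = p_xy, y_over_z = p_yz, z_over_x = p_zx))
```

All three proportions are above one half, so the pairwise comparisons
form a cycle and no single location statistic can be ordering the samples.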
There actually is an exact test for the median that does not assume a
location shift: dichotomize your data at the pooled median to get a 2x2
table of above/below median by group, and do Fisher's exact test on the
table. This is almost never useful (because it doesn't come with an
interval estimate), but is interesting because it (and the generalizations
to other quantiles) is the only exactly distribution-free location test
that does not have the 'non-transitivity' problem of the Mann-Whitney U
test. I believe this median test is attributed to Mood, but I have not
seen the primary source.
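
A minimal sketch of the median test described above, using only base R.
The function name median_test and the example data are made up for
illustration, and counting values equal to the pooled median as "not
above" is one possible convention, not something the description above
fixes.

```r
# Hypothetical helper: dichotomize at the pooled median, then apply
# Fisher's exact test to the 2x2 table of group by above/not above.
median_test <- function(a, b) {
  values <- c(a, b)
  group  <- factor(rep(c("a", "b"), times = c(length(a), length(b))))
  # Assumption: values equal to the pooled median count as "not above".
  above  <- factor(values > median(values), levels = c(FALSE, TRUE))
  tab    <- table(group, above)
  list(table = tab, test = fisher.test(tab))
}

# Made-up example data.
a <- c(1, 2, 3, 4, 5, 6, 7, 12)
b <- c(8, 9, 10, 11, 13, 14, 15, 16)

res <- median_test(a, b)
print(res$table)
print(res$test$p.value)
```

The p-value is for the hypothesis that the two groups have the same
probability of falling above the pooled median.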
-thomas