# [R] help comparing two median with R

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Apr 18 18:48:01 CEST 2007

```
Cody_Hamilton at Edwards.com wrote:
> Has anyone proposed using a bootstrap for Pedro's problem?
>
> What about taking a bootstrap sample from x, a bootstrap sample from y, take
> the difference in the medians for these two bootstrap samples, repeat the
> process 1,000 times and calculate the 2.5th and 97.5th percentiles of the 1,000
> computed differences?  You would get a CI on the difference between the
> medians for these two groups, with which you could determine whether the
> difference was greater/less than zero.  Too crude?
>
> Regards,
>    -Cody
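
## [Added sketch, not part of the original thread] A minimal version of the
## bootstrap Cody describes, assuming two numeric vectors x and y. The name
## boot.med.diff and the arguments B and conf are hypothetical, chosen for
## illustration. For a 0.95 interval it takes the 2.5th and 97.5th
## percentiles of the bootstrapped differences in medians.
boot.med.diff <- function(x, y, B = 1000, conf = 0.95) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  ## Resample each group with replacement; record the difference in medians
  d <- replicate(B, median(x[sample.int(length(x), replace = TRUE)]) -
                    median(y[sample.int(length(y), replace = TRUE)]))
  a <- (1 - conf) / 2
  ci <- quantile(d, c(a, 1 - a), names = FALSE)
  c(diff = median(x) - median(y), lower = ci[1], upper = ci[2])
}

## Example with simulated data (made up for illustration):
## set.seed(1)
## boot.med.diff(rexp(50) + 1, rexp(50))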

As hinted at by Brian Ripley, the following code will approximate that.
It gets the nonparametric confidence interval for the median and
solves for the variance that would give the same confidence interval
width if normality of the median held.

g <- function(y) {
  y <- sort(y[!is.na(y)])                  ## Drop NAs; sort to get order statistics
  n <- length(y)
  if(n < 4) return(c(median=median(y), q1=NA, q3=NA, variance=NA))
  qu <- quantile(y, c(.5,.25,.75))         ## Median, lower and upper quartiles
  names(qu) <- NULL
  r <- pmin(qbinom(c(.025,.975), n, .5) + 1, n)  ## Exact 0.95 C.L.
  w <- y[r[2]] - y[r[1]]                         ## Width of C.L.
  var.med <- ((w/1.96)^2)/4      ## Approximate variance of median
  c(median=qu[1], q1=qu[2], q3=qu[3], variance=var.med)
}

Run g separately by group, add the two variances, and take the square
root to approximate the standard error of the difference in medians,
from which you can get a confidence interval.
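
## [Added sketch, not part of the original message] One way the recipe above
## might look in practice. To keep the example self-contained, med.var (a
## hypothetical helper name) repeats only the variance step: the width of the
## nonparametric 0.95 confidence interval for the median, converted to a
## variance as if the median were normally distributed. It assumes each
## group has at least 4 non-missing values.
med.var <- function(y) {
  y <- sort(y[!is.na(y)])
  n <- length(y)
  r <- pmin(qbinom(c(.025, .975), n, .5) + 1, n)  # ranks of the confidence limits
  ((y[r[2]] - y[r[1]]) / 1.96)^2 / 4
}

## Approximate 0.95 confidence interval for median(x) - median(y)
## (med.diff.ci is also a hypothetical name)
med.diff.ci <- function(x, y) {
  d  <- median(x, na.rm = TRUE) - median(y, na.rm = TRUE)
  se <- sqrt(med.var(x) + med.var(y))   # standard error of the difference
  c(diff = d, lower = d - 1.96 * se, upper = d + 1.96 * se)
}

## Example with simulated data (made up for illustration):
## set.seed(2)
## med.diff.ci(rnorm(40, mean = 1), rnorm(40))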

Frank
>
> From: Frank E Harrell Jr <f.harrell at vanderbilt.edu>
> Sent by: r-help-bounces at stat.math.ethz.ch
> Date: 04/18/2007 05:02 AM
> To: Thomas Lumley <tlumley at u.washington.edu>
> cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] help comparing two median with R
>
> Thomas Lumley wrote:
>> On Tue, 17 Apr 2007, Frank E Harrell Jr wrote:
>>
>>> The points that Thomas and Brian have made are certainly correct, if
>>> one is truly interested in testing for differences in medians or
>>> means.  But the Wilcoxon test provides a valid test of x > y more
>>> generally.  The test is consonant with the Hodges-Lehmann estimator:
>>> the median of all possible differences between an X and a Y.
>>>
>> Yes, but there is no ordering of distributions (taken one at a time)
>> that agrees with the Wilcoxon two-sample test, only orderings of pairs
>> of distributions.
>>
>> The Wilcoxon test provides a test of x>y if it is known a priori that
>> the two distributions are stochastically ordered, but not under weaker
>> assumptions.  Otherwise you can get x>y>z>x. This is in contrast to the
>> t-test, which orders distributions (by their mean) whether or not they
>> are stochastically ordered.
>>
>> Now, it is not unreasonable to say that the problems are unlikely to
>> occur very often and aren't worth worrying too much about. It does imply
>> that it cannot possibly be true that there is any summary of a single
>> distribution that the Wilcoxon test tests for (and the same is true for
>> other two-sample rank tests, eg the logrank test).
>>
>> I know Frank knows this, because I gave a talk on it at Vanderbilt, but
>> most people don't know it. (I thought for a long time that the Wilcoxon
>> rank-sum test was a test for the median pairwise mean, which is actually
>> the R-estimator corresponding to the *one*-sample Wilcoxon test).
>>
>>
>>     -thomas
>>
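## [Added sketch, not part of the original thread] A small numerical
## illustration of the x>y>z>x cycle mentioned above, using three made-up
## discrete distributions (the classic "nontransitive dice"). The values and
## the helper name p.greater are assumptions for illustration. Each
## distribution puts probability 1/3 on each of its three values.
x <- c(2, 4, 9)
y <- c(1, 6, 8)
z <- c(3, 5, 7)

## P(U > V) for independent draws U and V from the two distributions
p.greater <- function(u, v) mean(outer(u, v, ">"))

p.greater(x, y)   # 5/9
p.greater(y, z)   # 5/9
p.greater(z, x)   # 5/9

## Each probability exceeds 1/2, so with large enough samples a one-sided
## Wilcoxon test would tend to favor x over y, y over z, and z over x, e.g.
## wilcox.test(sample(x, 2000, TRUE), sample(y, 2000, TRUE),
##             alternative = "greater")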
>
> Thanks for your note Thomas.  I do feel that the problems you have
> rightly listed occur infrequently and that often I only care about two
> groups.  Rank tests generally are good at relatives, not absolutes.  We
> have an efficient test (Wilcoxon) for relative shift but for estimating
> an absolute one-sample quantity (e.g., median) the nonparametric
> estimator is not very efficient.  Ironically there is an exact
> nonparametric confidence interval for the median (unrelated to Wilcoxon)
> but none exists for the mean.
>
> Cheers,
> Frank
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                       Department of Biostatistics   Vanderbilt University
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.