[R] CI for the median difference

Sat Feb 18 13:49:18 CET 2012

It seems to me that your two approaches are calculating CIs for 
different quantities.  The bootstrap methods are calculating a CI for 
the difference in medians, while the Wilcoxon approach is calculating a 
CI for the median of the differences.  If this were the mean, those 
would be the same, but not for the median:

A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)
B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381)

 > median(A)-median(B)
[1] 320
 > median(A-B)
[1] 273

-- 
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

On 02/18/2012 05:05 AM, Vittorio Colagrande wrote:
> Dear R-group,
>
>
>
> I have run into a problem in estimating confidence intervals for the median difference.
>
> I want to establish a confidence interval at (1- alpha) level for the difference between the
>
> medians of two indipendent samples (size n and m), by using the Wilcoxon distribution
>
> or with bootstrap methods.
>
> First method, we consider the z matrix of d=nâˆ™m differences of the first and second sample
>
> data and we order these differences in a y vector. By the Wilcoxon distribuition (W) we
>
> determine the q quantile such that Prob(W<q)= alpha/2, inf and sup of confidence interval are
>
> respectively the q-th and (d-q+1)-th elements of the y vector, while we obtain the difference
>
> median by the y distribution. This method is also used to establish the CI by wilcox.test.
>
> Example. Two indipendent sample A and B (n=11, m=13)  of CD4 count cells (T-helper cells):
>
> A =c(619, 600, 490, 1076, 654, 955, 563, 955, 827, 873, 1253)
>
> B =c(346, 507, 598, 228, 576, 338, 1153, 354, 560, 517, 381, 415, 626)
>
>
>
> 1)      CI 95% by matrix z and y vector:
>
> n=length(A)
>
> m=length(B)
>
> d=n*m
>
> z=matrix(0,m,n)
>
> for(j in seq_len(n))
>
> z[, j]=A[j] - B
>
> y=sort(as.vector(z))
>
> q=qwilcox(0.05/2,n,m,lower.tail = TRUE, log.p = FALSE)
>
> inf=y[q]
>
> sup=y[d-q+1]
>
> med=median(y)
>
> results:  inf = 100, sup = 516 and med = 300.
>
>
>
> 2)      CI 95% by wilcox.test:
>
>        I=wilcox.test(A,B,conf.lev=0.95,conf.int=TRUE,exact=F,correct=T)
>
>       inf=I$conf.int[1]
>
>       sup=I$conf.int[2].
>
> results:  inf = 99.9, sup = 516.
>
>
>
> Second method, bootstrap each sample separately, creating the sampling distribution for
>
> each median. Then calculate the difference between the two medians, and create the
>
> sampling distribution of those differences. This is the sampling distribution we care about.
>
> Once we have that distribution we can establish a confidence interval. Some CI 95%,
>
> with reference to the CD4 example, one given below.
>
>
>
> 1)     First procedure (package boot):
>
> library(boot)
>
>        n=length(A)
>
>        m=length(B)
>
>        y=c(A,B)
>
>       camp=data.frame(group=rep(c(1,2),c(n,m)),y)
>
>       dif.median=function(data,i) {
>
>       d=data[i,]
>
>       n1=n+1
>
>       m1=n+m
>
>       median(d$y[1:n])-median(d$y[n1:m1])  }
>
>       dif.boot=boot(camp,dif.median,R=10000, strata=camp$group)
>
>       boot.ci(dif.boot, conf =0.95, type="bca")
>
> results:  inf = 59, sup = 574.
>
>
>
> 2)      Second procedure (package pairwiseCI):
>
> library(pairwiseCI)
>
> MedDiff=Median.diff(A, B, conf.level=0.95, alternative="two.sided",R=10000)
>
> MedDiff$conf.int
>
> MedDiff$estimate
>
> results:  inf = 56, sup = 574, median=320
>
>
>
> 3)      Third procedure (stratified bootstrap):
>
>        dif<- numeric(10000)
>
>        for(i in seq_len(10000))
>
>        dif[i]<- median(sample(A, replace=TRUE)) - median(sample(B, replace=TRUE))
>
>        quantile(dif,prob=c(0.5,(1-0.95)/2,(1-(1-0.95)/2)))
>
> results: inf = 56, sup = 574, median = 313.
>
>
>
> 4)      Fourth procedure (package simpleboot)
>
>   library(simpleboot)
>
>   boot_diff<- two.boot(A, B, median, R = 10000)
>
>   boot.ci(boot_diff,conf=0.95,type="bca")
>
> results: inf = 59, sup = 574.
>
>
>
> The bootstrap procedures do get the same results, but the confidence intervals are
>
> significantly different from those obtained using the method that refers to the Wilcoxon
>
> distribution.
> Problem: does this difference depend on really "different" methods
>
> or on incorrect implementation of the bootstrap technique?
>
>
>
> I will greatly appreciate any clarification you could provide.
>
> Best regards.
>
> Vittorio Colagrande
>
> 	[[alternative HTML version deleted]]
>