[R] Wilcoxon signed rank test and its requirements

Fri Jun 25 20:44:49 CEST 2010

Atte, I would not wonder if you got lost and confused by the certainly
interesting methodological discussion that has been going on in this thread.

Since the helpers do not seem to converge/agree, I propose to you to use a
different nonparametric approach: The bootstrap. The important thing about
the bootstrap is that you do not have to be concerned with the questions
that have been discussed in this thread.

In the bootstrap you draw repeatedly samples with replacement from your data
and compute the statistic you are interested in (for you this is the mean).
The beauty of this approach is i) that the bootstrap distribution is normal
and ii) that you can directly compare the quantiles/confidence intervals of
the bootstrap distribution.

Let's say you have x and y, which both come from Poisson distributions with
relatively low means. Note that this resembles your data in that the
distributions are asymmetric, but contain a considerable number of ties.

#set seed for random number generation
set.seed(123)

#simulate x and y (these would be your data)
x=rpois(100,3)
y=rpois(100,4)

#plot histograms for x and y
par(mfcol=c(1,2))
hist(x,breaks=length(unique(x)))
hist(y,breaks=length(unique(y))) 

Now we sample with replacement from x and y (i.e., we draw one observation
from x and one from y, and afterwards we put the drawn observation back into
x and y, respectively). For each bootstrap of x and y, respectively, we
sample exactly as many observations as there are in x and y, respectively
(here 100). We then compute the statistic of interest of this bootstrap
(here the mean). We repeat this process many times (here 1000).

n=1000 #number of bootstraps to draw
x.boot1=numeric(n)
y.boot1=numeric(n)
for(i in 1:1000){
  x.boot1[i]=mean(sample(x,length(x),replace=T))
  y.boot1[i]=mean(sample(y,length(y),replace=T))
} 

Doing this, we draw the bootstrap distribution of the mean of x and y,
respectively. Note that the bootstrap distribution is normally distributed
and unbiased (the latter automatically because we bootstrap the mean):

par(mfcol=c(1,2))
hist(x.boot1)
hist(y.boot1)

The simple(st) way of comparing these distributions is by checking whether
their confidence intervals overlap or not. You get the 95-percent confidence
intervals by

quantile(x.boot1,p=c(0.025,0.975))
quantile(y.boot1,p=c(0.025,0.975))

If they do not overlap, you would conclude that they are significantly
different. In the one-sample case, you would just compare whether value of
interest is within or outside the confidence interval.

Finally, note that the little loop that we have programmed to draw the
bootstraps are already implemented in an R package. Using the bootstrap
package, you could draw the bootstraps analogously by:

library(bootstrap)
x.boot2=bootstrap(x,nboot=1000,mean)
y.boot2=bootstrap(y,nboot=1000,mean)

The bootstrapped means are then stored in x.boot2$thetastar and
y.boot2$thetastar.

Hope that helps,
Daniel

This process we repeatAnd now we draw many bootstraps, r
-- 
View this message in context: http://r.789695.n4.nabble.com/Wilcoxon-signed-rank-test-and-its-requirements-tp2266165p2268801.html
Sent from the R help mailing list archive at Nabble.com.