[R] Problem with code for bootstrapping chi square test with count data

Tue Nov 20 22:01:41 CET 2007

On Tue, 20 Nov 2007, Geertje Van der Heijden wrote:

> Hi,
>
> I'd like some advice on bootstrapping in R.
>
> I have a species x with 20 individuals and a factor containing 0 and 1's
> (in this case 5 zeros and 15 ones). I want to compare the frequency of
> the occurrence of 1 with a probability value. This code seems to work to
> do this in R.
>
> attach(test)
> p <- c(0.5272, (1-0.5272))
> sp1_1 <- length(subset(x, x==1))
> sp1_0 <- length(subset(x, x==0))
> obs1_1 <- c(sp1_1, sp1_0)
> chisq.test(obs1_1, p=p)
>
> However, I'd like to bootstrap these 20 individuals to produce a whole
> population of samples and I'd like to do a chi-square test for each of
> the bootstrap sample to create a distribution of the chi-square
> statistic.

And what do you want to do with that distribution?  It is 'a 
distribution', but it is not obviously connected with the test you say you 
want to do.  It seems to me that the real issue here is understanding the 
applicability of the bootstrap.

We can make your example reproducible by

x <- c(rep(0,5), rep(1, 15))
N <- sum(x) # 15
p <- c(0.5272, (1-0.5272))
chisq.test(c(N, 20-N), p=p)

> I have bootstrapped the 0's and 1's of x 20 times using the following
> code:
>
> resamples <- lapply(1:20, function(i) sample(x, replace=T))
>
> What I can't get to work is how to calculate the observed values for 1's
> and 0's in each of the bootstrap samples, which I need to do a
> chi-square test for each sample. The methd I used above doesn't seem to
> work the results for resamples. Does anyone have an idea on how to get
> this to work? Or is there another easier way to do this? I hope it is
> clear what I am trying to do!

You can make this easier by noticing that the number of ones will be 
binomial(20, 15/20).  So e.g.

res <- replicate(1000, {N <- rbinom(1, 20, 15/20);
                         chisq.test(c(N, 20-N), p=p)$statistic})

Now what are you going to do with this?

It is not simulation under the null hypothesis, which would be

res <- replicate(1000, {N <- rbinom(1, 20, 0.5272);
                         chisq.test(c(N, 20-N), p=p)$statistic})
mean(res > 3.983)

and shows good agreement with the theoretical approximating distribution.

Also, why do you want a chisq test to do this?  We can use
binom.test(15, 20, 0.5272), which is exact.  (The only assumption it makes 
iid trials, which the bootstrap methods are also making.)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595