[R] Survey - Cluster Sampling

Thu Jun 16 14:56:35 CEST 2005

Dear WizaRds,

	I am struggling to specify a cluster sampling design correctly. I want
to do one-stage cluster sampling under several different parameter settings:

Let M be the total number of clusters in the population, and m the
number of clusters sampled. Let N be the total number of elements in the
population and n the number of elements sampled. y holds the sampled
values. This is my example data:

library(survey)

clus1 <- data.frame(cluster=c(1,1,1,2,2,2,3,3,3), id=rep(1:3,3),
weight=rep(72/9,9), nl=rep(3,9), Nl=rep(3,9), N=rep(72,9), y=c(23,33,77,
25,35,74, 27,37,72) )

1. Let M=m=3 and N=n=9. Then:

dclus1<-svydesign(id=~cluster,  data=clus1)
svymean(~y, dclus1)

     mean    SE
y 44.778 0.294

This is the unweighted mean, assuming equal selection probabilities within
the clusters. OK.
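
As a sanity check, the two figures above can be reproduced by hand in base R, without the survey package. This is only a sketch: it assumes equal cluster sizes and no finite population correction, and the names cluster_means, est_mean and est_se are illustrative, not part of any package.

```r
# Sample values and their cluster membership (three clusters of three).
y <- c(23, 33, 77, 25, 35, 74, 27, 37, 72)
cluster <- rep(1:3, each = 3)

# Per-cluster means; with equal cluster sizes their average is the overall mean.
cluster_means <- tapply(y, cluster, mean)
m <- length(cluster_means)

est_mean <- mean(y)                        # about 44.778
est_se   <- sd(cluster_means) / sqrt(m)    # about 0.294, clusters treated as the sampling units

print(c(mean = est_mean, SE = est_se))
```

The SE here comes only from the variation between the three cluster means, which is why it is so small compared with the spread of the individual y values.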

2. Let M=23, m=3 and N=72, n=9, then I am unable to use svydesign correctly:

dclus2<-svydesign(id=~cluster,  data=clus1, fpc=~N)
svymean(~y, dclus2)

     mean     SE
y 44.778 0.2878

However, it should be (23/72) * (1/3) * (133+134+136) = 42.91, since
I have to include the ratio M/N (total number of clusters over total
population size) in the estimator. How can I pass the total number of
clusters to svydesign?
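
The arithmetic can be made explicit with a short base R sketch. The first part computes the estimator described above from the cluster totals. The second part checks the reported SE under an assumption that is not confirmed here: that fpc=~N makes the value 72 act as the population number of clusters, so that the SE from case 1 is scaled by sqrt(1 - 3/72). All variable names are illustrative.

```r
y <- c(23, 33, 77, 25, 35, 74, 27, 37, 72)
cluster <- rep(1:3, each = 3)
M <- 23; m <- 3   # clusters in the population / in the sample
N <- 72           # elements in the population

# Estimator from the text: (M/N) times the mean of the sampled cluster totals.
cluster_totals <- tapply(y, cluster, sum)        # 133, 134, 136
est_by_hand <- (M / N) * mean(cluster_totals)    # about 42.91

# The same number written with a per-element weight of M/m.
est_by_weights <- sum((M / m) * y) / N

# Assumption: 72 is used as the number of clusters in the correction factor.
se_no_fpc <- sd(tapply(y, cluster, mean)) / sqrt(m)
se_fpc_72 <- se_no_fpc * sqrt(1 - m / N)         # about 0.2878

print(c(est_by_hand, est_by_weights, se_fpc_72))
```

If that assumption holds, it would explain why the SE changes slightly between cases 1 and 2 while the mean does not move at all.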

3. How do I work with weights correctly? I understand that weights are
inverse selection probabilities 1/p, with p=n/N under simple random
sampling. In our case that gives 72/9=8, because I sample 9 units out of a
total population of 72. Again, I could not tell the survey package the
total number of clusters M. So:

dclus3<-svydesign(id=~cluster,  weights=~weight, data=clus1, fpc=~N)
svymean(~y, dclus3)

     mean     SE
y 44.778 0.2878

These are still exactly the same numbers, although I provided the
weights. What am I doing wrong?
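
One part of this can be checked by hand. A weighted mean of the form sum(w*y)/sum(w) does not change when every weight is the same constant, because the constant cancels between numerator and denominator. The base R sketch below assumes that form of estimator. Whether svymean computes exactly this is an assumption here, and the variable names are illustrative.

```r
y <- c(23, 33, 77, 25, 35, 74, 27, 37, 72)
w <- rep(72 / 9, 9)   # the constant weight of 8 from the example data

# Weighted mean: the common factor 8 cancels, leaving the plain mean.
weighted_mean   <- sum(w * y) / sum(w)
unweighted_mean <- mean(y)

# A weighted total, by contrast, does scale with the weights.
weighted_total <- sum(w * y)   # 8 * 403 = 3224

print(c(weighted_mean, unweighted_mean, weighted_total))
```

Under this reading, constant weights can only show up in estimated totals, not in estimated means.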

I am sorry to bother you. Studying Statistics isn't done in a day,
that's for sure. Thank you so much for your understanding and support.

mark