[R] Survey - Cluster Sampling
Mark Hempelmann
neo27 at t-online.de
Thu Jun 16 14:56:35 CEST 2005
Dear WizaRds,
I am struggling to set up a cluster sampling design correctly. I want
to do one-stage cluster sampling under different parameter settings:
Let M be the total number of clusters in the population, and m the
number sampled. Let N be the total number of elements in the population
and n the number sampled. y holds the sampled values. This is my example data:
clus1 <- data.frame(cluster=c(1,1,1,2,2,2,3,3,3), id=rep(1:3,3),
weight=rep(72/9,9), nl=rep(3,9), Nl=rep(3,9), N=rep(72,9), y=c(23,33,77,
25,35,74, 27,37,72) )
1. Let M=m=3 and N=n=9. Then:
dclus1<-svydesign(id=~cluster, data=clus1)
svymean(~y, dclus1)
mean SE
y 44.778 0.294, the unweighted mean, assuming equal probability in the
clusters. ok.
2. Let M=23, m=3 and N=72, n=9, then I am unable to use svydesign correctly:
dclus2<-svydesign(id=~cluster, data=clus1, fpc=~N)
svymean(~y, dclus2)
mean SE
y 44.778 0.2878, but it should be 23/72 * 1/3 * (133+134+136) = 42.91,
since I have to include the ratio M/N of the total number of clusters to
the total population size in the estimator. How can I include the
information on the total number of clusters?
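
The hand calculation in point 2 can be sketched in base R as follows. This only checks the arithmetic, assuming the estimator stated above (M/N times the mean of the sampled cluster totals). The variable names are illustrative and the survey package is not used.

```r
# Example data from the post: 3 sampled clusters with 3 elements each
y       <- c(23, 33, 77, 25, 35, 74, 27, 37, 72)
cluster <- rep(1:3, each = 3)

M <- 23   # clusters in the population (value assumed in point 2)
m <- 3    # clusters sampled
N <- 72   # elements in the population

# Sum y within each sampled cluster
cluster_totals <- tapply(y, cluster, sum)

# Expand the sampled totals to the population, then divide by N
est_total <- M / m * sum(cluster_totals)
est_mean  <- est_total / N

print(round(est_mean, 2))
```

The cluster totals come out as 133, 134 and 136, and the estimate rounds to 42.91, matching the figure quoted above.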
3. How do I work with weights correctly? I understand that weights imply
inverse probability weighting 1/p, with p=n/N in simple random sampling,
in our case 72/9=8, because I sample 9 units out of a total population of
72. Again, I could not tell the survey package the total number of clusters M. So:
dclus3<-svydesign(id=~cluster, weights=~weight, data=clus1, fpc=~N)
svymean(~y, dclus3)
mean SE
y 44.778 0.2878, still exactly the same numbers, although I provided the
weights. What am I doing wrong?
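
A small base-R check of the weight arithmetic, assuming the weighted mean has the form sum(w*y)/sum(w). This form is an assumption made for illustration, not a statement about how svymean is implemented. With a constant weight of 8 the weight cancels, which would be consistent with the point estimate staying at 44.778.

```r
# Example data from the post with the constant weight 72/9 = 8
y <- c(23, 33, 77, 25, 35, 74, 27, 37, 72)
w <- rep(72 / 9, 9)

unweighted <- mean(y)
weighted   <- sum(w * y) / sum(w)   # constant w cancels out of the ratio

print(c(unweighted = unweighted, weighted = weighted))
```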
I am sorry to bother you. Studying Statistics isn't done in a day,
that's for sure. Thank you so much for your understanding and support.
mark