[R] Multistage Sampling

Fri Jul 7 21:05:29 CEST 2006

Dear WizaRds, dear Thomas,

    First of all, I want to tell you how grateful I am for all your 
support. I wish I will be able to help others along one day the same way 
you do. Thank you so much. I am struggling with a multistage sampling 
design:

library(survey)
multi3  <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4, 
1,2,3, 1,2),
nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75), 
M=rep(23,9),
y=c(23,33,77,25, 35,74,27, 37,72) )

dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3)
svymean (~y, dmulti3)
    mean     SE
y 45.796 5.5483

svytotal(~y, dmulti3)
  total    SE
y 78999 13643

and I estimate the population total as N=M/m sum(Nl) = 
23/3*(100+50+75)=1725. With this, my variance estimator is:
y1<-mean(multi3$y[1:4]) # 39.5
y2<-mean(multi3$y[5:7]) # 45.33
y3<-mean(multi3$y[8:9]) # 54.5

yT1<-100*y1 # 3950 total cluster 1
yT2<-50*y2 # 2266.67 total cluster 2
yT3<-75*y3 # 4087.5 total cluster 3
ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722
s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1
s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2
s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3

var.yT <- 23^2*( 20/23*1/6*sum( 
(yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) +
1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517

but
var.yT/1725^2 = 81.36157
SE = 9.02006,
but it should be SE=13643/1725=7.90899

Is this calculation correct? I remember svytotal using a different 
variance estimator compared to svymean, and that svytotal gives the 
unbiased estimation. To solve the problem, I went ahead and tried to 
calibrate the design object, telling Survey the population total N=1725:

dmulti3.cal <- calibrate(dmulti3, ~1, pop=1725)
svymean (~y, dmulti3.cal)
    mean     SE
y 45.796 5.5483

svytotal(~y, dmulti3.cal)
  total     SE
y 78999 9570.7

, which indeed gives me the computed svymean SE, but alas, I still don't 
know why my variance is so different. I think it might have sthg to do 
with a differently computed N and the fact that your estimator formula 
is a different one. Since I calculated the Taylor Series solution, i 
suppose there must be another approach? The calibration help page tells 
me to enter a list of population total vectors for each cluster, which 
would result in:

dmulti3.cal <- calibrate(dmulti3, ~1, pop=c(100,50,75))
Error in regcalibrate.survey.design2(design, formula, population, 
aggregate.stage = aggregate.stage,  :
Population and sample totals are not the same length.

I am very grateful for your help and wish you alle the best
Yours
mark