[R] Multistage Sampling

Fri Jul 7 21:17:40 CEST 2006

>From: Mark Hempelmann <e.rehak at t-online.de>
>Date: Fri Jul 07 14:05:29 CDT 2006
>To: r-help at stat.math.ethz.ch
>Subject: [R] Multistage Sampling

i also find it an truly amazing group also. the general kindness
and generosity of everyone is beyond belief.  it will be a long time coming but i hope i can help some day also. unfortunately,
i can't help you with your question either.

                                           mark

>Dear WizaRds, dear Thomas,
>
>    First of all, I want to tell you how grateful I am for all your 
>support. I wish I will be able to help others along one day the same way 
>you do. Thank you so much. I am struggling with a multistage sampling 
>design:
>
>library(survey)
>multi3  <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4, 
>1,2,3, 1,2),
>nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75), 
>M=rep(23,9),
>y=c(23,33,77,25, 35,74,27, 37,72) )
>
>dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3)
>svymean (~y, dmulti3)
>    mean     SE
>y 45.796 5.5483
>
>svytotal(~y, dmulti3)
>  total    SE
>y 78999 13643
>
>and I estimate the population total as N=M/m sum(Nl) = 
>23/3*(100+50+75)=1725. With this, my variance estimator is:
>y1<-mean(multi3$y[1:4]) # 39.5
>y2<-mean(multi3$y[5:7]) # 45.33
>y3<-mean(multi3$y[8:9]) # 54.5
>
>yT1<-100*y1 # 3950 total cluster 1
>yT2<-50*y2 # 2266.67 total cluster 2
>yT3<-75*y3 # 4087.5 total cluster 3
>ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722
>s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1
>s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2
>s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3
>
>var.yT <- 23^2*( 20/23*1/6*sum( 
>(yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) +
>1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517
>
>but
>var.yT/1725^2 = 81.36157
>SE = 9.02006,
>but it should be SE=13643/1725=7.90899
>
>Is this calculation correct? I remember svytotal using a different 
>variance estimator compared to svymean, and that svytotal gives the 
>unbiased estimation. To solve the problem, I went ahead and tried to 
>calibrate the design object, telling Survey the population total N=1725:
>
>dmulti3.cal <- calibrate(dmulti3, ~1, pop=1725)
>svymean (~y, dmulti3.cal)
>    mean     SE
>y 45.796 5.5483
>
>svytotal(~y, dmulti3.cal)
>  total     SE
>y 78999 9570.7
>
>, which indeed gives me the computed svymean SE, but alas, I still don't 
>know why my variance is so different. I think it might have sthg to do 
>with a differently computed N and the fact that your estimator formula 
>is a different one. Since I calculated the Taylor Series solution, i 
>suppose there must be another approach? The calibration help page tells 
>me to enter a list of population total vectors for each cluster, which 
>would result in:
>
>dmulti3.cal <- calibrate(dmulti3, ~1, pop=c(100,50,75))
>Error in regcalibrate.survey.design2(design, formula, population, 
>aggregate.stage = aggregate.stage,  :
>Population and sample totals are not the same length.
>
>I am very grateful for your help and wish you alle the best
>Yours
>mark
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html