[R] Struggling with svydesign()

Thu Apr 8 18:30:36 CEST 2010

On Thu, 8 Apr 2010, ONKELINX, Thierry wrote:

> Dear Thomas,
>
> Thank you for your informative answer. We used epi.stratasize() to
> estimate the required sample size per stratum. Notice in the example
> below that it can select a sample size smaller than 2 in the very small
> strata. Would you recommend sampling at least two items per stratum, or
> rather merging some strata a priori until the sample size is at least
> 2?

Merging the strata would be best.

> Or is there a better way to estimate the sample size per stratum?
> Note that the stratification only aims to get a good geographical
> coverage (the strata are geographical regions). We are not interested in
> estimates per stratum.
>
> library(epiR)
> N <- c(39, 270, 1060, 1336, 118, 26, 154, 10, 3)
> epi.stratasize(strata.n = N, strata.mean = 0.9, epsilon = 0.05,
>                method = "proportion")
> $strata.sample
> [1]  2 15 57 72  6  1  8  1  0
>
> $total.sample
> [1] 162
>
> The probability of sampling was proportional to the area (larger
> polygons are more likely to be selected than smaller ones). So we will
> use weights = I(1/Area), as you suggested.

If you are using probability-proportional-to-size sampling and you want to use finite-population corrections, you also need to specify the fpc= argument differently. The simplest version is an approximation that uses only the marginal sampling probabilities:
   svydesign(id=~1, fpc=~p, pps="brewer", strata=~strat)
where p is a variable holding the actual sampling probability of each unit (not just a quantity proportional to it).
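
As a rough sketch of what "actual sampling probability" could mean here: if n_h polygons are drawn in stratum h with probability proportional to area, and no polygon is large enough to push its probability above 1, then each polygon's inclusion probability is n_h * Area / (total area of the stratum). The data frame `frame`, its columns, and the per-stratum sample sizes `n_h` below are made up for illustration; they are not from the original data.

```r
# Hypothetical sampling frame: one row per polygon (illustrative values only)
frame <- data.frame(
  strat = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Area  = c(10, 20, 30, 40, 5, 15, 20, 10),
  stringsAsFactors = FALSE
)

# Hypothetical number of polygons drawn in each stratum
n_h <- c(A = 2, B = 2)

# Total area of the stratum each polygon belongs to
stratum_area <- ave(frame$Area, frame$strat, FUN = sum)

# Inclusion probability: n_h * Area / stratum total
frame$p <- unname(n_h[as.character(frame$strat)]) * frame$Area / stratum_area

# These must be genuine probabilities; a value above 1 would mean the unit
# has to be taken with certainty and handled separately
stopifnot(all(frame$p > 0 & frame$p <= 1))

# The sampled rows (a hypothetical subset `smp` of `frame`, together with the
# measured variables) would then be passed on as
#   survey::svydesign(id = ~1, fpc = ~p, pps = "brewer", strata = ~strat, data = smp)
```

Within each stratum the probabilities computed this way add up to the stratum sample size, which is a quick check that p really is a probability and not merely proportional to one.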

Also, how did you do the sampling? It is quite hard to do unequal-probability sampling without replacement (the R sample() function doesn't actually do it, though the sampling package does).
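
To illustrate the point about sample(), here is a small sketch with made-up unit sizes. sample(x, n, prob = size) without replacement draws units one after another, renormalising the weights over the units that remain, so the resulting inclusion probabilities are not n * size / sum(size). The sizes and the sample size n = 2 are assumptions chosen only to keep the exact calculation short.

```r
# Hypothetical unit sizes (e.g. polygon areas) and a sample of n = 2
size <- c(1, 2, 3, 4)
n <- 2

# Target inclusion probabilities under PPS sampling without replacement
target <- n * size / sum(size)

# Exact inclusion probabilities for two successive draws without replacement,
# the scheme that sample(..., prob = size) follows
p_first <- size / sum(size)                 # drawn on the first draw
p_second <- sapply(seq_along(size), function(i) {
  others <- setdiff(seq_along(size), i)     # units that could be drawn first
  sum(p_first[others] * size[i] / (sum(size) - size[others]))
})
sequential <- p_first + p_second

round(rbind(target, sequential), 3)
```

In this toy case the successive-draw scheme includes the smallest unit more often than the target and the largest unit less often, so a fpc= or weights based on n * size / sum(size) would not match what sample() actually did.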

     -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle