[R] svydesign syntax

Thomas Lumley tlumley at u.washington.edu
Fri Jul 30 23:03:29 CEST 2010


On Thu, 22 Jul 2010, R user wrote:

> This message is for those familiar with the survey package. I need to fit a
> weighted Cox model to accommodate the sampling weights as I have a
> case-control study with controls sampled at random from a database in a
> ratio 2:1 to cases (who were all sampled). I want to make sure I am using
> the right svydesign syntax to specify this sampling design. Can anyone
> please check if the statement below is appropriate for my design?
>
> #group represents the case (total of 132) vs control (253 out of the total
> of 853 controls) groups; prob is 1 for cases and 253/853 for controls and
> ssize=132 for cases and 853 otherwise;
>
> dstr=svydesign(id=~1, strata=~group, prob=~prob, fpc=~ssize, data=noNA)
>

This is technically correct but probably not what you want.  You probably want

dstr=svydesign(id=~1, strata=~group, prob=~prob,  data=noNA)
or
dstr = twophase(id=list(~1,~1), strata=list(NULL, ~group), data=noNA)

Your svydesign() call treats the database as the full population, because fpc=~ssize supplies the stratum sizes as a finite population correction.  This could be correct, but usually people want estimates for the 'superpopulation' from which the population was sampled.  The first option above is very slightly conservative; the second describes the two phases of sampling that give first the whole database and then your subsample.
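
As a rough sketch of the three designs discussed here, the R code below builds a simulated stand-in for the database (132 cases, 853 controls, of which 253 are sampled) and fits a weighted Cox model. The data frame full, the variables time, status, x and insample, and all simulated values are assumptions made for illustration; they are not part of the original question. Note that twophase() takes the phase-one data plus a subset argument marking the phase-two rows, so the sketch passes the whole database to it rather than noNA.

```r
library(survey)   # attaches survival as a dependency, which provides Surv()

set.seed(1)

# Hypothetical stand-in for the full database: 132 cases and 853 controls.
# time and x are simulated purely for illustration.
full <- data.frame(
  group = rep(c("case", "control"), times = c(132, 853)),
  x     = rnorm(985),
  time  = rexp(985, rate = 0.1)
)
full$status <- as.integer(full$group == "case")

# Phase two: keep every case and a random 253 of the 853 controls.
keep <- c(which(full$group == "case"),
          sample(which(full$group == "control"), 253))
full$insample <- seq_len(nrow(full)) %in% keep

# Sampling probabilities and stratum sizes as described in the question.
full$prob  <- ifelse(full$group == "case", 1, 253 / 853)
full$ssize <- ifelse(full$group == "case", 132, 853)

# The analysis data set: only the sampled rows.
noNA <- full[full$insample, ]

# Original call: fpc treats the database as the whole finite population.
d_fpc <- svydesign(id = ~1, strata = ~group, prob = ~prob,
                   fpc = ~ssize, data = noNA)

# Same weights, no finite population correction.
d_nofpc <- svydesign(id = ~1, strata = ~group, prob = ~prob, data = noNA)

# Two-phase design: phase one is the database, phase two the subsample
# stratified by case/control status.
d_2ph <- twophase(id = list(~1, ~1), strata = list(NULL, ~group),
                  subset = ~insample, data = full)

# Weighted Cox model under the design without fpc.
fit <- svycoxph(Surv(time, status) ~ x, design = d_nofpc)
summary(fit)
```

The point estimates should be the same under all three designs, since each gives the controls weight 853/253 and the cases weight 1; only the standard errors differ.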

    -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle


