[R] large survey data set

Tim Churches tchur at optushome.com.au
Fri Jun 28 21:45:00 CEST 2002


Andrew Perrin wrote:
> 
> This is interesting and a bit disturbing. I've been using the weights=
> syntax to assign a case-weighting system in a survey dataset as well. Can
> you send me somewhere for documentation of the differences?

I found this paper by Hendrickx to be a useful, non-technical summary of
the difference between replication weights and probability weights. It
only very lightly touches on the issue of design effects, which also need
to be considered in the context of cluster sampling.

http://www.asc.org.uk/Events/Apr02/Abstract/Hendrickx.htm
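
To make the point in Thomas's message (quoted below) concrete, here is a
minimal sketch in base R. It fits a weighted lm() and compares the
model-based standard errors, which treat the weights as variance weights,
with a hand-computed sandwich estimate. Everything in it is an assumption
made for illustration: the data are simulated, the names x, y, w and fit
are made up, and the sandwich formula is the plain uncorrected form with
no finite-population correction and no allowance for clustering, so it is
not a substitute for a proper design-based analysis.

```r
# Simulated data: all names and values here are illustrative assumptions.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
w <- runif(n, 1, 5)  # hypothetical probability weights (1 / sampling probability)

# lm() interprets `weights` as variance weights.
fit <- lm(y ~ x, weights = w)

# Model-based standard errors, as reported by summary(fit).
se_model <- sqrt(diag(vcov(fit)))

# Sandwich variance: (X'WX)^-1  X'W diag(e^2) W X  (X'WX)^-1
X <- model.matrix(fit)
e <- residuals(fit)                  # unweighted residuals y - fitted
bread <- solve(crossprod(X, w * X))  # (X'WX)^-1
meat <- crossprod(X * (w * e))       # sum of w_i^2 e_i^2 x_i x_i'
vc_sandwich <- bread %*% meat %*% bread
se_sandwich <- sqrt(diag(vc_sandwich))

# Same coefficients either way; only the standard errors differ.
print(cbind(estimate = coef(fit), se_model, se_sandwich))
```

The coefficients come from the same weighted normal equations in both
cases, so only the two standard error columns differ. How far apart they
are depends on how the weights relate to the residual variance.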

Tim C

> 
> Thanks.
> 
> ----------------------------------------------------------------------
> Andrew J Perrin - http://www.unc.edu/~aperrin
> Assistant Professor of Sociology, U of North Carolina, Chapel Hill
> clists at perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
> 
> On Thu, 27 Jun 2002, Thomas Lumley wrote:
> 
> > On Thu, 27 Jun 2002, Andrew Perrin wrote:
> >
> > > The lm function (for linear modelling aka linear regression) includes
> > > case weights with a simple syntax:
> > >
> > > foo<-lm(dependent ~ indep + indep + ... ,
> > >     data = <data object>,
> > >     weights = <weight variable>)
> >
> > Yes, but that isn't what he means by weights...
> >
> > The standard regression weights are variance weights: a weight of 2
> > denotes an observation with half the variance of a weight of 1.
> >
> > In survey sampling (and in related missing data and causal inference
> > models) you need probability weights: a weight of 2 means an observation
> > had half the chance of being sampled.  You get the same regression
> > coefficients (more or less) but quite different standard errors.
> >
> > The `model-robust' sandwich variance estimators give about the right
> > standard errors (as long as the sampling fraction is small). These are
> > built in to the survival models, but not in most other software. They are
> > pretty easy to calculate but with a 20% sample they probably aren't going
> > to work well.
> >
> >
> >       -thomas
> >
> >
> >
> >
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._