[R-sig-ME] Survey weights. Suggestions?
Paul Johnson
pauljohn32 at gmail.com
Fri Oct 17 17:38:54 CEST 2014
I'm still resisting the idea that we should incorporate survey weights
in regression analysis at all, and now it is suggested to me that a
mixed model with state & city random effects needs to incorporate
information about survey weights. Could I hear your opinions?
In the past, I've always answered people who ask for survey weights
with this quotation:
Murray Aitkin, Brian Francis, John Hinde, and Ross Darnell,
Statistical Modeling in R (Oxford 2009), p. 112
"One point which often causes confusion is the use of 'sample weights'
in regression. Survey studies sometimes substantially over-sample
small strata or sub-populations to provide sample sizes similar to
those from (under-sampled) large sub-populations. A 'sample weight'
is often provided for each observation in the sample data set to allow
the re-aggregation of the final model to provide population
predictions. The sample weight is the reciprocal of the probability of
inclusion in the sample of an observation from each sub-population.
The sample weight will be high for the large sub-populations, and low
for the small sub-populations.
These weights can be used formally to define a weighted or pseudo-
likelihood: for sample weight w_i attached to y_i, the weighted
likelihood is
  L_w(\theta) = \prod_{i=1}^{n} f(y_i; \theta)^{w_i}
Then the weighted MLEs from the score equation satisfy
  \sum_{i=1}^{n} w_i \frac{\partial \log f(y_i; \theta)}{\partial \theta} = 0
If theta is the population mean and the model for Y is N(mu,
sigma-squared), the weighted MLE is
  \hat{\mu} = \sum_{i=1}^{n} w_i y_i \Big/ \sum_{i=1}^{n} w_i
This correctly weights for disproportionate sampling.
However, it is an important point that these sample weights should
*not* be used as formal weights in a regression analysis: the
observations should be equally weighted (i.e., unweighted) in the
analysis, and the model should always include the stratifying factor,
together with its interactions with other variables in the model....
"
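The two halves of the quoted argument can be checked side by side in a minimal Python sketch. The two-stratum design, the population sizes N_h, the sample sizes, and the simulated y values are all hypothetical, chosen only for illustration. The sketch computes the naive unweighted mean, the weighted MLE sum(w_i y_i) / sum(w_i) with w_i = N_h / n_h, and the unweighted regression on the stratifying factor, re-aggregated with population shares. For this saturated (mean-only) model, the last two agree exactly. With covariates, that agreement depends on including the stratum factor and its interactions, as the quotation says.

```python
import numpy as np

# Hypothetical two-stratum design (illustrative numbers only): a large stratum
# and a small one, sampled with equal n, so the small stratum is over-sampled.
rng = np.random.default_rng(1)
N = np.array([9000.0, 1000.0])          # population size N_h (assumed known)
n = np.array([50, 50])                  # sample size n_h
stratum = np.repeat([0, 1], n)          # stratum label for each sampled unit
y = np.where(stratum == 0, 10.0, 20.0) + rng.normal(0.0, 2.0, size=n.sum())

# Sample weight = 1 / inclusion probability = N_h / n_h for each unit in stratum h.
w = (N / n)[stratum]

# 1. Unweighted mean: pulled toward the over-sampled small stratum (about 15).
unweighted_mean = y.mean()

# 2. Weighted MLE of mu under N(mu, sigma^2): sum(w_i y_i) / sum(w_i).
#    The generating model's population mean is 0.9 * 10 + 0.1 * 20 = 11.
weighted_mean = np.sum(w * y) / np.sum(w)

# 3. Unweighted OLS that includes the stratifying factor (one indicator per
#    stratum, no intercept), then re-aggregate the fitted stratum means
#    using the population shares N_h / sum(N).
X = np.column_stack([(stratum == h).astype(float) for h in range(len(N))])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # beta[h] = sample mean of stratum h
model_based_mean = np.sum(N * beta) / N.sum()

print(unweighted_mean, weighted_mean, model_based_mean)
```

Algebraically, the weighted mean equals sum_h N_h * ybar_h / sum_h N_h, which is exactly the re-aggregated stratum-factor model. In this case the weights add nothing once the model contains the stratum.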
It appears to me that this is correct. I like the argument, and it
fits with my understanding of
DuMouchel, W. H., & Duncan, G. J. (1983). Using Sample Survey Weights
in Multiple Regression Analyses of Stratified Samples. Journal of the
American Statistical Association, 78(383), 535. doi:10.2307/2288115.
Summary: if your model is specified correctly, you don't need sampling
weights. If using the weights leads to a different answer, your model
is probably wrong to start with. They turn the difference between the
weighted and unweighted estimates into a specification test.
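The specification-test idea can be shown with a minimal Python sketch of one common augmented-regression form of such a test. It adds the weight and weight-by-regressor columns to the unweighted regression and F-tests them jointly. Treat it as an illustration of the logic, not a faithful reproduction of the paper. The function names, the simulated data (the slope differs by stratum, but the fitted model ignores stratum), and the weights 1 and 5 are all hypothetical. The function returns the F statistic without a p-value.

```python
import numpy as np

def ols_rss(X, y):
    """Unweighted OLS; return coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def wls_coef(X, y, w):
    """Weighted least squares coefficients, solving X'WX b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def weight_specification_test(X, y, w):
    """F statistic for adding w * X (including w itself, via w * 1) to the
    unweighted regression. X must already contain an intercept column.
    A large F suggests the weighted and unweighted fits disagree."""
    n, k = X.shape
    X_aug = np.column_stack([X, X * w[:, None]])
    _, rss_r = ols_rss(X, y)        # restricted: original unweighted model
    _, rss_u = ols_rss(X_aug, y)    # unrestricted: with weight terms added
    q = X_aug.shape[1] - k
    return ((rss_r - rss_u) / q) / (rss_u / (n - X_aug.shape[1]))

# Hypothetical data: equal samples from strata A and B; true slope is 1 in A
# and 3 in B, but the fitted model has a single slope (misspecified).
rng = np.random.default_rng(42)
m = 200
x = rng.uniform(0.0, 10.0, size=2 * m)
in_b = np.repeat([False, True], m)
y = np.where(in_b, 3.0, 1.0) * x + rng.normal(0.0, 0.5, size=2 * m)
w = np.where(in_b, 5.0, 1.0)        # units in B carry larger sample weights
X = np.column_stack([np.ones_like(x), x])

b_unweighted, _ = ols_rss(X, y)
b_weighted = wls_coef(X, y, w)
print(b_unweighted[1], b_weighted[1])      # slopes near 2 and 8/3
print(weight_specification_test(X, y, w))  # large F: the weights matter here
```

If the model also included the stratum indicator and its interaction with x, the weighted and unweighted slopes would no longer diverge in this way.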
And then there's the all-time classic comment "Survey weighting is a
mess." (Gelman, A. (2007). Struggles with Survey Weighting and
Regression Modeling. Statistical Science, 22(2), 153–164.
doi:10.1214/088342306000000691.
http://www.stat.columbia.edu/~gelman/research/published/STS226.pdf)
I've read Thomas Lumley's book on using the R survey package, and I
understand how I could estimate some GLM with survey weights. But I
don't understand why I'd want to do that. And I can't bring myself to
believe that weights can correct for non-response in panel studies
either.
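Mechanically, a GLM with survey weights generalizes the weighted score equation quoted above. The following minimal Python sketch solves sum_i w_i x_i (y_i - p_i) = 0 for logistic regression by Newton-Raphson. It covers point estimates only. The design-based standard errors that the survey package supplies are not attempted here. The function name, the simulated data, and the weights are hypothetical. The last lines illustrate two properties of the estimator: rescaling all weights leaves it unchanged, and an integer weight behaves like duplicating the observation.

```python
import numpy as np

def weighted_logistic(X, y, w, tol=1e-10, max_iter=100):
    """Pseudo-MLE for logistic regression with sampling weights.

    Solves the weighted score equations sum_i w_i * x_i * (y_i - p_i) = 0
    by Newton-Raphson. Returns point estimates only (no variance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))                          # weighted score
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])      # weighted information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data and weights, for illustration only.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))).astype(float)
w = rng.choice([1.0, 2.0, 4.0], size=n)

b = weighted_logistic(X, y, w)

# Rescaling every weight by a constant leaves the point estimate unchanged.
b_scaled = weighted_logistic(X, y, 10.0 * w)

# Integer weights act like replicating rows: expand the data, fit unweighted.
reps = w.astype(int)
X_rep = np.repeat(X, reps, axis=0)
y_rep = np.repeat(y, reps)
b_rep = weighted_logistic(X_rep, y_rep, np.ones(len(y_rep)))
print(b, b_scaled, b_rep)
```

The replication check shows why the point estimate alone cannot be the whole story. If the weights are treated as frequencies, the model-based variance would be computed as if the sample were much larger than it is. This is one reason design-based variance estimation exists.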
I need to read something at the middle level, between a graduate
math-stats book on sampling theory, and a manual for SPSS users that
tells them which buttons to push. Can you point me at some discussion
of where survey weights feed into a random effects framework, or good
reasons why we need to use survey weights at all?
Please note, I'm not reluctant about weights as an approach to
heteroskedasticity (WLS), I understand that part. I believe I
understand the role of the weights argument in lmer as currently
presented.
pj
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
http://pj.freefaculty.org

Acting Director
Center for Research Methods
University of Kansas
http://quant.ku.edu