[R-sig-ME] Survey weights. Suggestions?

Fri Oct 17 17:38:54 CEST 2014

I'm still resisting the idea that we should incorporate survey weights
in regression analysis at all, and now it is suggested to me that a
mixed model with state & city random effects needs to incorporate
information about survey weights.  Could I hear your opinions?

In the past, I've always answered people who ask for survey weights
with this quotation:

Murray Aitkin, Brian Francis, John Hinde, and Ross Darnell,
Statistical Modelling in R (Oxford 2009), p. 112

"One point which often causes confusion is the use of 'sample weights'
in regression. Survey studies sometimes substantially over-sample
small strata or sub-populations to provide sample sizes similar to
those from (under-sampled) large sub-populations.  A 'sample weight'
is often provided for each observation in the sample data set to allow
the re-aggregation of the final model to provide population
predictions. The sample weight is the reciprocal of the probability of
inclusion in the sample of an observation from each sub-population.
The sample weight will be high for the large sub-populations, and low
for the small sub-populations.

These weights can be used formally to define a weighted or pseudo
likelihood for the sample weight w_i for y_i, the weighted likelihood
is
[formula]
Then the weighted MLEs from the score equation satisfy
[formula]
If theta is the population mean and the model for Y is N(mu,
sigma-squared), the weighted MLE is [formula]. This correctly weights
for disproportionate sampling.

However, it is an important point that these sample weights should
*not* be used as formal weights in a regression analysis: the
observations should be equally weighted (i.e., unweighted) in the
analysis, and the model should always include the stratifying factor,
together with its interactions with other variables in the model....
"

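(The formulas did not survive in plain text. As far as I can
reconstruct them, they are the standard pseudo-likelihood expressions
below. This is my reading under that assumption, not a verbatim copy
from the book; f denotes the model density and n the sample size,
which are my notation.)

```latex
% Weighted (pseudo) likelihood, with sample weight w_i for observation y_i
L_w(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta)^{w_i}

% Score equation satisfied by the weighted MLE
\sum_{i=1}^{n} w_i \, \frac{\partial \log f(y_i \mid \theta)}{\partial \theta} = 0

% Normal model N(\mu, \sigma^2) with \theta = \mu: the weighted MLE is the weighted mean
\hat{\mu}_w = \frac{\sum_{i=1}^{n} w_i \, y_i}{\sum_{i=1}^{n} w_i}
```
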
It appears to me that the quoted argument is correct. I like it. It fits with my
understanding of

DuMouchel, W. H., & Duncan, G. J. (1983). Using Sample Survey Weights
in Multiple Regression Analyses of Stratified Samples. Journal of the
American Statistical Association, 78(383), 535–543. doi:10.2307/2288115.
Summary: If you have a model specified correctly, you don't need
sampling weights. If the usage of weights leads to a different answer,
your model is probably wrong to start with. They make a specification
test out of the difference.
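
To make that summary concrete, here is a toy simulation of the idea. It
is only a sketch under made-up assumptions, not DuMouchel and Duncan's
actual test statistic: the two strata, their sizes, the slopes, and the
helper names fit_line and columns are all hypothetical. Stratum B is
over-sampled, so its weight N_h / n_h is small.

```python
import random

def fit_line(x, y, w=None):
    """Weighted least-squares fit of y = a + b*x; w=None means equal weights."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope  # (intercept, slope)

rng = random.Random(2014)  # fixed seed so the sketch is reproducible

# Hypothetical design: stratum A has 9000 people, stratum B has 1000,
# and 500 are sampled from each, so the sample weight is N / n.
design = {"A": {"N": 9000, "n": 500, "slope": 1.0},
          "B": {"N": 1000, "n": 500, "slope": 3.0}}

rows = []  # each row is (stratum, x, y, weight)
for name, d in design.items():
    weight = d["N"] / d["n"]
    for _ in range(d["n"]):
        x = rng.uniform(0, 10)
        y = d["slope"] * x + rng.gauss(0, 1)
        rows.append((name, x, y, weight))

def columns(subset):
    """Split rows into parallel lists of x, y, and weights."""
    return ([r[1] for r in subset], [r[2] for r in subset],
            [r[3] for r in subset])

# Misspecified model: one pooled slope, stratum ignored.
x, y, w = columns(rows)
pooled_unweighted = fit_line(x, y)[1]
pooled_weighted = fit_line(x, y, w)[1]

# Model with the stratum and its interaction with x, which amounts to
# fitting each stratum separately.
by_stratum = {}
for name in design:
    xs, ys, ws = columns([r for r in rows if r[0] == name])
    by_stratum[name] = (fit_line(xs, ys)[1], fit_line(xs, ys, ws)[1])

print("pooled slope, unweighted vs weighted:",
      pooled_unweighted, pooled_weighted)
for name, (unweighted, weighted) in by_stratum.items():
    print("stratum", name, "slope, unweighted vs weighted:",
          unweighted, weighted)
```

In the pooled fit the two slopes should disagree noticeably, which is
the symptom of a wrong model. Once the stratum and its interaction
with x are in the model, the weights are constant within each fit and
cancel, so weighted and unweighted estimates coincide up to rounding.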

And then there's the all-time classic comment "Survey weighting is a
mess." (Gelman, A. (2007). Struggles with Survey Weighting and
Regression Modeling. Statistical Science, 22(2), 153–164.
doi:10.1214/088342306000000691.
http://www.stat.columbia.edu/~gelman/research/published/STS226.pdf)

I've read Thomas Lumley's book on using the R survey package, and I
understand how I could estimate a GLM with survey weights.  But I
don't understand why I'd want to do that.  And I can't bring myself to
believe that weights can correct for non-response in panel studies
either.

I need to read something at the middle level, between a graduate
math-stats book on sampling theory, and a manual for SPSS users that
tells them which buttons to push.  Can you point me at some discussion
of where survey weights feed into a random effects framework, or good
reasons why we need to use survey weights at all?

Please note, I'm not reluctant about weights as an approach to
heteroskedasticity (WLS), I understand that part.  I believe I
understand the role of the weights argument in lmer as currently
presented.

pj
-- 
Paul E. Johnson
Professor, Political Science      Acting Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu