[R-sig-ME] Survey weights. Suggestions?

Fri Oct 17 19:18:21 CEST 2014

Perhaps one situation where survey weights could plausibly be used is 
when propensity scores serve as inverse probability weights to 
"create balance" in estimating a treatment effect across non-equivalent 
groups.
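
To make the idea concrete, here is a minimal sketch of inverse probability weighting. The toy data, the function names, and the assumption that the propensity scores are already known (in practice they would be estimated, e.g. by logistic regression) are all mine, purely for illustration.

```python
# Hypothetical toy data: (treated flag, outcome y, propensity score e),
# where e = P(treated | covariates) is ASSUMED known for this sketch.
units = [
    (1, 10.0, 0.8),
    (1, 12.0, 0.5),
    (1, 9.0, 0.2),
    (0, 7.0, 0.5),
    (0, 5.0, 0.2),
    (0, 6.0, 0.8),
]


def naive_effect(rows):
    """Unweighted difference in mean outcome, treated minus control."""
    treated = [y for t, y, _ in rows if t]
    control = [y for t, y, _ in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(rows):
    """Normalised inverse-probability-weighted difference in means.

    Treated units get weight 1/e, control units get weight 1/(1 - e),
    so units that were unlikely to land in their group count for more.
    """
    t_num = t_den = c_num = c_den = 0.0
    for t, y, e in rows:
        if t:
            w = 1.0 / e
            t_num += w * y
            t_den += w
        else:
            w = 1.0 / (1.0 - e)
            c_num += w * y
            c_den += w
    return t_num / t_den - c_num / c_den


print("naive:", naive_effect(units))
print("IPW:  ", ipw_effect(units))
```

When every unit has the same propensity score the two estimates coincide; they diverge only when treatment assignment is unbalanced with respect to the score.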

On 17/10/2014 16:38, Paul Johnson wrote:
> I'm still resisting the idea that we should incorporate survey weights
> in regression analysis at all, and now it is suggested to me that a
> mixed model with state & city random effects needs to incorporate
> information about survey weights.  Could I hear your opinions?
>
> In the past, I've always answered people who ask for survey weights
> with this quotation:
>
> Murray Aitkin, Brian Francis, John Hinde, and Ross Darnell,
> Statistical Modeling in R (Oxford 2009), p. 112
>
> "One point which often causes confusion is the use of 'sample weights'
> in regression. Survey studies sometimes substantially over-sample
> small strata or sub-populations to provide sample sizes similar to
> those from (under-sampled) large sub-populations.  A 'sample weight'
> is often provided for each observation in the sample data set to allow
> the re-aggregation of the final model to provide population
> predictions. The sample weight is the reciprocal of the probability of
> inclusion in the sample of an observation from each sub-population.
> The sample weight will be high for the large sub-populations, and low
> for the small sub-populations.
>
> These weights can be used formally to define a weighted or pseudo
> likelihood for the sample weight w_i for y_i, the weighted likelihood
> is
> [formula]
> Then the weighted MLEs from the score equation satisfy
> [formula]
> If theta is the population mean and the model for Y is N(mu,
> sigma-squared), the weighted MLE is [formula]. This correctly weights
> for disproportionate sampling.
>
> However, it is an important point that these sample weights should
> *not* be used as formal weights in a regression analysis: the
> observations should be equally weighted (i.e., unweighted) in the
> analysis, and the model should always include the stratifying factor,
> together with its interactions with other variables in the model....
> "
>
> It appears to me that is correct. I like the argument. It fits with my
> understanding of
>
> DuMouchel, W. H., & Duncan, G. J. (1983). Using Sample Survey Weights
> in Multiple Regression Analyses of Stratified Samples. Journal of the
> American Statistical Association, 78(383), 535. doi:10.2307/2288115.
> Summary: If you have a model specified correctly, you don't need
> sampling weights. If the usage of weights leads to a different answer,
> your model is probably wrong to start with. They make a specification
> test out of the difference.
>
> And then there's the all-time classic comment "Survey weighting is a
> mess." (Gelman, A. (2007). Struggles with Survey Weighting and
> Regression Modeling. Statistical Science, 22(2), 153–164.
> doi:10.1214/088342306000000691.
> http://www.stat.columbia.edu/~gelman/research/published/STS226.pdf)
>
> I've read Thomas Lumley's book on using the R survey package, and I
> understand how I could estimate a GLM with survey weights.  But I
> don't understand why I'd want to do that.  And I can't bring myself to
> believe that weights can correct for non-response in panel studies
> either.
>
> I need to read something at the middle level, between a graduate
> math-stats book on sampling theory, and a manual for SPSS users that
> tells them which buttons to push.  Can you point me at some discussion
> of where survey weights feed into a random effects framework, or good
> reasons why we need to use survey weights at all?
>
> Please note, I'm not reluctant about weights as an approach to
> heteroskedasticity (WLS); I understand that part.  I believe I
> understand the role of the weights argument in lmer as currently
> presented.
>
> pj
>
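
For the population-mean case in the quoted passage, a small sketch may help show what "correctly weights for disproportionate sampling" means. The formulas in the quotation are elided, so I am assuming the estimator is the usual weighted mean, sum(w_i * y_i) / sum(w_i), with w_i the reciprocal of the inclusion probability. The strata sizes and values are invented for illustration.

```python
# Hypothetical stratified sample: a large stratum (N = 900) and a small,
# over-sampled stratum (N = 100), with 4 observations drawn from each.
strata = [
    {"pop_size": 900, "sample": [9.0, 10.0, 11.0, 10.0]},
    {"pop_size": 100, "sample": [19.0, 20.0, 21.0, 20.0]},
]


def unweighted_mean(strata):
    """Plain sample mean, ignoring the sampling design."""
    values = [y for s in strata for y in s["sample"]]
    return sum(values) / len(values)


def weighted_mean(strata):
    """Weighted mean with w = pop_size / sample_size, i.e. the reciprocal
    of each stratum's inclusion probability (ASSUMED form of the estimator).
    """
    num = den = 0.0
    for s in strata:
        w = s["pop_size"] / len(s["sample"])
        for y in s["sample"]:
            num += w * y
            den += w
    return num / den


print("unweighted:", unweighted_mean(strata))  # 15.0
print("weighted:  ", weighted_mean(strata))    # 11.0
```

The unweighted mean sits halfway between the two stratum means because the strata are equally represented in the sample; the weighted mean moves toward the large stratum, matching its 90% share of the population.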