[R-sig-ME] Comparing weighted and unweighted estimation RE: Methodological and practical issues about survey weights using lme4

Sun Jul 11 10:36:10 CEST 2021

Dear list members

I am still trying to understand weighted estimation of mixed models. In previous messages I was very kindly told that the weights argument in lmer() is for precision weights (not sampling weights). However, I am still not convinced about that and I would appreciate more thoughts about weighting.

The vignette of the WeMix package says: “The package lme4 fits mixed models when there are no weights or weights only for first-level units (Bates, Maechler, Bolker, & Walker, 2015) and is recommended when both of two conditions hold: no weights are above the first level, and cluster-robust standard errors are not required. WeMix can fit models with weights at every level of the model and also calculates cluster-robust standard errors that account for covariance between units in the same groups”. See https://cran.r-project.org/web/packages/WeMix/
Additionally, the help page of the mix() function of WeMix explains: "When all weights above the individual level are 1, this is similar to a lmer and you should use lme4 because it is much faster."

I have seen explanations of that use, such as in the following link: https://www.r-bloggers.com/2017/06/sampling-weights-and-multilevel-modeling-in-r/

That use is different from the use in meta-analysis, weighting by inverse variance or sample size: https://www.metafor-project.org/doku.php/tips:rma_vs_lm_lme_lmer

There are also online debates claiming that lmer() cannot be used for survey weights. Some of them are old, so I do not summarize them here.

The 'weights' argument of the lmer() function in lme4 is documented as follows: "weights: an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector. Prior weights are not normalized or standardized in any way. In particular, the diagonal of the residual covariance matrix is the squared residual standard deviation parameter sigma times the vector of inverse weights. Therefore, if the weights have relatively large magnitudes, then in order to compensate, the sigma parameter will also need to have a relatively large magnitude."
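
If I read this description correctly, it corresponds to the following residual variance structure. The notation is my own (w_i for the prior weight of observation i, c for an arbitrary positive constant) and is not taken from the help page:

```latex
\operatorname{Var}(\varepsilon_i) = \frac{\sigma^2}{w_i}
\qquad\text{so that}\qquad
w_i \to c\,w_i \;\Longrightarrow\; \sigma^2 \to c\,\sigma^2
\quad\text{leaves each } \operatorname{Var}(\varepsilon_i) \text{ unchanged.}
```

That is, multiplying all the weights by a constant should only rescale sigma, which seems to be what the last sentence of the help text says.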

I apologise for my ignorance but I do not understand the difference between precision weights and survey weights in this last function. The expression "prior weights" does not help me with that.

Using the European Social Survey, I have used the “analysis weights” () normalized to sum to the sample size after deletion of missing data.
https://www.europeansocialsurvey.org/methodology/ess_methodology/data_processing_archiving/weighting.html
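
For concreteness, the normalization I mean can be sketched as follows. This is plain Python, used only to show the arithmetic; the function name and the toy weights are hypothetical and are not ESS data:

```python
def normalize_weights(weights):
    """Rescale weights so that they sum to the number of observations."""
    n = len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w * n / total for w in weights]


# Made-up analysis weights for five complete cases
# (rows with missing data are assumed to be dropped beforehand).
raw = [0.5, 1.5, 2.0, 4.0, 2.0]
normalized = normalize_weights(raw)
print(normalized)       # rescaled weights
print(sum(normalized))  # equals the number of observations, up to rounding
```

The rescaling keeps the ratios between weights intact, so only their overall magnitude changes.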
Using lmer() with weights, and WeMix with those weights for level 1 and unit weights (all equal to 1) for level 2, produces very similar estimates for the level-1 and level-2 variables. I use unit weights for level 2 because the level-2 units are European countries, so they are not a sample in the way that schools sampled within a country would be. Example of results for my main variables:

  *   Using lmer() without weights
                        Estimate   Std. Error    t value
level-2 variable1     0.20795196   0.06229626   3.338113
level-2 variable2    -0.26445932   0.06232801  -4.243025
level-2 variable2     0.46072085   0.05212811   8.838241

  *   Using lmer() with weights
                         Estimate   Std. Error     t value
level-2 variable1     0.194559163   0.06695520   2.9058113
level-2 variable2    -0.258452710   0.06771138  -3.8169759
level-2 variable2     0.466058046   0.05746252   8.1106439

  *   Using WeMix
                       Estimate   Std. Error   t value
level-2 variable1     0.1954945    0.0512900    3.8116
level-2 variable2    -0.2593960    0.0585994   -4.4266
level-2 variable2     0.4667014    0.0548841    8.5034

I am surprised by the similarity of the results between weighted and unweighted lmer(). The four most populated countries account for 22% of the observations in my sample but 60% of the sum of the normalized weights. Therefore, I was expecting weighting to have more impact. In any case, weighted lmer() and WeMix's mix() give similar point estimates, although the WeMix standard errors are somewhat smaller.
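
The two shares I am comparing (share of observations versus share of total weight per country) can be computed along these lines. Again this is only a plain-Python sketch of the arithmetic, with a hypothetical function name and made-up countries and weights:

```python
from collections import defaultdict


def shares_by_group(groups, weights):
    """Return {group: (share of observations, share of total weight)}."""
    if len(groups) != len(weights):
        raise ValueError("groups and weights must have the same length")
    n = len(groups)
    total_weight = sum(weights)
    counts = defaultdict(int)
    weight_sums = defaultdict(float)
    for g, w in zip(groups, weights):
        counts[g] += 1
        weight_sums[g] += w
    return {g: (counts[g] / n, weight_sums[g] / total_weight) for g in counts}


# Toy data: country "A" has few respondents but large weights.
countries = ["A", "A", "B", "B", "B", "B", "B", "B"]
weights = [3.0, 3.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
for country, (obs_share, w_share) in shares_by_group(countries, weights).items():
    print(country, round(obs_share, 2), round(w_share, 2))
```

A large gap between the two shares for a country is what led me to expect a visible difference between the weighted and unweighted fits.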

Apart from the software issues, my more general question was methodological. Solon et al. (2015) do not suggest using weights for causal analysis. Indeed, their paper starts with a paragraph that is worth repeating here: “At the beginning of their textbook’s section on weighted estimation of regression models, Angrist and Pischke (2009, p. 91) acknowledge, “Few things are as confusing to applied researchers as the role of sample weights. Even now, 20 years post-Ph.D., we read the section of the Stata manual on weighting with some dismay.” After years of discussing weighting issues with fellow economic researchers, we know that Angrist and Pischke are in excellent company. In published research, top-notch empirical scholars make conflicting choices about whether and how to weight and often provide little or no rationale for their choices. And in private discussions, we have found that accomplished researchers sometimes own up to confusion or declare demonstrably faulty reasons for their weighting choices.”
http://jhr.uwpress.org/content/50/2/301

Therefore, some of the discussions available online are probably wrong. I would appreciate further comments about these issues: 1) the appropriateness of using weights for causal analysis; 2) using survey weights in the lme4 package; 3) why the weighted and unweighted results are so similar despite such a difference in the relative importance of the level-2 units (countries here).

Thank you very much. All the best,

Fernando Bruna

________________________________
From: James Pustejovsky <jepusto using gmail.com>
Sent: Monday, 5 July 2021 1:02
To: Fernando Pedro Bruna Quintas <f.bruna using udc.es>
Cc: r-sig-mixed-models using r-project.org <r-sig-mixed-models using r-project.org>
Subject: Re: [R-sig-ME] Methodological and practical issues about survey weights using lme4

My understanding is that lme4 does not accommodate survey weights. But check out the WeMix package for an alternative: https://cran.r-project.org/package=WeMix

I learned about WeMix when I posted a query on Twitter very similar to your question (https://twitter.com/jepusto/status/1408084119884599299?s=21).

Kind Regards,
James

On Jul 4, 2021, at 11:20 AM, Fernando Pedro Bruna Quintas <f.bruna using udc.es> wrote:

Dear list members,

I have two questions about the use of survey weights in multilevel models.

I have estimated a multilevel model of the effects of individual and cultural variables on well-being, using the European Social Survey. By “culture” I mean national aggregates of my level-1 indicators. Think of Yij as an indicator of well-being, Xij as an indicator of being individualistic (for instance) and Xj as the sample country mean of individualism, representing the degree of individualism in a national culture (contextual effect).

I have estimated the following simplified model:
lmer(Yij ~ I(Xij - Xj) + Xj + (1 | country), data = databank)

A referee made the following comment: “Post-stratification weights at individual level and at the higher-level national variables is a relevant issue. This issue of importance on MLM context as Stata instructions note https://www.stata.com/features/overview/multilevel-models-with-survey-data/. The authors seem to be using R but I would assume it includes similar options to include weights to all levels of the analysis. The literature on MLM includes recommendations for Fitting multilevel models in complex survey data with design weights and this needs to be referred and selection of weights justified.”

The information about weights in the survey I am using is in the following link:
https://www.europeansocialsurvey.org/methodology/ess_methodology/data_processing_archiving/weighting.html

My questions are the following:

 1.  The first question is about general methodology, though applied to the analysis of the effects of level-two variables. I would appreciate references or links about when it is appropriate to use survey weights, depending on the research question. I have data on 23 countries. My goal is to measure the effects of level-two (cultural) variables. Therefore, I am not so much interested in drawing conclusions about big countries (Russia has 145 million people!). I need variance to differentiate cultural effects in Belgium, the Netherlands... However, if I do not use weights, my conclusions are only about the particular sample published by the European Social Survey. Any thoughts?
 2.  Apart from that, and more generally for any other study, I would appreciate comments and references about using survey weights in lme4. I understand that I would have to change the calculation of all my level-one variables, which are defined as deviations from the national means. Additionally, I must consider reweighting the national means of those variables, as well as other level-two variables. The estimation procedure has to be weighted... I would appreciate any practical comment about weighted estimation using survey data and lme4.
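
To make the second point concrete, the recomputation of national means and deviations that I have in mind would look roughly like this. It is plain Python for the arithmetic only; the function name, the country codes and the values are hypothetical, and whether survey weights should be used at this step is precisely what I am asking:

```python
from collections import defaultdict


def center_within_groups(x, groups, weights=None):
    """Return (group_means, deviations) based on weighted group means.

    With weights=None every observation counts equally, which gives the
    ordinary (unweighted) group-mean centering.
    """
    if weights is None:
        weights = [1.0] * len(x)
    if not (len(x) == len(groups) == len(weights)):
        raise ValueError("x, groups and weights must have the same length")
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for xi, g, w in zip(x, groups, weights):
        weighted_sum[g] += w * xi
        weight_total[g] += w
    means = {g: weighted_sum[g] / weight_total[g] for g in weighted_sum}
    deviations = [xi - means[g] for xi, g in zip(x, groups)]
    return means, deviations


# Toy data: Xij for four respondents in two countries.
x = [1.0, 3.0, 2.0, 6.0]
country = ["BE", "BE", "NL", "NL"]
w = [1.0, 3.0, 1.0, 1.0]

weighted_means, weighted_dev = center_within_groups(x, country, w)
plain_means, plain_dev = center_within_groups(x, country)
print(weighted_means, weighted_dev)
print(plain_means, plain_dev)
```

In this toy example the weighted and unweighted means differ for the first country only, so both the level-two variable Xj and the level-one deviations Xij - Xj change there.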

Thank you very much,

Fernando Bruna
Department of Economics
University of A Corunha (Coruña), Spain

   [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
