[R-sig-ME] Assumptions of random effects for unbiased estimates

Poe, John jdpo223 at g.uky.edu
Wed Oct 12 04:47:41 CEST 2016


Thanks Jake!

On Oct 11, 2016 9:50 PM, "Jake Westfall" <jake.a.westfall at gmail.com> wrote:

> What a nice contribution from John!
>
> Jake
>
> On Tue, Oct 11, 2016 at 8:11 PM, Poe, John <jdpo223 at g.uky.edu> wrote:
>
> > My reading of modern work by panel data econometricians is that they seem
> > quite comfortable with mixed effects models that properly differentiate
> > effects at different levels of analysis, and the tools to do so have
> > existed in that literature since the early 1980s. They have borrowed
> > heavily from the mixed effects literature in designing econometric models
> > and discuss them in panel data textbooks. This typically hasn't filtered
> > down to applied economists, who tend to misunderstand what other fields do
> > because those fields talk about the same models differently.
> >
> > The short version:
> > Everyone in the mixed effects literature just uses group/grand mean
> > centering and random coefficients to deal with endogeneity bias. If you
> > are an economist and someone outside of econ says mixed effects models,
> > you should think *correlated random effects models* and not *random
> > effects models*.
> >
> > The long version:
> > Economists are pretty afraid of error structures that are correlated with
> > independent variables in general and have built up pretty elaborate
> > statistical models to deal with the problem. In panel data, this
> > manifests itself as wanting to avoid confounding effects at different
> > levels of analysis, so that within-group varying effects are segregated
> > from between-group varying effects. Confounding can also happen when you
> > omit higher level random effects
> > <http://methods.johndavidpoe.com/2016/09/09/independence-across-levels-in-mixed-effects-models/>
> > and they distort the structure of the random effects that you do
> > include. Keeping the levels separate is generally a good thing, as you
> > want to be able to test hypotheses at specific levels of analysis without
> > confounding.
> >
> > It's a big enough theoretical concern in the discipline that they usually
> > just want to remove all between-group effects from the data as a
> > *default* to get level one effects, because it is simpler and more
> > foolproof than dealing with the problem in a mixed effects setting. It's
> > so pervasive that they are often socialized into not designing hypotheses
> > about any between-group or cross-level variation and just focus on
> > within-group (time-varying) variability whenever possible (what
> > economists call *within effects*).
> >
> > What economists refer to as fixed effects models just difference out all
> > between-group variation so that it cannot contaminate within-group
> > effects (bias level one coefficients). This is equivalent to including
> > group indicator variables in the model instead of a random effect and
> > just accepting that you can't make substantive inferences about anything
> > at the group level (what economists call *between effects*).
> >
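
The equivalence between differencing out between-group variation and including group indicator variables can be checked numerically. What follows is only a minimal sketch in Python with numpy, not anything posted in the thread: the simulated panel, the seed, the true slope of 0.5, and every variable name are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: 30 groups, 8 observations each.
n_groups, n_per = 30, 8
group = np.repeat(np.arange(n_groups), n_per)

# Assumed data-generating process: the group effect c is correlated with x,
# which is the situation economists worry about. The true slope is 0.5.
c = rng.normal(size=n_groups)
x = 0.8 * c[group] + rng.normal(size=group.size)
y = 1.0 + 0.5 * x + 2.0 * c[group] + rng.normal(scale=0.5, size=group.size)


def group_mean(v):
    """Return, for every observation, the mean of v within its group."""
    sums = np.bincount(group, weights=v, minlength=n_groups)
    counts = np.bincount(group, minlength=n_groups)
    return (sums / counts)[group]


# (1) Remove all between-group variation: demean by group, then regress.
x_w = x - group_mean(x)
y_w = y - group_mean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# (2) Least squares with one indicator column per group (no separate intercept).
D = (group[:, None] == np.arange(n_groups)[None, :]).astype(float)
beta_dummies = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# (3) Pooled least squares that ignores the grouping entirely.
X_pool = np.column_stack([np.ones(group.size), x])
beta_pooled = np.linalg.lstsq(X_pool, y, rcond=None)[0][1]

print(f"within (demeaned): {beta_within:.4f}")
print(f"group indicators:  {beta_dummies:.4f}")
print(f"pooled OLS:        {beta_pooled:.4f}")
```

The first two slopes agree to numerical precision, while the pooled slope absorbs the between-group relationship. Neither of the first two regressions can say anything about a group-level predictor, which is the cost described above.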
> > The typical conventional wisdom in applied econometrics is to use a
> > Hausman test, which is a generic test comparing coefficients between a
> > random effects model (with no level 2 covariates) and a model with all
> > between-group variability removed from the data. If the two differ,
> > they prefer to go with the latter. This is bad practice according to
> > econometrics textbooks, but applied people don't seem to care (Baltagi
> > 2013 ch 4.3). It only makes sense if you don't care about variables that
> > are constant within groups and differ only cross-sectionally, and/or you
> > think of their effects as contamination. Panel data econometrics
> > textbooks tend to argue for a wider range of options here, but in
> > practice not that many economists seem to use them.
> >
> > There's an alternative framework in econ for dealing with this problem
> > that they call a Mundlak device (Mundlak 1978) or correlated random
> > effects models (Baltagi Handbook of Panel Data 2014 ch 6.3.3 or really
> > any panel data textbook), which is equivalent to a hierarchical linear
> > model with group mean centering for level-one variables. This approach is
> > used in econometrics by some pretty standard advanced panel data models
> > (e.g. Hausman-Taylor and Arellano-Bond). The other alternative, which is
> > advocated by panel data econometricians but doesn't seem to have filtered
> > down to rank-and-file economists, is to use random coefficients models
> > and just allow the random effects to be correlated with level one
> > variables (Hsiao 2014 chapter 6 and most of his other written work).
> >
> > It is important to understand that efficiency isn't the primary reason
> > for using a mixed effects model over a fixed effects model in most
> > research. A common reason to use a mixed effects model is that you have
> > hypotheses about variables operating at higher levels of analysis or
> > cross-level interactions, and those questions cannot be answered by fixed
> > effects panel models that have removed all between-group variability from
> > the analysis. You are sacrificing the ability to test hypotheses about
> > group-level variation by using a basic fixed effects model over a mixed
> > effects model. For nonlinear models like a logistic regression it can
> > also be very difficult to use an unbiased fixed effects model (though
> > there are ways in a panel setting, e.g. Hahn and Newey 2004) and trivial
> > to use a mixed effects model.
> >
> > Panel data econometricians almost always describe the typical practice
> > among applied economists of using fixed effects as flawed (see Baltagi
> > 2013 ch. 4.3). Marc Nerlove's 2000 History of Panel Data Econometrics is
> > my favorite example:
> >
> >> The absurdity of the contention that possible correlation between some of
> >> the observed explanatory variables and the individual-specific component
> >> of the disturbance is a ground for using fixed effects should be clear
> >> from the following example: Consider a panel of households with data on
> >> consumption and income. We are trying to estimate a consumption function.
> >> Income varies across households and over time. The variation across
> >> households is related to ability of the main earner and other household
> >> specific factors which vary little over time, that is to say, reflect
> >> mainly differences in permanent income. Such permanent differences in
> >> income are widely believed to be the source of most differences in
> >> consumption both cross-sectionally and over time, whereas, variations of
> >> income over time are likely to be mostly transitory and unrelated to
> >> consumption in most categories. Yet, fixed-effects regressions are
> >> equivalent to using only this variation and discarding the information
> >> on the consumption-income relationship contained in the cross-section
> >> variation among the household means.
> >
> >
> > See the last couple of pages of this lecture
> > <http://www.johndavidpoe.com/wp-content/uploads/2012/09/Blalock-Lecture.pdf>
> > for the citations in the econometrics and multilevel literature that I
> > referenced.
> >
> >
> >
> > On Tue, Oct 11, 2016 at 3:32 PM, Jake Westfall <jake.a.westfall at gmail.com> wrote:
> >
> >> Hi Laura and Ben,
> >>
> >> I like this paper on this topic:
> >> http://psych.colorado.edu/~westfaja/FixedvsRandom.pdf
> >>
> >> What it comes down to essentially is that if the cluster effects are
> >> correlated with the "time-varying" (i.e., within-cluster varying) X
> >> predictor -- so that, for example, some clusters have high means on X
> >> and others have low means on X -- then there is the possibility that the
> >> average within-cluster effect (which is what the fixed effect model
> >> estimates) differs from the overall effect of X, not conditional on the
> >> clusters. An extreme example of this is Simpson's paradox. Now since the
> >> estimate from the random-effects model can be seen as a weighted average
> >> of these two effects, it will generally be pulled to some extent away
> >> from the fixed-effect estimate toward the unconditional estimate, which
> >> is the bias that econometricians fret about. However, if the cluster
> >> effects are not correlated with X, so that each cluster has the same
> >> mean on X, then this situation is not possible, so the random-effect
> >> model will give the same unbiased estimate as the fixed-effect model.
> >>
> >> A simple solution to this problem is to retain the random-effect model,
> >> but to split the predictor X into two components, one representing the
> >> within-cluster variation of X and the other representing the
> >> between-cluster variation of X, and estimate separate slopes for these
> >> two effects. One can even test whether these two slopes differ from each
> >> other, which is conceptually similar to what the Hausman test does. As
> >> described in the paper linked above, the estimate of the within-cluster
> >> component of the X effect equals the estimate one would obtain from a
> >> fixed-effect model.
> >>
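
The within/between split and the "weighted average" behaviour described above can be illustrated with a small simulation. This is a rough sketch in Python with numpy under loud assumptions, and it is not code from the thread or the linked paper. The data, the seed, and all names are invented. The split model is fit by ordinary least squares rather than by a mixed model. The random-effects estimate is imitated by quasi-demeaning with an arbitrary fixed weight `theta = 0.6`, whereas a real random-effects fit would estimate that weight from the variance components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical balanced data: 40 clusters, 6 observations each.
n_groups, n_per = 40, 6
group = np.repeat(np.arange(n_groups), n_per)

# Assumed setup: cluster effects c are correlated with the cluster means of x.
c = rng.normal(size=n_groups)
x = 0.8 * c[group] + rng.normal(size=group.size)
y = 1.0 + 0.5 * x + 2.0 * c[group] + rng.normal(scale=0.5, size=group.size)


def group_mean(v):
    """Return, for every observation, the mean of v within its cluster."""
    sums = np.bincount(group, weights=v, minlength=n_groups)
    return (sums / np.bincount(group, minlength=n_groups))[group]


def ols(columns, response):
    """Least-squares coefficients for a list of predictor columns."""
    return np.linalg.lstsq(np.column_stack(columns), response, rcond=None)[0]


ones = np.ones(group.size)
x_between = group_mean(x)      # between-cluster component of x
x_within = x - x_between       # within-cluster component of x

# Split model: separate slopes for the two components of x.
_, b_within, b_between = ols([ones, x_within, x_between], y)

# Fixed-effect (fully demeaned) estimate for comparison.
y_w = y - group_mean(y)
b_fe = (x_within @ y_w) / (x_within @ x_within)


def quasi_demeaned_slope(theta):
    """Slope after subtracting theta times the cluster mean from y and x."""
    y_t = y - theta * group_mean(y)
    x_t = x - theta * x_between
    return ols([(1.0 - theta) * ones, x_t], y_t)[1]


b_pooled = quasi_demeaned_slope(0.0)   # theta = 0 is ordinary pooled OLS
b_re_like = quasi_demeaned_slope(0.6)  # arbitrary weight, for illustration

print(f"within slope (split model): {b_within:.4f}")
print(f"fixed-effect slope:         {b_fe:.4f}")
print(f"between slope:              {b_between:.4f}")
print(f"quasi-demeaned, theta=0.6:  {b_re_like:.4f}")
print(f"pooled OLS:                 {b_pooled:.4f}")
```

In this balanced setup the within slope from the split model matches the fixed-effect slope, and the unsplit estimates sit between the within and between slopes, moving toward the within slope as `theta` grows.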
> >> As for the original question, I can't speak for common practice in
> >> ecology, but I suspect it may be like it is in my home field of
> >> psychology, where we do worry about this issue (to some extent), but we
> >> discuss it using completely different language. That is, we discuss it
> >> in terms of whether there are different effects of the predictor at the
> >> within-cluster and between-cluster levels, and how our model might
> >> account for that.
> >>
> >> Jake
> >>
> >> On Tue, Oct 11, 2016 at 1:50 PM, Ben Bolker <bbolker at gmail.com> wrote:
> >>
> >> >
> >> >   I didn't respond to this offline, as it took me a while even to
> >> > start to come up to speed on the question.  Random effects are indeed
> >> > defined from *very* different points of view in the two communities
> >> > ([bio]statistical vs. econometric); I'm sure there are points of
> >> > contact, but I've been having a hard time getting my head around it
> >> > all.
> >> >
> >> > Econometric definition:
> >> >
> >> > The Wikipedia page <https://en.wikipedia.org/wiki/Random_effects_model>
> >> > and CrossValidated question
> >> > <http://stats.stackexchange.com/questions/66161/why-do-random-effect-models-require-the-effects-to-be-uncorrelated-with-the-inpu>
> >> > were both helpful for me.
> >> >
> >> >  In the (bio)statistical world fixed and random effects are usually
> >> > justified practically in terms of shrinkage estimators, or
> >> > philosophically in terms of random draws from an exchangeable set of
> >> > levels: e.g. see
> >> > <http://stats.stackexchange.com/questions/4700/what-is-the-difference-between-fixed-effect-random-effect-and-mixed-effect-mode/>
> >> > for links.
> >> >
> >> >   I don't think I can really write an answer yet.  I'm still trying to
> >> > understand at an intuitive or heuristic level what it means to have
> >> > Cov(x_it,c_i)=0, where x_it is a set of explanatory variables over
> >> > time for an individual subject and c_i is the conditional mode (=BLUP
> >> > in linear mixed-model-land) for the deviation of the individual i from
> >> > the population mean ... or more particularly what it means for that
> >> > condition to be violated, which is the point at which fixed effects
> >> > would become preferred.
> >> >
> >> >   As a side note, some statisticians (Andrew Gelman is the one who
> >> > springs to mind) have commented on the possible overemphasis on bias.
> >> > (All else being equal, unbiased estimators are preferred to biased
> >> > estimators, but all else is not always equal.) Two examples: (1)
> >> > penalized estimators such as lasso/ridge regression (closely related
> >> > to mixed models) give biased parameter estimates with lower mean
> >> > squared error. (2) When estimating variability, one has to choose a
> >> > particular scale (variance, standard error, log(standard error), etc.)
> >> > on which one would prefer to get an unbiased answer.
> >> >
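
The first example, a biased penalized estimator with lower mean squared error, can be seen in a small Monte Carlo experiment. This is an illustrative sketch in Python with numpy, not code from the thread. It covers ridge only, not lasso, and the design matrix, true coefficients, noise level, and penalty `lam = 5.0` are all made-up assumptions chosen so that the noise is large relative to the signal.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical small, noisy regression problem (no intercept for simplicity).
n, p = 20, 5
sigma = 2.0                    # assumed noise standard deviation
lam = 5.0                      # assumed ridge penalty, not tuned
beta_true = np.full(p, 0.5)    # assumed true coefficients
X = rng.normal(size=(n, p))    # one fixed design, reused in every replicate

# Closed-form linear maps from a response vector to coefficient estimates.
XtX = X.T @ X
ols_map = np.linalg.inv(XtX) @ X.T
ridge_map = np.linalg.inv(XtX + lam * np.eye(p)) @ X.T

# Simulate many response vectors at once: one column per replicate.
n_rep = 2000
Y = X @ beta_true[:, None] + rng.normal(scale=sigma, size=(n, n_rep))
est_ols = (ols_map @ Y).T      # shape (n_rep, p)
est_ridge = (ridge_map @ Y).T

# Bias: average estimate minus truth. MSE: average squared distance from truth.
bias_ols = est_ols.mean(axis=0) - beta_true
bias_ridge = est_ridge.mean(axis=0) - beta_true
mse_ols = ((est_ols - beta_true) ** 2).sum(axis=1).mean()
mse_ridge = ((est_ridge - beta_true) ** 2).sum(axis=1).mean()

print("norm of bias, OLS:  ", np.linalg.norm(bias_ols))
print("norm of bias, ridge:", np.linalg.norm(bias_ridge))
print("MSE, OLS:  ", mse_ols)
print("MSE, ridge:", mse_ridge)
```

Under these assumptions the ridge estimates are visibly shrunk toward zero on average, yet their total squared error is smaller than that of least squares. With a much larger penalty or a much stronger signal the comparison can reverse.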
> >> > On 16-10-11 12:02 PM, Laura Dee wrote:
> >> > > Dear all,
> >> > > Random effects are more efficient estimators; however, they come at
> >> > > the cost of the assumption that the random effect is not correlated
> >> > > with the included explanatory variables. Otherwise, using random
> >> > > effects leads to biased estimates (e.g., as laid out in Wooldridge
> >> > > <https://faculty.fuqua.duke.edu/~moorman/Wooldridge,%20FE%20and%20RE.pdf>'s
> >> > > Econometrics text). This assumption is a strong one for many
> >> > > observational datasets, and most analyses in economics do not use
> >> > > random effects for this reason. *Is there a reason why observational
> >> > > ecological datasets would be fundamentally different that I am
> >> > > missing? Why is this important assumption (to have unbiased estimates
> >> > > from random effects) not emphasized in ecology? *
> >> > >
> >> > > Thanks!
> >> > >
> >> > > Laura
> >> > >
> >> > > --
> >> > > Laura Dee
> >> > > Post-doctoral Associate
> >> > > University of Minnesota
> >> > > ledee at umn.edu
> >> > > lauraedee.com
> >> >
> >> > _______________________________________________
> >> > R-sig-mixed-models at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >> >
> >>
> >>         [[alternative HTML version deleted]]
> >>
> >
> >
> >
> > --
> >
> >
> >
> >
> > Thanks,
> > John
> >
> >
> > John Poe
> > Doctoral Candidate
> > Department of Political Science
> > Research Methodologist
> > UK Center for Public Health Services & Systems Research
> > University of Kentucky
> > 111 Washington Avenue, Room 203a
> > Lexington, KY 40536
> > www.johndavidpoe.com
> >