[R-sig-ME] Does the “non-independent" data structure defined in mixed models follow the “independency” defined by probability theory?

Poe, John jdpo223 at g.uky.edu
Fri Sep 9 20:34:07 CEST 2016

On point 1: depending on the number of sites, yes, you can use a random
effect instead of a fixed effect to account for omitted variables such as
the site selection mechanism.

If you are doing this to control for site effects that are essentially
contamination and of no theoretical interest, then using fixed effects for
site is the easiest approach for a linear model. In most generalized linear
models you can’t effectively difference the fixed effects out of the data
in the same way, and including them in the model as dummy variables will
result in incidental parameters bias with as few as ten dummy variables.
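
To make the "differencing out" idea for the linear case concrete, here is a
minimal sketch in plain Python (standard library only). The toy data, the
site labels, and the helper names (group_means, demean_by_group, ols_slope)
are all made up for illustration; nothing here comes from the original
question. It only shows that subtracting site means from x and y removes an
additive site effect from the slope estimate in a linear model, which is
what site dummies would do as well.

```python
from collections import defaultdict


def group_means(values, groups):
    """Return {group: mean of values in that group}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def demean_by_group(values, groups):
    """Subtract each observation's group mean (the 'within' transformation)."""
    means = group_means(values, groups)
    return [v - means[g] for v, g in zip(values, groups)]


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x with intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical, noise-free toy data: three sites, three observations each.
site = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Assumed data-generating process: true slope 2 plus an additive site effect
# that happens to be larger at sites with larger x (i.e. correlated with x).
site_effect = {"A": 0.0, "B": 10.0, "C": 20.0}
y = [2.0 * xi + site_effect[s] for xi, s in zip(x, site)]

# Ignoring site mixes the site effect into the slope.
pooled_slope = ols_slope(x, y)

# Demeaning x and y within site removes the additive site effect.
within_slope = ols_slope(demean_by_group(x, site), demean_by_group(y, site))

print(pooled_slope)  # 5.0 in this toy data
print(within_slope)  # 2.0, the slope used to generate the data
```

The same trick does not carry over to most generalized linear models,
because the site effect is no longer additive on the scale of the observed
response, which is the point made above.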

If you are interested in understanding how the site-related latent variable
might work, then you should use a mixed effects model and be sure to
include group averages for your lower-level variables so that you can
interpret the within-group and between-group effects separately. You may
also need to model random coefficients, because decomposing the variables
doesn’t always completely orthogonalize the within-group versions of the
variables and the random effect.
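
As a rough illustration of the decomposition, the following plain-Python
sketch (standard library only) splits a lower-level variable x into a site
mean (the between-group part) and a deviation from that mean (the
within-group part). The data and all names are hypothetical. Because the
two parts are orthogonal by construction, two separate simple regressions
recover the same coefficients that a joint least-squares fit would give in
this noise-free toy; this is ordinary least squares rather than a mixed
model, so treat it only as a picture of what the two coefficients mean.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x with intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data: three sites, three observations each.
site = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Site means of x.
sums = defaultdict(float)
counts = defaultdict(int)
for xi, s in zip(x, site):
    sums[s] += xi
    counts[s] += 1
site_mean = {s: sums[s] / counts[s] for s in sums}

# Between-group part: the site mean, repeated for every row in the site.
x_between = [site_mean[s] for s in site]
# Within-group part: deviation of each row from its own site mean.
x_within = [xi - site_mean[s] for xi, s in zip(x, site)]

# Assumed data-generating process (noise-free, for illustration only):
# a within-site effect of 2 and a between-site effect of 5.
y = [2.0 * w + 5.0 * b for w, b in zip(x_within, x_between)]

# The two parts are orthogonal by construction: deviations sum to zero
# inside every site, and x_between is constant inside every site.
dot = sum(w * b for w, b in zip(x_within, x_between))

b_within = ols_slope(x_within, y)
b_between = ols_slope(x_between, y)

print(dot)        # 0.0
print(b_within)   # 2.0
print(b_between)  # 5.0
```

In lme4 terms the analogous model would include both constructed columns
as fixed effects alongside a random intercept for site, for example
something like y ~ x_within + x_between + (1 | site), with those column
names being placeholders.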

With any random effect you are assuming that it is uncorrelated with the
fixed components in the model, which means you are modeling the relationship
between the random effect and all of your independent variables regardless
of what you do. You can take either the fixed effects (group indicator
variables) approach or the mixed effects modeling approach, but in both
cases doing it properly means you have accounted for lack of independence
across variables and within sites.

On Tue, Sep 6, 2016 at 3:41 AM, Chen, Chun <chun.chen at wur.nl> wrote:

> Thank you Ben for the answer. Now I am wondering:
> 1) If I happen to have a grouping variable that is not by design, for
> instance my randomly selected observations turn out to show some
> site-related characteristics, is it sound to apply a mixed model including
> site as a random intercept? In practice, it is pretty common to use site as
> a fixed effect in the regression analysis (i.e. to detect the main effect
> after adjusting for the site effect), even if site is not a factor in the
> experimental/observational design.
> 2) If site can be used as a random intercept, what are the exact criteria
> for non-independence (i.e. nested structure) in the context of applying a
> mixed model? Are they not the same as what you defined below?
> 3) In case site cannot be used as a random intercept, but can be used as a
> fixed factor: I assume that if a categorical variable can be modeled as a
> fixed effect, it can also be modeled as a random effect (both are trying to
> estimate an effect, but in different ways). Additionally, there is no
> limitation on the conditions under which we can use a variable as a fixed
> factor in regression (you can include any variable as a fixed effect if you
> hypothesize the effect, with no non-independence requirement). Why do we
> need a non-independence condition for the random factors?
> Thanks
> Regards,
> Chun
> -----Original Message-----
> From: Ben Bolker [mailto:bbolker at gmail.com]
> Sent: maandag, september 05, 2016 20:51
> To: Chen, Chun
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Does the “non-independent" data structure defined
> in mixed models follow the “independency” defined by probability theory?
> On Mon, Sep 5, 2016 at 4:08 AM, Chen, Chun <chun.chen at wur.nl> wrote:
> > Dear all,
> >
> > I am bit puzzled by definition of the “nested data” or “non-independent
> data” structure in the mixed model.
> >
> > From the statistical point of view, independence means that the
> probabilities of selecting two observations do not influence each other.
> In this case, if I design an experiment where I purposely select two
> observations from the same group (or stratum), then later on we can say
> these two observations are dependent. However, if I am sampling with
> replacement and by coincidence select one observation twice (e.g.
> roll a die twice and by coincidence get a “6” each time), the
> probabilities of selecting these two observations do not influence
> each other and they are independent.
> >
> > My questions are:
> >
> > What’s the definition of the “non-independent data” that is often
> > referred to in mixed modeling? Is it the same concept as “independency”
> > defined by probability theory, which depends on how the
> > observations are selected, rather than on how alike the observations
> > look in the final sample?
>    (You say "questions" here, but there really seems to be only one
> question here.)
>   Yes, mixed modeling defines grouping variables based on
> experimental/observational design.  That is, grouping variables are
> identifiers that are believed *a priori* to be associated with
> non-independence of observations with the same identifier values.
>   Ben Bolker
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



John Poe
Doctoral Candidate
Department of Political Science
Research Methodologist
UK Center for Public Health Services & Systems Research
University of Kentucky
111 Washington Avenue, Room 203a
Lexington, KY 40536
