[R-sig-ME] Does the “non-independent” data structure defined in mixed models follow the “independency” defined by probability theory?

Tue Sep 6 09:41:56 CEST 2016

Thank you Ben for the answer. Now I am wondering:

1) If I happen to have a grouping variable that is not there by design, for instance if my randomly selected observations turn out to show some site-related characteristics, is it sound to apply a mixed model that includes site as a random intercept? In practice, it is quite common to include site as a fixed effect in a regression analysis (i.e. to detect the main effect after adjusting for the site effect), even when site is not a factor in the experimental/observational design.

2) If site can be used as a random intercept, what are the exact criteria for non-independence (i.e. a nested structure) in the context of applying a mixed model? Are they not the same as what you defined below?

3) In case site cannot be used as a random intercept but can be used as a fixed factor: I assume that if a categorical variable can be modeled as a fixed effect, it can also be modeled as a random effect (both try to estimate an effect, but in different ways). Additionally, there is no restriction on the conditions under which we can use a variable as a fixed factor in a regression (you can include any variable as a fixed effect if you hypothesize an effect; there is no non-independence requirement). Why do we need a non-independence condition for random factors?
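
To make the "different ways" in question 3 concrete, here is a rough sketch of how the two approaches estimate a site mean. It assumes a balanced-style one-way model with variance components that are already known (in practice a mixed-model fit would estimate them), and it takes the grand mean as the simple average of all observations, which is a simplification. The function name, site labels, and numbers are made up for illustration only.

```python
def site_estimates(values_by_site, var_site, var_resid):
    """Compare fixed-effect style and random-intercept style site means.

    var_site and var_resid are ASSUMED known variance components;
    they are illustrative inputs, not estimates from a fitted model.
    """
    all_values = [v for vals in values_by_site.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)  # simplification

    estimates = {}
    for site, vals in values_by_site.items():
        n = len(vals)
        raw_mean = sum(vals) / n  # fixed-effect style: the site's own mean
        # Shrinkage weight: close to 1 for large sites, smaller for small ones
        weight = var_site / (var_site + var_resid / n)
        # Random-intercept style: site mean pulled toward the grand mean
        shrunk_mean = grand_mean + weight * (raw_mean - grand_mean)
        estimates[site] = (raw_mean, shrunk_mean, weight)
    return grand_mean, estimates


# Hypothetical data: three sites with unequal numbers of observations
data = {"A": [10, 12, 11, 13], "B": [20], "C": [14, 15, 16]}
grand_mean, estimates = site_estimates(data, var_site=4.0, var_resid=4.0)

for site, (raw, shrunk, weight) in estimates.items():
    print(f"site {site}: raw={raw:.3f} shrunk={shrunk:.3f} weight={weight:.2f}")
```

Under these assumptions the random-intercept estimate is never further from the grand mean than the raw site mean, and the site with a single observation is pulled in the most.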

Thanks

Regards,
Chun

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com] 
Sent: Monday, September 05, 2016 20:51
To: Chen, Chun
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Does the “non-independent” data structure defined in mixed models follow the “independency” defined by probability theory?

On Mon, Sep 5, 2016 at 4:08 AM, Chen, Chun <chun.chen at wur.nl> wrote:
> Dear all,
>
> I am a bit puzzled by the definition of the “nested data” or “non-independent data” structure in mixed models.
>
> From the statistical point of view, independence means that the probabilities of selecting two observations do not influence each other. In this case, if I design an experiment where I purposely select two observations from the same group (or stratum), then later on we can say these two observations are dependent. However, suppose I am sampling with replacement and by coincidence select the same observation twice (e.g. roll a die twice and by coincidence get a “6” each time). The probabilities of selecting these two observations do not influence each other, so they are independent.
>
> My questions are:
>
> What’s the definition of the “non-independent data” that is often
> referred to in mixed modeling? Is it the same concept as “independency”
> as defined by probability theory, which is determined by how the
> observations are selected rather than by how alike the observations
> look in the final sample?

   (You say "questions" here, but there really seems to be only one question here.)

  Yes, mixed modeling defines grouping variables based on experimental/observational design.  That is, grouping variables are identifiers that are believed *a priori* to be associated with non-independence of observations with the same identifier values.
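
As a rough illustration of what non-independence among observations sharing an identifier looks like, the following sketch simulates data under an assumed random-intercept model: each site gets its own random offset, and every observation is that offset plus independent noise. The function names, the variances, and the number of sites are arbitrary choices for illustration, not anything taken from a real data set.

```python
import random


def simulate_pairs(n_sites=2000, site_sd=1.0, noise_sd=1.0, seed=1):
    """Draw two observations per site under an assumed random-intercept model."""
    rng = random.Random(seed)
    site_effects = [rng.gauss(0, site_sd) for _ in range(n_sites)]
    first = [u + rng.gauss(0, noise_sd) for u in site_effects]
    second = [u + rng.gauss(0, noise_sd) for u in site_effects]
    # Rotate by one position so each value is paired with a different site
    other_site = second[1:] + second[:1]
    return first, second, other_site


def correlation(x, y):
    """Pearson correlation computed from scratch."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


first, same_site, other_site = simulate_pairs()
within = correlation(first, same_site)    # pairs sharing a site identifier
between = correlation(first, other_site)  # pairs from different sites

print(f"same-site correlation:      {within:.2f}")
print(f"different-site correlation: {between:.2f}")
```

With equal site and noise variances, the model implies a correlation of 0.5 between observations from the same site and 0 between observations from different sites; the simulated values should land near those figures.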

  Ben Bolker