[R-sig-ME] Does the "non-independent" data structure defined in mixed models follow the "independency" defined by probability theory?

Tue Sep 13 14:28:00 CEST 2016

Thank you François, it is much clearer to me now.

-----Original Message-----
From: François Rousset [mailto:francois.rousset at umontpellier.fr] 
Sent: Wednesday, September 07, 2016 14:28
To: Chen, Chun; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Does the "non-independent" data structure defined in mixed models follow the "independency" defined by probability theory?

Dear all,

I am a bit surprised by the previous follow-up to this question:

On 05/09/2016 at 10:08, Chen, Chun wrote:
> Dear all,
>
> I am a bit puzzled by the definition of the "nested data" or "non-independent data" structure in mixed models.
>
> From the statistical point of view, independence is defined as the probabilities of selecting two observations not influencing each other. In this case, if I design an experiment where I purposely select two observations from the same group (or stratum), then later on we can say these two observations are dependent. However, suppose I am sampling with replacement and by coincidence I select one observation twice (e.g. throw a die twice and by coincidence get a "6" each time). The probabilities of selecting these two observations are indeed not influencing each other, and they are independent.
The assumptions are those of the model being fitted. Thus in a mixed model one typically assumes that the _residual errors_ for each observation are independent.
The "observations" are not independent, but the model does not assume that the "observations" are independent in any relevant stochastic sense, only that the residuals are.
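As an illustration of this distinction, here is a sketch for a simple random-intercept model. The notation (y_ij for observation j in group i, b_i for the group effect, eps_ij for the residual error) is assumed here for the example and does not come from the messages in this thread; b_i and eps_ij are assumed independent of each other.

```latex
\begin{align*}
y_{ij} &= x_{ij}^{\top}\beta + b_i + \varepsilon_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \quad
\varepsilon_{ij} \sim N(0, \sigma^2) \text{ independently}, \\
\operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) &= 0 \quad (j \neq k), \\
\operatorname{Cov}(y_{ij}, y_{ik}) &= \operatorname{Var}(b_i) = \sigma_b^2 \quad (j \neq k).
\end{align*}
```

Under these assumptions, two observations from the same group are correlated through the shared b_i, while their residual errors remain independent.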

I don't see independence in probability as being defined as "not influencing each other". In practice independence also means "not affected by a common factor". A formal definition of independence of several events is that the joint probability of these events is the product of the probabilities of each event: see e.g. Feller 1950, p.125. In the above example of sampling with replacement, when the same observation is selected twice, a single draw of a residual error affects two response values, so according to the formal definition, the two residuals are not independent. So sampling with replacement violates the assumption of independence of residuals.
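The product rule can be checked numerically. The following is only a minimal sketch in Python (standard library only), not part of the original exchange: the function name joint_probabilities, the standard normal residuals, and the events "residual > 0" are all assumptions chosen for illustration. It compares two separately drawn residuals with a duplicated observation whose residual is reused.

```python
import random


def joint_probabilities(n_draws=100_000, seed=1):
    """Estimate P(A), P(B) and P(A and B) for the events 'residual > 0'.

    Hypothetical illustration: residuals are standard normal draws.
    'indep_joint' uses two separate draws; 'dup_joint' reuses the first
    draw, as happens when one observation is sampled twice.
    """
    rng = random.Random(seed)
    n_first = n_second = n_both_indep = n_both_dup = 0
    for _ in range(n_draws):
        e1 = rng.gauss(0.0, 1.0)   # residual of the first observation
        e2 = rng.gauss(0.0, 1.0)   # separately drawn second residual
        e_dup = e1                 # duplicated observation: same residual
        n_first += e1 > 0
        n_second += e2 > 0
        n_both_indep += (e1 > 0) and (e2 > 0)
        n_both_dup += (e1 > 0) and (e_dup > 0)
    return {
        "p_first": n_first / n_draws,
        "p_second": n_second / n_draws,
        "indep_joint": n_both_indep / n_draws,
        "dup_joint": n_both_dup / n_draws,
    }


if __name__ == "__main__":
    for name, value in joint_probabilities().items():
        print(f"{name}: {value:.3f}")
```

For the separately drawn residuals, the estimated joint probability should be close to the product of the two marginal estimates. For the duplicated observation, the joint probability equals the marginal probability itself, so the product rule fails.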

F.R.
>
> My questions are:
>
> What is the definition of the "non-independent data" that is often
> referred to in mixed modeling? Is it the same concept as the "independency"
> defined by probability theory, which concerns how the
> observations are selected, rather than how alike the observations look
> in the final sample?
>
> Thanks
>
> Regards,
> Chun
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models