[R-sig-ME] Assessing Normality for Mixed Models

Emmanuel Curis emmanuel.curis at parisdescartes.fr
Tue May 20 22:19:00 CEST 2014


Hi All,

I give here a tentative response, and would be happy to be corrected
by more experienced statisticians.

When using the Y = X %*% theta + X' %*% Z + epsilon model, where Z is
the random effects matrix and epsilon the residuals, the aim is, with
the random part, to reproduce the covariance matrix of Y starting with
Z and epsilon.

A first assumption is that Z and epsilon are (multivariate) normally
distributed. A second assumption is that Z and epsilon are independent.
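
In symbols, these two assumptions fix the covariance of Y. The names
Sigma_Z and Sigma_epsilon for the covariance matrices of Z and epsilon
are introduced here only for illustration, and (X')^T denotes the
transpose of the design matrix X' of the random part:

```latex
Z \sim \mathcal{N}(0, \Sigma_Z), \qquad
\varepsilon \sim \mathcal{N}(0, \Sigma_\varepsilon), \qquad
Z \text{ independent of } \varepsilon
\;\Longrightarrow\;
\operatorname{Cov}(Y) = (X') \, \Sigma_Z \, (X')^{\top} + \Sigma_\varepsilon
```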

However, the covariance matrix of Y can be split between X' %*% Z and
epsilon in several ways. At one extreme, X' %*% Z = 0 and everything
is in the epsilon part (this is the gls approach in nlme, if I
understood it correctly). At the other, epsilon is iid (as in the usual
linear model) and X' %*% Z explains everything else. Everything in
between can also be written, with some classical kinds of matrices
(« compound symmetry » and so on). This corresponds to the G and R
parts of the (random effects) model using SAS vocabulary [not quite
sure of the two letters right now, sorry]. Hence, the other
assumptions, and what exactly « normally distributed » means, will
depend on the exact model you write.
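
To make the two extremes concrete, here is a minimal sketch, assuming
the Orthodont data shipped with nlme (my choice of example, not
something from the original question). A random intercept per subject
and a compound-symmetric residual covariance without any random effect
should describe the same covariance of Y, so the two fits are expected
to agree closely:

```r
library(nlme)  # recommended package, normally installed with R

# Formulation 1: the subject effect is in the random part, epsilon is iid.
fit_lme <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Formulation 2: no random effect, everything is in the residual
# covariance, which is compound symmetric within each subject.
fit_gls <- gls(distance ~ age,
               correlation = corCompSymm(form = ~ 1 | Subject),
               data = Orthodont)

# Both are fitted by REML (the default) and should give nearly the
# same log-likelihood.
c(lme = as.numeric(logLik(fit_lme)), gls = as.numeric(logLik(fit_gls)))

# Within-subject correlation implied by each formulation.
var_subject  <- as.numeric(VarCorr(fit_lme)["(Intercept)", "Variance"])
var_residual <- fit_lme$sigma^2
icc_lme <- var_subject / (var_subject + var_residual)
icc_gls <- as.numeric(coef(fit_gls$modelStruct$corStruct,
                           unconstrained = FALSE))
c(lme = icc_lme, gls = icc_gls)
```

The agreement holds here because the within-subject correlation is
positive; a random intercept cannot represent a negative one, whereas
the compound-symmetry structure can.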

In lme4, the model is that epsilon is iid, and everything is in the
X'%*%Z part, which is a one-dimensional Gaussian in the simplest case
of a single random effect (and no interaction with fixed effects
either), and a multidimensional Gaussian otherwise, with a covariance
matrix depending on the exact model formula for the random part.

In (n)lme, you can have more flexibility on both random effects and
residual parts, so more combinations are possible.
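
As an illustration of such a combination, here is a sketch assuming
the Ovary data from nlme (again my choice of example): the model keeps
random effects per mare and, in addition, lets the residuals within a
mare follow an AR(1) process instead of being iid.

```r
library(nlme)

# Random intercept and random sine amplitude per mare, iid residuals.
# Ovary is a groupedData object, so the grouping factor (Mare) is
# taken from the data.
fit_iid <- lme(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
               random = pdDiag(~ sin(2 * pi * Time)),
               data = Ovary)

# Same random part, but AR(1) correlation between successive residuals
# within a mare.
fit_ar1 <- update(fit_iid, correlation = corAR1())

# Estimated AR(1) parameter, on its natural (-1, 1) scale.
phi <- as.numeric(coef(fit_ar1$modelStruct$corStruct,
                       unconstrained = FALSE))
phi

# Compare the two residual structures (same fixed effects, REML fits).
anova(fit_iid, fit_ar1)
```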

So, with lme4, checks would be
 - residuals are normal
     ==> qqplot of residuals
 - residuals are independent
     ==> autocorrelation or (ei, ei+1) plots

 - residuals are homoscedastic
     ==> residuals = f(fixed effects) plots

        (keeping in mind that observed residuals are neither
        independent nor normally distributed)

   here, the situation is very similar to the linear model case,
   except that I'm not sure that (externally) studentized residuals
   follow a known distribution to improve check quality...

 - random effects are _multi-dimensional_ normals
     which is more difficult to assess ; marginal normality of each
     column of the matrix of random effects is necessary but not
     sufficient.

   here also, observed random effects are not normally distributed,
   and not independent, which means one cannot be too strict when
   graphically assessing normality

 - random effects and residuals are independent
     not quite sure, but I guess a ranef() ~ residuals() plot should
     give a hint?
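
The checks in this list can be sketched as follows. This is only a
rough illustration, assuming the sleepstudy data shipped with lme4 and
a random intercept and slope per subject. The chi-square plot of
Mahalanobis distances, computed with the empirical covariance of the
predicted random effects, is my own suggestion for looking at joint
normality, not an established rule.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

res <- residuals(fit)       # conditional residuals, one per observation
re  <- ranef(fit)$Subject   # one row per subject, one column per effect

op <- par(mfrow = c(2, 4))

# Residuals are normal: QQ plot.
qqnorm(res, main = "Residuals")
qqline(res)

# Residuals are independent: (e_i, e_{i+1}) plot, keeping only pairs
# of consecutive observations from the same subject.
n <- length(res)
same <- sleepstudy$Subject[-n] == sleepstudy$Subject[-1]
plot(res[-n][same], res[-1][same],
     xlab = "e[i]", ylab = "e[i+1]", main = "Lag-1 residuals")

# Residuals are homoscedastic: residuals against fitted values.
plot(fitted(fit), res, xlab = "Fitted", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Random effects: marginal normality of each column (necessary, but
# not sufficient).
for (j in seq_along(re)) {
  qqnorm(re[[j]], main = paste("Random effect:", names(re)[j]))
  qqline(re[[j]])
}

# Random effects: joint normality, via squared Mahalanobis distances
# against chi-square quantiles.
d2 <- mahalanobis(re, colMeans(re), cov(re))
qqplot(qchisq(ppoints(length(d2)), df = ncol(re)), d2,
       xlab = "Chi-square quantiles", ylab = "Squared distance",
       main = "Random effects, joint")
abline(0, 1)

# Random effects and residuals are independent: each residual against
# the predicted intercept of its subject.
plot(re[as.character(sleepstudy$Subject), "(Intercept)"], res,
     xlab = "Random intercept", ylab = "Residuals",
     main = "ranef vs residuals")

par(op)
```

With only 18 subjects, the plots for the random effects are very
noisy, so they can only reveal gross departures.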

For nlme, since both epsilon and random effects can be
multi-dimensional, for both of them we are in the second case.

The frontier between model assumption and model building is not so
clear either, at least for me: if you assume homoscedasticity of
random effects between groups, it is a model assumption to be checked,
but it can be checked by comparing it to a heteroscedastic model...
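
A sketch of this kind of comparison, assuming the Orthodont data from
nlme. Note that, to keep it simple, it relaxes homoscedasticity of the
residuals between groups (here, between sexes) and not of the random
effects; the same compare-two-models idea applies, but the random
effects case needs a different parameterisation.

```r
library(nlme)

# Homoscedastic model: one residual variance for everybody.
fit_hom <- lme(distance ~ age * Sex, random = ~ 1 | Subject,
               data = Orthodont)

# Heteroscedastic model: one residual variance per sex.
fit_het <- update(fit_hom, weights = varIdent(form = ~ 1 | Sex))

# Same fixed effects and REML fits, so the likelihood-ratio comparison
# of the two variance structures is meaningful.
cmp <- anova(fit_hom, fit_het)
cmp
```

A clearly better fit of the heteroscedastic model would suggest that
the homoscedasticity assumption does not hold.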

Hope this helps,

PS : I've found a lot of things about « sphericity » assumptions and
things like that, but I'm not sure whether they apply in general or
only need to be considered when building F-tests. My impression is
that they matter only for F-tests.

PS-2 : I am currently working on scripts that try to help checking
such assumptions, both for LM and LMM, leading to an HTML report for
the user; if anyone wants to help me in that development or testing,
please feel free to contact me.

On Tue, May 20, 2014 at 02:59:28PM -0400, AvianResearchDivision wrote:
« Hi All,
« 
« After doing some extensive googling, searching for ways to assess normality
« for linear mixed models, I can honestly say my head is swimming in
« different proposed techniques that may or may not be valid.  Also, when
« reading the literature, I find that few studies that use linear mixed
« models and random regression actually explicitly address how they assess
« normality.  What are the rules with normality with mixed models (if there
« are any) and what are your techniques to assess normality?  Any input that
« you can provide would be great and hopefully will help to settle my mind on
« this issue.
« 
« Thank you,
« Jacob

-- 
                                Emmanuel CURIS
                                emmanuel.curis at parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html


