[R-sig-ME] Guidelines for choosing family (and link) in lmer

John Maindonald john.maindonald at anu.edu.au
Sun Mar 6 23:04:51 CET 2011

The general rules that Iker gives are good starting points for thinking about the modeling.

As for testing for normality, one does not expect the dependent variable to be normal,
and certainly not independently and identically distributed.  Rather, the model is
constructed using a kitset in which the components are
1) 'fixed' effects (the X beta part of the model)
2) normal distributions (the random effects, making up the Z b part of the model)
3) an exponential family distribution, if not normal then generally binomial or Poisson
(the e part of the model)

Adding up 1), 2) and 3) is very unlikely to give anything that looks like a normal or other
exponential family distribution.
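
To make this concrete, here is a minimal sketch in base R.  It simulates data that
satisfy every assumption of a Gaussian mixed model and then tests the raw response.
The group sizes, effect sizes and variable names are all invented for illustration.

```r
# Sketch only: every assumption of a Gaussian mixed model holds by
# construction.  All names and numbers are made up for illustration.
set.seed(1)
n_groups  <- 30                       # hypothetical number of groups
n_per     <- 20                       # observations per group
group     <- rep(seq_len(n_groups), each = n_per)
treatment <- rep(c(0, 1), length.out = n_groups * n_per)  # two-level fixed effect

b <- rnorm(n_groups, mean = 0, sd = 1)           # 2) normal random effects (Z b)
e <- rnorm(n_groups * n_per, mean = 0, sd = 1)   # 3) normal errors (e)
y <- 10 + 6 * treatment + b[group] + e           # 1) X beta, plus 2) and 3)

# The raw response has one hump per treatment level, so a normality
# test on y says little about whether the model itself is appropriate.
shapiro.test(y)
hist(y, breaks = 30, main = "Assumptions hold, yet y is not normal")
```

With a fixed effect this large the response is visibly bimodal, so a result like the
one Sverre describes from shapiro.test need not count against "family=gaussian".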

Except in suitably balanced models, the computations that recover the estimates of the
normal random effects are likely to distort the distribution.  Hence direct checks on the
normality of the random effects can be misleading.

A reasonable tack may be to use quantile-quantile plots to compare the estimated 
random effects from the fitted model with the estimated random effects from a simulation 
of the model, doing this several times for each set of random effects.  (The simulation 
generates data that on average does accord with the normality and other assumptions; 
does the data that is to be analysed produce something comparable?)
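
One way this might be set up is sketched below.  It assumes the lme4 package and uses
its bundled sleepstudy data with a random-intercept model purely as a stand-in; the
number of simulations and the plot layout are arbitrary choices.

```r
# Sketch only, assuming the lme4 package and its bundled sleepstudy data;
# the model is a stand-in, not the one from the original question.
library(lme4)

fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Estimated random intercepts (conditional modes) from the real data
re_obs <- ranef(fit)$Subject[["(Intercept)"]]

set.seed(1)
n_sim <- 6                                   # "several times"
y_sim <- simulate(fit, nsim = n_sim)         # responses drawn from the fitted model

op <- par(mfrow = c(2, 3))
for (i in seq_len(n_sim)) {
  fit_i <- refit(fit, newresp = y_sim[[i]])  # same model, simulated response
  re_i  <- ranef(fit_i)$Subject[["(Intercept)"]]
  # Points close to the dashed line suggest the observed estimates look
  # like those the model produces when its assumptions hold by construction.
  qqplot(re_i, re_obs,
         xlab = "Random effects, simulated data",
         ylab = "Random effects, observed data",
         main = paste("Simulation", i))
  abline(0, 1, lty = 2)
}
par(op)
```

The spread among the panels gives a rough sense of how much departure from the line
is unremarkable; only a pattern that stands out in every panel deserves attention.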

It gets even more complicated.  What one needs is approximate normality (or . . .).
For purposes of getting SEs of parameter estimates, it is the normality of the relevant
sampling distribution that matters. 

A further point: tests for normality are in general useless.  They most readily detect
non-normality in those large-sample contexts where (because of central limit theorem 
effects) normality matters, for many of the uses of model results, least.
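
A small base R sketch of this point follows.  The mildly skewed gamma population and
the sample sizes are assumptions chosen only for illustration.  The same population is
tested at two sample sizes, and the sampling distribution of the mean is then plotted
at each size.

```r
# Sketch only: a mildly skewed population (gamma, shape 20) chosen for
# illustration; nothing here comes from the original data.
set.seed(1)
draw <- function(n) rgamma(n, shape = 20, rate = 1)

# Same population, two sample sizes (shapiro.test accepts at most 5000 values)
p_small <- shapiro.test(draw(30))$p.value
p_large <- shapiro.test(draw(5000))$p.value
c(n30 = p_small, n5000 = p_large)

# Sampling distribution of the mean at each size: this is what SEs rely on
means_small <- replicate(2000, mean(draw(30)))
means_large <- replicate(2000, mean(draw(5000)))
op <- par(mfrow = c(1, 2))
qqnorm(means_small, main = "Means, n = 30");   qqline(means_small)
qqnorm(means_large, main = "Means, n = 5000"); qqline(means_large)
par(op)
```

The test has most power at n = 5000, which is exactly where the distribution of the
mean is closest to normal and the skewness of the population matters least.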

This is an area that needs a lot more work.  Currently most of us will, most of the time,
proceed more in hope of something acceptably close to normality (or . . .) than in
certain assurance.  Not that certain assurance will ever be available!

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 07/03/2011, at 8:04 AM, Iker Vaquero Alba wrote:

>    Hello, Sverre:
>    I would like to add something to your question. I have heard that one of the best features of GLMMs is that they can deal extremely well with issues such as non-normality of the data, indeed much better than other statistical methods. If that's true, and trying to translate it into a more pragmatic rule, I would say that, as long as your response variable is continuous, you can, as a general rule, choose "family=gaussian", and lmer will deal quite well with the data as long as the distribution is not extremely skewed or strongly divergent from normal. Beyond that, as a general rule, use binomial for proportion data and Poisson for count data, though there are many other options better suited to specific purposes.
>    I know it seems like a sort of answer, but I'm actually asking if this point of view is correct, as an add-on to Sverre's question.
>    Thank you very much!
>    Iker   
> --- On Sun, 6/3/11, Sverre Stausland <johnsen at fas.harvard.edu> wrote:
> From: Sverre Stausland <johnsen at fas.harvard.edu>
> Subject: [R-sig-ME] Guidelines for choosing family (and link) in lmer
> To: r-sig-mixed-models at r-project.org
> Date: Sunday, 6 March 2011 21:41
> Hi all,
> I was wondering whether there are any general guidelines out there
> (online or in the literature) explaining how to choose the appropriate
> call for "family" in the lmer function?
> Right now, I have a data set where the dependent variable "looks
> normal", but where shapiro.test and ks.test tell me it's not. It's not
> clear to me whether I should still use "family=gaussian" or use
> something else. (I'm not giving details here, because I am looking for
> general guidelines).
> Best
> Sverre
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
