[R] normal distribution assumption for multi-level modelling

Wed Apr 18 20:01:55 CEST 2012

Cecile De Cat <c.decat <at> leeds.ac.uk> writes:

> I'm analysing reaction time data from a linguistic experiment (a variant of
> a lexical decision task).   To ascertain that the data was normally
> distributed, I used *shapiro.test* for each participant (see commands
> below), but only one out of 21 returns a p value above 0.05.
> 
> > f = function(dfr) return(shapiro.test(dfr$Target.RTinv)$p.value)
> > p = as.vector(by(newdat, newdat$Subject, f))
> > names(p) = levels(newdat$Subject)
> > names(p[p < 0.05])
> 
> Removing a few outliers per subject doesn't make a difference, and
> "aggressive" removal of outliers (done by subject, for each of the 6
> conditions ) still results in non-normally distributed data by subject.
> 
> Does this invalidate any attempt at multi-level modelling?

  I don't think so.

  1. You should be concerned about the normality of the *residuals* of
your response variable, i.e. the conditional distribution of your data
(or if you only have categorical predictors you could equivalently
look *within* the smallest sampling unit where you expect a constant
mean), not the marginal distribution of the data.

  2. Many statisticians would say you shouldn't be doing hypothesis
tests of normality for this purpose in any case; if you have little
data the tests have low power (so you won't detect non-normal data),
while if you have a great deal the tests can be *too* powerful
(i.e. you detect significant deviations from normality which do not
actually compromise the inferences you would be making from your
analysis).  I don't have a great citation for this handy, but
one is listed below (Cherry 1998).

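  3. You're not applying any multiple-comparisons correction: with
21 tests at the 0.05 level, you would expect about one p value <0.05
by chance alone if the null hypothesis were true.  (By your
description, though, 20 of 21 fell below 0.05, which is more than
chance alone would explain.)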
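
  As a rough illustration of points 1 and 3, here is a minimal sketch in
base R.  It is not the poster's actual analysis: the data are simulated,
and the Condition column, the number of trials per cell and all effect
sizes are assumptions made only so the example runs.  Only the names
newdat, Subject and Target.RTinv come from the original post.  The idea
is to look at deviations from each Subject x Condition cell mean (rather
than at the raw values per subject), to inspect them graphically, and,
if per-subject tests are run anyway, to adjust the p values for the
number of tests.

```r
## Hypothetical stand-in for the poster's data: 21 subjects x 6 conditions,
## 30 trials per cell.  Condition and all simulated values are assumptions.
set.seed(101)
newdat <- expand.grid(trial = 1:30,
                      Condition = factor(1:6),
                      Subject = factor(1:21))
subj_eff <- rnorm(21, sd = 0.3)[as.integer(newdat$Subject)]
cond_eff <- seq(0, 0.5, length.out = 6)[as.integer(newdat$Condition)]
newdat$Target.RTinv <- 2 + subj_eff + cond_eff +
    rnorm(nrow(newdat), sd = 0.2)

## Point 1: residuals around each Subject x Condition cell mean, i.e. the
## smallest unit within which the mean is expected to be constant.
cell_mean <- ave(newdat$Target.RTinv, newdat$Subject, newdat$Condition)
newdat$res <- newdat$Target.RTinv - cell_mean

## Graphical check of the residuals instead of a hypothesis test
qqnorm(newdat$res)
qqline(newdat$res)

## Point 3: if per-subject tests are run anyway, correct for the number
## of tests (Holm's method here; other choices are possible).
p_raw <- sapply(split(newdat$res, newdat$Subject),
                function(r) shapiro.test(r)$p.value)
p_holm <- p.adjust(p_raw, method = "holm")
names(p_holm)[p_holm < 0.05]
```

  With a fitted mixed model one would normally inspect the model's own
residuals in the same way; the cell-mean version above is just the
categorical-predictors shortcut mentioned in point 1.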

  Follow-ups to r-sig-mixed-models <at> r-project.org, although
this issue (hypothesis testing as a way to validate the statistical
assumptions of a model) is not specific to mixed models.

@article{cherry_statistical_1998,
	title = {Statistical Tests in Publications of The Wildlife Society},
	volume = {26},
	issn = {0091-7648},
	url = {http://www.jstor.org/stable/3783574},
	number = {4},
	journal = {Wildlife Society Bulletin},
	author = {Cherry, Steve},
	month = dec,
	year = {1998},
	pages = {947--953}
}