[R] normal distribution assumption for multi-level modelling
Ben Bolker
bbolker at gmail.com
Wed Apr 18 20:01:55 CEST 2012
Cecile De Cat <c.decat <at> leeds.ac.uk> writes:
> I'm analysing reaction time data from a linguistic experiment (a variant of
> a lexical decision task). To ascertain that the data was normally
> distributed, I used *shapiro.test* for each participant (see commands
> below), but only one out of 21 returns a p value above 0.05.
>
> > f = function(dfr) return(shapiro.test(dfr$Target.RTinv)$p.value)
> > p = as.vector(by(newdat, newdat$Subject, f))
> > names(p) = levels(newdat$Subject)
> > names(p[p < 0.05])
>
> Removing a few outliers per subject doesn't make a difference, and
> "aggressive" removal of outliers (done by subject, for each of the 6
> conditions ) still results in non-normally distributed data by subject.
>
> Does this invalidate any attempt at multi-level modelling?
I don't think so.
1. You should be concerned about the normality of the *residuals* of
your response variable, i.e. the conditional distribution of your data
(or if you only have categorical predictors you could equivalently
look *within* the smallest sampling unit where you expect a constant
mean), not the marginal distribution of the data.
2. Many statisticians would say you shouldn't be doing hypothesis
tests of normality for this purpose in any case; if you have little
data the tests have low power (so you won't detect non-normal data),
while if you have a great deal the tests can be *too* powerful
(i.e. you detect significant deviations from normality which do not
actually compromise the inferences you would be making from your
analysis). I don't have a great citation for this handy, but
one is listed below (Cherry 1998).
3. You're not applying any multiple-comparisons correction: with 21
tests at the 0.05 level you would expect about one p value <0.05
(21 x 0.05 = 1.05) by chance alone, even if the null hypothesis
were true for every participant.
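To illustrate point 1, here is a minimal sketch in base R using
simulated data (the design, effect sizes and column names are my own
assumptions, not the poster's data). With only categorical predictors,
the residuals are the deviations from each Subject-by-condition cell
mean, which ave() computes. The sketch compares how straight the normal
Q-Q line is for one subject's raw responses versus the residuals.

```r
set.seed(1)

# Hypothetical design: 21 subjects x 6 conditions x 20 trials each
d <- expand.grid(trial = 1:20, cond = factor(1:6), Subject = factor(1:21))

# Assumed effects: large differences between conditions,
# modest differences between subjects, normal trial-level noise
cond_eff <- 0:5
subj_eff <- rnorm(21, sd = 0.5)
d$y <- cond_eff[as.integer(d$cond)] +
  subj_eff[as.integer(d$Subject)] +
  rnorm(nrow(d), sd = 0.3)

# Residuals = deviation from each Subject-by-condition cell mean
d$res <- d$y - ave(d$y, d$Subject, d$cond)

# Correlation between sample and theoretical normal quantiles
# (closer to 1 means a straighter Q-Q line)
qq_cor <- function(x) {
  q <- qqnorm(x, plot.it = FALSE)
  cor(q$x, q$y)
}

qq_raw <- qq_cor(d$y[d$Subject == "1"])  # marginal: one subject, all conditions
qq_res <- qq_cor(d$res)                  # conditional: all residuals pooled
print(c(raw_subject1 = qq_raw, residuals = qq_res))
```

In this setup the per-subject raw responses are a mixture of six
condition means, so they can look non-normal even though the errors
were generated as normal. In practice you would look at qqnorm() and
qqline() plots of the residuals from the fitted model rather than a
single summary number.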
Follow-ups to r-sig-mixed-models <at> r-project.org, although
this issue (hypothesis testing as a way to validate the statistical
assumptions of a model) is not specific to mixed models.
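The sample-size issue in point 2 can be sketched with a small
simulation. This is only an illustration under my own assumptions: the
data are drawn from a t distribution with 5 degrees of freedom
(somewhat heavy-tailed, chosen arbitrarily), and reject_rate() is a
helper name made up for this example. It estimates how often
shapiro.test() rejects at the 0.05 level for a small and a large sample
from the same distribution.

```r
set.seed(42)

# Proportion of simulated samples of size n for which the
# Shapiro-Wilk test rejects normality at the 0.05 level
reject_rate <- function(n, nsim = 200, df = 5) {
  p <- replicate(nsim, shapiro.test(rt(n, df = df))$p.value)
  mean(p < 0.05)
}

rate_small <- reject_rate(20)    # few observations
rate_large <- reject_rate(2000)  # many observations (shapiro.test allows 3 to 5000)
print(c(n20 = rate_small, n2000 = rate_large))
```

The distribution is the same in both cases; only the amount of data
changes, so the rejection rate mostly reflects sample size rather than
how much the non-normality matters for the analysis.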
@article{cherry_statistical_1998,
title = {Statistical Tests in Publications of The Wildlife Society},
volume = {26},
issn = {0091-7648},
url = {http://www.jstor.org/stable/3783574},
number = {4},
journal = {Wildlife Society Bulletin},
author = {Cherry, Steve},
month = dec,
year = {1998},
pages = {947--953}
}