[R-sig-ME] Non-constancy of variances in mixed model.
john.maindonald at anu.edu.au
Mon Mar 14 07:26:05 CET 2011
One does not expect the dependent variable to be normal, and certainly not
independently and identically distributed. Rather, the model is
constructed using a kitset in which the components are
1) 'fixed' effects (in matrix terminology, the X beta part of the model)
2) normal distributions (the random effects, making up the Z b part of the model)
3) an exponential family distribution, in your case normal
(the e part of the model)
Adding up 1), 2) and 3) to give y = X beta + Z b + e is very unlikely to give
anything that looks like a normal distribution. So your test for normality is
misdirected. It probably is worth using normal probability plots for a rough
check that the residuals are consistent with a normal distribution, and
that the random effects at the various levels (disks, etc) are pretty much
consistent with a normal distribution.
You have to extract the relevant quantities from the fitted model object.
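For example, with a model fitted by nlme::lme (the object name `fit` here is illustrative, not from the thread), the residuals and the random effects at each level can be pulled out and given normal probability plots along these lines:

```r
library(nlme)

## Normal probability plot of the within-group (level-1) residuals;
## "normalized" residuals are standardized against the fitted
## variance structure:
qqnorm(resid(fit, type = "normalized"))
qqline(resid(fit, type = "normalized"))

## ranef() returns one data frame per grouping level (the list names
## depend on the model's grouping structure); each can be checked
## the same way:
re <- ranef(fit)
qqnorm(re[["Dish"]][, "(Intercept)"])
```

This is only a rough visual check, in keeping with the point above that formal normality tests are not the goal.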
For reasons that have been worked over on this list, extensively, it would
be pointless to use tests to check for normal distributions, even if suitable
tests were available. (I do not believe that in general there are good tests.)
If your design is indeed hierarchical and balanced, I think you might find
it simpler to stick with anova, with an appropriate Error argument. You
have as much (or as little!) licence to ignore the degrees of freedom in
anova as you have not to worry about degrees of freedom for t-statistics
from your lmer calculations. Degrees of freedom are part of the deal if
you want p-values without using the mcmcsamp() approach.
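For a balanced hierarchical design, the aov() route with an Error term might be sketched as follows (reusing the variable names from the question below; this is an illustration, not code from the thread):

```r
## Group is the fixed effect; Lineage, Dish and Disk form the
## nested error strata, so each effect is tested against the
## appropriate stratum:
fit.aov <- aov(Diameter ~ Group + Error(Lineage/Dish/Disk),
               data = Dataset)
summary(fit.aov)
```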
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 14/03/2011, at 2:24 PM, Ben Ward wrote:
> Hi, I've been doing an experiment measuring the dead-zone diameters of
> bacteria when they've been grown with paper diffusion disks of antimicrobial.
> There are two groups, or treatments: one is bacteria that have been cultured
> in said antimicrobial for the past year; the other group is of the same
> species, but is lab stock and has not had any prior contact with the
> antimicrobial. Because I take four readings from each disk, and there are
> three disks in a petri dish, three petri dishes for each lineage (or
> replicate), and 5 lineages for each group/treatment, I want to use lme() so
> I can add lineage, dish, and disk as random effects, and have group as the
> fixed effect. That way I'm analysing differences between two groups, a bit
> like a t.test or anova, but without ridiculous degrees of freedom. I've done
> qqPlot() and normality tests on my data, according to shapiro.test(), on
> both my subsets (I split the data into the two groups):
> Shapiro-Wilk normality test
> data: DiamEVDettol
> W = 0.9823, p-value = 0.0221
> Shapiro-Wilk normality test
> data: DiamNEDettol
> W = 0.9788, p-value = 0.007677
>> qqPlot(DiamEVDettol, dist="norm")
>> qqPlot(DiamNEDettol, dist="norm")
> Which, from the R Book, shows non-normality. The qqPlot()s I did show a
> slight S shape and then a slightly more pronounced S shape respectively,
> suggesting leptokurtosis. Skewness and kurtosis are:
>  -0.1917142
>  -0.7594319
>  0.1314403
>  -0.8365288
> (I'm not sure how to get p-values to see if they're significant or not).
> But I'm unsure on how to proceed, I had done:
> allrandom <- lme(Diameter~1,random=~1|Group/Lineage/Dish/Disk,data=Dataset)
> So that I could do a variance components analysis:
> Variance StdDev
> Group = pdLogChol(1)
> (Intercept) 1.8773750 1.3701734
> Lineage = pdLogChol(1)
> (Intercept) 0.2648475 0.5146333
> Dish = pdLogChol(1)
> (Intercept) 0.0601047 0.2451626
> Disk = pdLogChol(1)
> (Intercept) 0.1456451 0.3816348
> Residual 1.3456346 1.1600149
>  52.914183 7.464779 1.694063 37.926975
> Before moving on to:
> groupfixed <-lme(Diameter~Group,random=~1|Lineage/Dish/Disk,data=Dataset)
> Because Group is what I'm interested in, and is the only one with
> informative factor levels, other than maybe Lineage. I was going to go on
> and construct similar models with different interactions or terms, but this
> issue of skewness and non-normality stopped me. At first, boxplots showed
> what, in my only three years of undergraduate experience, looked like normal
> data; indeed, neater data than I'm used to in biological science
> experiments. However, this further probing of skewness, kurtosis, qqplots
> and shapiro.tests reveals otherwise. The variance is also not the same
> between the two groups (this was anticipated: because of natural selection
> of the bacteria in one of the groups, that variance could be less).
> I'm concerned as to what I should do about the data I have, in terms of transformations, or whether there's a way for mixed models to take into account different distributions of data like a glm so my data does not have to be strictly normal or have equal variance.
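On the unequal-variance point raised here, one option that keeps the mixed-model framework is lme()'s weights argument with varIdent(), which estimates a separate residual variance for each group. A sketch, reusing the variable names from the question (this is an illustration, not code from the thread):

```r
library(nlme)

## varIdent(form = ~ 1 | Group) allows the residual standard
## deviation to differ between the two groups:
fit.hetero <- lme(Diameter ~ Group,
                  random  = ~ 1 | Lineage/Dish/Disk,
                  weights = varIdent(form = ~ 1 | Group),
                  data    = Dataset)
summary(fit.hetero)
```

Whether the extra variance parameter is warranted can be judged by comparing this fit against the equal-variance model (e.g. with anova() on the two fitted objects, both fitted by REML with the same fixed effects).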
> Another concern of mine is whether I should be using lme as above, or, as a
> book I read states, re-coding factor levels and using lmer, for example:
> so with me it might be:
> I'm not clear on what the difference would be here.
> Ben W.