[R-sig-ME] Heteroscedasticity, lme4 and nlme

Wed Jun 12 21:54:36 CEST 2013

Jennifer Bufford <jbufford at ...> writes:

>

[snip]

> I am working on my dissertation and using mixed models for quite a bit of
> my analysis. Here are some details about my experiment and data:

> I grew seedlings of 21 species in two sites, 10 blocks per site, and 2
> plots per block either with or without competition ("Weeds").  I am also
> including phylogenetic family (3 families) and invasiveness (3
> categories).  I have site and family as fixed effects because I understand
> there aren't enough categories to reliably estimate them as random
> effects.  I have several potential continuous covariates as well.  My model
> includes all possible two-way interactions, fixed and random, and I am
> reducing the model using the likelihood ratio test.  My goal in reducing
> the model is to improve clarity and avoid over-fitting.  I have both
> continuous (transformed) and binomial response variables.
> 
> Here's a full (unreduced) model without covariates:
> RGR.simple <- lmer(RGR ~ Inv + Weeds + Site + Fam + Inv:Weeds + Inv:Site +
> Weeds:Site + Weeds:Fam + Site:Fam + (1|Blk) + (1|Plot) + (1|Sp) + (0 +
> Site|Sp) + (0+Weeds|Sp), fdatRGR, family="gaussian")

  You don't need family="gaussian" here, and you might be able to
write the fixed-effect part of your formula more compactly as

RGR ~ (Inv + Weeds + Site + Fam)^2 + ... 

 I assume that the plots are uniquely labeled.

> My data are extremely unbalanced. I have heteroscedasticity in the
> residuals of my full and reduced models, as seen both by plotting and using
> leveneTest (package car).

  That's unfortunate.  I know your data are already growth relative
to size, but does log-transforming your data help?

> I have several questions, which center largely around the issue of
> heteroscedasticity:
> 
> I have both crossed random effects and serious heteroscedasticity.  I'd
> like to implement a non-homogeneous variance structure, but as far as I can
> tell, that can only be done in lme (from nlme), not in lmer (from lme4).
> Is that right?  I have heard that there are some work-arounds to get
> crossed random effects in lme.  

  Yes, see pp 163ff of Pinheiro and Bates 2000 (this is referenced
in http://glmm.wikidot.com/faq)

> Would that be the best solution in this
> situation?  If I specified an alternate variance structure by species,
> should residuals by species be homogenous (ie non-significant Levene's
> Test) or not?

  I don't think they would need to be

> I've also gotten a bit confused about random interactions.  lme4 allows
> much more complex random effects - would that in some way help account for
> heteroscedasticity?  If so, it hasn't worked for my data, but I want to
> understand what the random effects are doing.  In other words, what is the
> difference between specifying a variance structure (e.g. varIDent(form =
> ~1|Species) and including vector-valued random effects (e.g.
> (Competition|Species))?  My model statistics (ie LRT and AIC) tell me that
> many of the models are better with these complex random effects, but that
> also restricts my movement from lme4 to nlme.

  varIdent(form=~1|Species) says that the residual variance differs
among species.

  (Competition|Species) says that the effect of competition varies
among species.

> There is a weights function in lmer, but I know it is not the same as the
> weights function in lme.  If I use the weights term in lmer to specify
> weighting by the scaled reciprocal of species variance, that would give
> species with smaller variances a greater "say" in the model - is that
> right?  Weighting this way does fix my heteroscedasticity problems, but I'm
> not sure if it is a valid approach.

  I'm not sure if this works: also, there's likely to be some
correlation between the different parts of the variance structure
(e.g. between estimates of differences in residual variances among species
and estimates of the random-effects variances)

  Everything you've said so far seems reasonable, but I'm still a bit
worried that you're working with too complex a model.  Can you think
of plausible ways to simplify that you would be comfortable with?

  Another option, if you really want to do complex variance models
like this, is to build your own in WinBUGS/AD Model Builder -- then
you really know exactly what's going on.

   If you can, it would be worth simulating some data that are
as complex as the models you're trying to fit, and see if you can
get reasonable answers ...