[R-sig-ME] Correlation of random effects in lmer

Ben Bolker bbolker at gmail.com
Sat Apr 28 16:29:59 CEST 2012


On 12-04-27 11:11 PM, R. Scott Davidson wrote:
> I don't have an answer to the question posed but isn't there a problem
> with using either AIC or BIC to select between models with different
> random effects?  The -2 log-likelihood for both models is virtually the
> same, so any difference in scores will be due to the penalty term for the
> number of parameters (especially as the fixed parameters are the same).
> If you're going
> to use either AIC or BIC to select between the two models with different
> random effect structures shouldn't the "actual" number of parameters for
> the random effects be estimated as described in the Vaida and Blanchard
> 2005 paper?  I guess it will come down to how the number of parameters
> for the random effects is counted?
> 
> Cheers
> Scott

  [taking the liberty of cc'ing this back to the list]

  Yes, fair enough.  This is going to be pretty slippery in any case.

  One thing to keep in mind is that the Vaida and Blanchard
conditional-AIC approach is appropriate when you want to maximize
predictive accuracy at the *individual* (subject) level; if you want to
maximize predictive accuracy at the population level it is generally
more appropriate to count parameters in a more naive way, essentially
counting the number of 'theta' (variance and covariance) parameters. See
"How do I count the number of degrees of freedom for a random effect?"
in http://glmm.wikidot.com/faq ...
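
  For what it's worth, the naive population-level count can be read
straight off the fitted object.  A rough sketch (here 'fit' is a
placeholder for a hypothetical fitted model; getME() needs a reasonably
recent version of lme4):

  ## naive 'population-level' parameter count: fixed-effect coefficients
  ## plus the 'theta' (variance/covariance) parameters
  ## (plus one more for the residual variance in an LMM)
  n.fixed <- length(fixef(fit))            ## 'fit' is a placeholder fit
  n.theta <- length(getME(fit, "theta"))   ## variance/covariance parameters
  n.fixed + n.theta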


> 
> 
> On 4/28/2012 14:39, Ben Bolker wrote:
>> arun<smartpink111 at ...>  writes:
>>
>>> I am testing a couple of logistic regression longitudinal models
>>> using lmer.  I got stuck in a position where I found that the
>>> correlation between the random effects is 1.00 (intercept and slope, in
>>> a model with a single random-effects term for the grouping factor), while
>>> the std. deviation for the slope is very low.  Then, I tested another
>>> model with more than one term for the same grouping factor.  The LRT
>>> p-value is not significant.  Is it okay to keep the second
>>> model in this case?
>>    I'm not sure I agree with your terminology here -- there are really
>> two terms in both the (1+time|subject) model and the (1|subject) +
>> (0+time|subject) model; they are just forced to be independent in the
>> second case, while the first case allows them to be correlated.
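>>
>>    For concreteness, a sketch of the two specifications (hypothetical
>> data frame 'd' with response 'y', covariate 'time', and grouping
>> factor 'subject'; both fits have a random intercept and a random slope
>> for time, and differ only in whether their correlation is estimated or
>> fixed at zero):
>>
>> library(lme4)
>> ## intercept and slope allowed to be correlated
>> fm_corr   <- lmer(y ~ time + (1 + time | subject), data = d)
>> ## intercept and slope forced to be independent
>> fm_nocorr <- lmer(y ~ time + (1 | subject) + (0 + time | subject), data = d)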
>>
>>> Random effects:
>>>  Groups  Name        Variance   Std.Dev. Corr
>>>  resid   (Intercept) 10.9575146 3.310214
>>>  Subject (Intercept)  0.8220584 0.906674
>>>          time         0.0041092 0.064103 1.000
>>> Number of obs: 392, groups: resid, 392; Subject, 25
>>>
>>>> anova(fm2_BDat,fm1_BDat)
>>> Data: Behavdat
>>> Models:
>>> fm2_BDat: Response3 ~ 1 + Wavelength * Start_Resp * time + (1 | resid) +
>>> fm2_BDat:     (1 | Subject_BDat) + (0 + time | Subject_BDat)
>>> fm1_BDat: Response3 ~ 1 + Wavelength * Start_Resp * time + (1 | resid) +
>>> fm1_BDat:     (1 + time | Subject_BDat)
>>>          Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>>> fm2_BDat 15 754.30 813.87 -362.15
>>> fm1_BDat 16 754.24 817.78 -361.12 2.0564      1     0.1516
>>    This is a very common situation, and a bit of a tough call (I
>> haven't seen much in the way of formal, rigorously tested guidance for
>> this situation).  What glmer is telling you is essentially that you
>> really don't have enough data to fit two separate random effects
>> ((intercept):Subject and time:Subject), so the fit is collapsing
>> onto a linear combination of the two.  Your second model (fm2_BDat)
>> is enforcing a zero correlation between the two random effects: I would
>> not be surprised if the variance of one of the two were very close
>> to zero (although you haven't shown us that).
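>>
>>    (A quick check, using your fitted zero-correlation model fm2_BDat
>> from the anova above: printing the variance components will show
>> whether one of them has collapsed to essentially zero.)
>>
>> ## variance components of the zero-correlation fit
>> VarCorr(fm2_BDat)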
>>
>>    I will give several (conflicting) arguments here, I would be
>> interested to hear what others have to say:
>>
>>    1. You have no _a priori_ way of knowing that the correlation
>> *is* exactly zero (if you had enough data, you would almost certainly
>> find that there was a non-zero correlation between these terms);
>> you shouldn't be doing 'sacrificial' or stepwise removal of terms.
>> Keep the more complex model.
>>
>>    2. You should be trying to select the 'minimal adequate' model; in
>> particular, overfitting the model (including zero and/or perfectly
>> correlated terms) is more likely to lead to numeric problems, so it's
>> better to try to reduce the model until all the terms can be uniquely
>> estimated.  (This is the approach suggested in Bolker et al 2008
>> _Trends in Ecology and Evolution_)
>>
>>   3.  AIC (which is looking for the best *predictive* model) very
>> slightly favors the model with correlation, although it's almost a
>> wash; if you were computing model-averaged predictions, they would be
>> nearly a 50/50 mixture of the predictions from the two models (see the
>> rough weight calculation after this list).
>>
>>   4. BIC (which is looking for the "true" model, i.e. identifying the
>> correct dimensionality) favors the simpler (no-correlation) model.
>>
>>   5. If you are most interested in inference on the fixed-effect terms,
>> I doubt it matters -- my guess is that these two models give nearly
>> identical fixed-effect estimates.
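>>
>>   Regarding point 3: a rough sketch of the Akaike-weight calculation
>> from the AIC values in your anova table above (with a difference of
>> only about 0.06 AIC units, the weights come out essentially 50/50):
>>
>> aic   <- c(fm2_BDat = 754.30, fm1_BDat = 754.24)  ## AICs from the anova table
>> delta <- aic - min(aic)                           ## AIC differences
>> w     <- exp(-delta/2) / sum(exp(-delta/2))       ## Akaike weights
>> round(w, 3)                                       ## roughly 0.49 / 0.51
>>
>>   Similarly for point 5: comparing fixef(fm1_BDat) and fixef(fm2_BDat)
>> directly would show how close the fixed-effect estimates actually are.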
>>
>>    As long as you decide what to do in a principled way (which
>> approach seems to best fit the analysis you want to do?) rather
>> than by selecting the one that gives you the answers you like
>> best, I think either model is defensible.
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>


