[R-sig-ME] Correlation of random effects in lmer

R. Scott Davidson rsdavidson99 at gmail.com
Sun Apr 29 01:27:01 CEST 2012


My understanding was that, because of the issues with counting degrees 
of freedom, if the focus is on the population level then it is better 
to identify the most appropriate random-effects model (given the 
structure of the data, etc.) and then to use that model while varying 
the fixed effects during model selection (the final paragraph of your 
initial response).

I guess I would ask, in a wider context: if you're looking to identify 
the best predictive model for the population, is it reasonable to use 
AIC (or BIC) to select between models with different random effects, or 
is it better to fix one random-effect structure and then do model 
selection on the fixed effects within it?
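
For concreteness, here's a minimal sketch of the comparison I have in 
mind (hypothetical data frame d, with response y, predictor x, and 
grouping factor g):

library(lme4)
## same fixed effects, different random-effect structures;
## fit by ML so the information criteria use the full likelihood
m1 <- lmer(y ~ x + (1 | g), data = d, REML = FALSE)
m2 <- lmer(y ~ x + (1 + x | g), data = d, REML = FALSE)
AIC(m1, m2)
BIC(m1, m2)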

Cheers
Scott

On 4/29/2012 02:29, Ben Bolker wrote:
> On 12-04-27 11:11 PM, R. Scott Davidson wrote:
>> I don't have an answer to the question posed, but isn't there a problem
>> with using either AIC or BIC to select between models with different
>> random effects?  The -2 log-likelihood for both models is virtually the
>> same, so any difference in scores will be due to the penalty term for
>> the number of parameters (especially as the fixed parameters are the
>> same).  If you're going to use either AIC or BIC to select between two
>> models with different random-effect structures, shouldn't the "actual"
>> number of parameters for the random effects be estimated as described
>> in the Vaida and Blanchard 2005 paper?  I guess it will come down to
>> how the number of parameters for the random effects is counted?
>>
>> Cheers
>> Scott
>    [taking the liberty of cc'ing this back to the list]
>
>    Yes, fair enough.  This is going to be pretty slippery in any case.
>
>    One thing to keep in mind is that the Vaida and Blanchard
> conditional-AIC approach is appropriate when you want to maximize
> predictive accuracy at the *individual* (subject) level; if you want to
> maximize predictive accuracy at the population level it is generally
> more appropriate to count parameters in a more naive way, essentially
> counting the number of 'theta' (variance and covariance) parameters. See
> "How do I count the number of degrees of freedom for a random effect?"
> in http://glmm.wikidot.com/faq ...
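>
>    For instance (a minimal sketch of the naive count in lme4; 'fit'
> is a placeholder for an already fitted merMod object):
>
>    library(lme4)
>    ## number of 'theta' (variance/covariance) parameters
>    length(getME(fit, "theta"))
>    ## the total count lme4 itself uses for AIC/BIC: fixed-effect
>    ## coefficients + theta (+ residual variance for LMMs)
>    attr(logLik(fit), "df")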
>
>
>>
>> On 4/28/2012 14:39, Ben Bolker wrote:
>>> arun<smartpink111 at ...>   writes:
>>>
>>>> I am testing a couple of longitudinal logistic regression models
>>>> using lmer.  I got stuck in a position where I found that the
>>>> correlation between random effects is 1.00 (intercept and slope, in
>>>> a model with a single term for the grouping factor), while the
>>>> std. deviation is very low for the slope.  Then I tested another
>>>> model with more than one term for the same grouping factor.  The
>>>> LRT p-value is not significant.  Is it okay to keep the second
>>>> model in this case?
>>>     I'm not sure I agree with your terminology here -- there are really
>>> two random-effect terms in both the (1+time|subject) model and the
>>> (1|subject) + (0+time|subject) model; they are just forced to be
>>> independent in the second case, while the first case allows them to
>>> be correlated.
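>>>
>>>     In lmer syntax (a minimal illustration; y, time, subject, and d
>>> are hypothetical names), the two parameterizations are:
>>>
>>>     library(lme4)
>>>     ## correlated intercept and slope: 2 variances + 1 covariance
>>>     m_corr  <- lmer(y ~ time + (1 + time | subject), data = d)
>>>     ## independent intercept and slope: 2 variances, no covariance
>>>     m_indep <- lmer(y ~ time + (1 | subject) +
>>>                       (0 + time | subject), data = d)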
>>>
>>>> Random effects:
>>>>  Groups  Name        Variance   Std.Dev. Corr
>>>>  resid   (Intercept) 10.9575146 3.310214
>>>>  Subject (Intercept)  0.8220584 0.906674
>>>>          time         0.0041092 0.064103 1.000
>>>> Number of obs: 392, groups: resid, 392; Subject, 25
>>>>
>>>>> anova(fm2_BDat,fm1_BDat)
>>>> Data: Behavdat
>>>> Models:
>>>> fm2_BDat: Response3 ~ 1 + Wavelength * Start_Resp * time + (1 | resid) +
>>>> fm2_BDat:     (1 | Subject_BDat) + (0 + time | Subject_BDat)
>>>> fm1_BDat: Response3 ~ 1 + Wavelength * Start_Resp * time + (1 | resid) +
>>>> fm1_BDat:     (1 + time | Subject_BDat)
>>>>          Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>>>> fm2_BDat 15 754.30 813.87 -362.15
>>>> fm1_BDat 16 754.24 817.78 -361.12 2.0564      1     0.1516
>>>     This is a very common situation, and a bit of a tough call (I
>>> haven't seen much in the way of formal, rigorously tested guidance
>>> for this situation).  What glmer is telling you is essentially that
>>> you really don't have enough data to fit two separate random effects
>>> ((intercept):Subject and time:Subject), so the fit is collapsing
>>> onto a linear combination of the two.  Your second model (fm2_BDat)
>>> enforces a zero correlation between the two random effects: I would
>>> not be surprised if the variance of one of the two were very close
>>> to zero (although you haven't shown us that).
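>>>
>>>     One quick check (a sketch, using your fitted model):
>>>
>>>     ## estimated variance components of the zero-correlation model;
>>>     ## a std. dev. at or near zero would confirm the collapse
>>>     VarCorr(fm2_BDat)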
>>>
>>>     I will give several (conflicting) arguments here, I would be
>>> interested to hear what others have to say:
>>>
>>>     1. You have no _a priori_ way of knowing that the correlation
>>> *is* exactly zero (if you had enough data, you would almost certainly
>>> find that there was a non-zero correlation between these terms);
>>> you shouldn't be doing 'sacrificial' or stepwise removal of terms.
>>> Keep the more complex model.
>>>
>>>     2. You should be trying to select the 'minimal adequate' model; in
>>> particular, overfitting the model (including zero and/or perfectly
>>> correlated terms) is more likely to lead to numeric problems, so it's
>>> better to try to reduce the model until all the terms can be uniquely
>>> estimated.  (This is the approach suggested in Bolker et al 2008
>>> _Trends in Ecology and Evolution_)
>>>
>>>    3.  AIC (which is looking for the best *predictive* model) very
>>> slightly favors the model with correlation, although it's almost a
>>> wash; if you were computing model-averaged predictions, they would
>>> be nearly a 50/50 mixture of the predictions from the two models
>>> (see the sketch after this list).
>>>
>>>    4. BIC (which is looking for the "true" model, i.e. identifying the
>>> correct dimensionality) favors the simpler (no-correlation) model.
>>>
>>>    5. If you are most interested in inference on the fixed-effect terms,
>>> I doubt it matters -- my guess is that these two models give nearly
>>> identical fixed-effect estimates.
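>>>
>>>    For point 3, a sketch of the Akaike-weight calculation (the
>>> standard formula, applied to your two fits):
>>>
>>>    ic <- AIC(fm2_BDat, fm1_BDat)$AIC
>>>    delta <- ic - min(ic)
>>>    ## model weights: exp(-delta/2), normalized to sum to 1
>>>    exp(-delta / 2) / sum(exp(-delta / 2))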
>>>
>>>     As long as you decide what to do in a principled way (which
>>> approach seems to best fit the analysis you want to do?) rather
>>> than by selecting the one that gives you the answers you like
>>> best, I think either model is defensible.
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


