[R-sig-ME] dramatic difference in REML and ML random effect estimates

Thu Apr 14 16:38:22 CEST 2011

On Thu, Apr 14, 2011 at 8:17 AM, Ben Bolker <bbolker at gmail.com> wrote:
>  Since no-one's answered this yet:
>
> On 04/13/2011 04:21 AM, Simon Chamaillé-Jammes wrote:
>> Hello all,
>>
>> I'm comparing the duration (duration) of two different behaviour
>> (behaviourtype) for 10 individuals (indid) in two different years
>> (year). One of the behaviour was not observed for some individuals in
>> the first year, so the indid and year are only partially crossed (see
>> end of mail for data distribution). I'm fitting the model:
>>
>> lmer(duration~behaviourtype+(1|indid)+(1|year),REML=T,data=mydata)
>>
>> I have only two levels for year so modeling it as a random effect may
>> not be that important (suitable?).
>
>  Short answer: modeling a factor with only two levels as a random
> effect is questionable/unsuitable -- this also explains why the REML and
> ML estimates of the variances are so different.  In general, the
> differences between REML and ML estimates are on the order of (n/(n-1)).
>
>  I'm mildly surprised that the individual-level variance drops all the
> way to (effectively) zero (the naive expectation would be that would
> "only" be cut in half), but in the grand scheme of things this is not an
> unusual result (I think).

Note that the difference in deviance between the two sets of estimates
is very small, indicating the the estimate of the standard deviation
of the random effects is very poorly determined.

>  My guess would be that if you re-fit with year as a fixed effect, you
> will get results closer to the REML fit.  On the other hand, it doesn't
> look like there's that much going on with behaviour type anyway ... (the
> estimates are quite different 0.71 vs 1.59, but the differences are well
> within the standard errors of either estimate).
>
>> The model summary is this:
>>
>> Linear mixed model fit by REML
>> Formula: duration ~ behaviourtype + (1 | year) + (1 | indid)
>>     Data: mydata
>>    AIC  BIC logLik deviance REMLdev
>>   1708 1724 -848.8     1706    1698
>> Random effects:
>>   Groups   Name        Variance Std.Dev.
>>   indid    (Intercept)  13.543   3.6800
>>   year     (Intercept)  17.683   4.2051
>>   Residual             410.779  20.2677
>> Number of obs: 192, groups: indid, 10; year, 2
>>
>> Fixed effects:
>>                        Estimate Std. Error t value
>> (Intercept)            32.8539     3.9592   8.298
>> behaviourtypeForaging   0.7121     3.0569   0.233
>>
>> Correlation of Fixed Effects:
>>              (Intr)
>> bhvrtypFrgn -0.449
>>
>>
>> I was quite happy with this and wanted to perform a LR test with the
>> model without the fixed effect (behaviourtype) - I gathered that one
>> should rather use the likelihood estimated via ML (and that is what is
>> done by the anova() function anyway), so for my 'personal interest' I
>> fitted the model again with REML=F:
>>
>> Linear mixed model fit by maximum likelihood
>> Formula: duration ~ behaviourtype + (1 | year) + (1 | indid)
>>     Data: mydata
>>    AIC  BIC logLik deviance REMLdev
>>   1716 1733 -853.2     1706    1699
>> Random effects:
>>   Groups   Name        Variance   Std.Dev.
>>   indid    (Intercept) 7.4845e-12 2.7358e-06
>>   year     (Intercept) 5.5096e+00 2.3473e+00
>>   Residual             4.2026e+02 2.0500e+01
>> Number of obs: 192, groups: indid, 10; year, 2
>>
>> Fixed effects:
>>                        Estimate Std. Error t value
>> (Intercept)             32.228      2.814  11.453
>> behaviourtypeForaging    1.590      3.005   0.529
>>
>> Correlation of Fixed Effects:
>>              (Intr)
>> bhvrtypFrgn -0.604
>>
>>
>> The variance estimates of the random effects are VERY different from the
>> REML estimates.
>> Admittedly my understanding of the fitting process of lmer is rather
>> small, and doing some google work or reading Zuur et al. or Gelman &
>> Hill books did not improve my understanding of what is happening here (I
>> may have to learn to read better). In particular, my questions for you guys:
>> 1) is my model implementation right ?
>> 2) are differences in ML and REML estimates affecting the LR test of the
>> model against a 'null' model (no fixed effect) ?
>>
>> Many thanks for sharing some of your most valuable resources (brain, time).
>> simon
>>
>> PS/ This is the data distribution across behaviour, years, and the 10
>> individuals:
>>
>> , ,  = year1
>>
>>
>>                1 2 3 4 5 6 7 8 9 10
>>    behaviour1  2 1 3 2 4 0 3 1 0 7
>>    behaviour2  4  5  7  4  5 10  0  1  7  5
>>
>> , ,  = year2
>>
>>
>>                1 2 3 4 5 6 7 8 9 10
>>    behaviour1 13  1  5  6  1 13  3  9  2 10
>>    behaviour2  4  4  8  7  6  3  9  6  8  3
>>
>>
>>
>> I'm happy to make mine the fortune:
>>
>> Getting flamed for asking dumb questions on a public mailing list is all part of growing up and being a man/woman.
>>     -- Michael Watson (in a discussion on whether answers on R-help should be more polite)
>>        R-help (December 2004)
>>
>>
>>
>>
>>
>>
>>       [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>