[R] Conservative
Douglas Bates
bates at stat.wisc.edu
Thu Sep 14 16:32:46 CEST 2006
On 14 Sep 2006 12:21:28 +0200, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si> writes:
>
> > Douglas Bates <bates <at> stat.wisc.edu> writes:
> >
> > > On 9/13/06, Dimitris Rizopoulos <dimitris.rizopoulos <at> med.kuleuven.be> wrote:
> > > > > I believe that the LRT is anti-conservative for fixed effects, as
> > > > > described in Pinheiro and Bates companion book to NLME.
> > > > >
> > > > You have this effect if you're using REML; for ML I don't think there
> > > > is any problem using the LRT between nested models with different
> > > > fixed-effects structures.
> > ...
> > > The other question is how one evaluates the likelihood-ratio test
> > > statistic, and that is the issue that Dimitris is addressing. The REML
> > > criterion is a modified likelihood and it is inappropriate to look at
> > > differences in the REML criterion when the models being compared have
> > > different fixed-effects specifications, or even a different
> > > parameterization of the fixed effects. However, the anova method for
> > > an lmer object does not use the REML criterion even when the model has
> > > been estimated by REML. It uses the profiled log-likelihood evaluated
> > > at the REML estimates of the relative variances of the random effects.
> > > That's a complicated statement so let me break it down.
> > ...
> >
> > Is this then the same answer as the one given by Robinson:1991 (ref at the
> > end) to the question by Robin Thompson about which likelihood (ML or REML)
> > should be used in testing the "fixed" effects? Robinson answered (page 49,
> > near the bottom right) that both likelihoods give the same conclusion about
> > the fixed effects. Can anyone comment on this issue?
>
> At the risk of sticking my foot in it due to not reading the paper
> carefully enough: there appear to be two other likelihoods in play,
> one traditional one depending on fixed effects and variances and
> another depending on fixed effects and BLUPs ("most likely
> unobservables"). I think Robinson is talking about the equivalence of
> those two.
>
> (and BTW ss=Statistical Science in the ref.)
> >
> > @Article{Robinson:1991,
> > author = {Robinson, G. K.},
> > title = {That {BLUP} is a good thing: the estimation of random
> > effects},
> > journal = ss,
> > year = {1991},
> > volume = {6},
> > number = {1},
> > pages = {15--51},
> > keywords = {BLUP, example, derivations, links, applications},
> > vnos = {GG}
> > }
The issue that I was attempting to describe is somewhat different.
I'm only considering using the log-likelihood values for the
likelihood ratio test (which, BTW, I would not recommend for testing
terms in the fixed-effects specification, but that's another
discussion). To perform such a test one should evaluate the
log-likelihood for the models being compared, which is to say you need
the maximum of the log-likelihood for each model over that model's
parameter space. If you estimated the parameters in a model by REML
(the default criterion) then all you know is that the maximized
log-likelihood for the model is bounded below by the log-likelihood
evaluated at those parameter estimates. To perform the test properly
you should compute the ML estimates and take the log-likelihood at
those estimates. For small data sets and simple models this wouldn't
be a problem; for large data sets with complex model structures it
could take some time.
However, the profiled log-likelihood evaluated at the REML estimates
of the relative variances of the random effects (and the corresponding
conditional ML estimates of $\sigma^2$ and the fixed effects) is very
close to the maximized log-likelihood for the model, so I take a
shortcut and use it as the log-likelihood for the model.
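For reference, here is a minimal sketch of the exact route in today's
lme4 interface, where refitML() and REMLcrit() replace the method
argument used in the 2006 transcript below:

library(lme4)
data(PBIB, package = "SASmixed")

fm <- lmer(response ~ Treatment + (1 | Block), PBIB)  # REML fit, the default
REMLcrit(fm)         # the REML criterion at the REML estimates
fmML <- refitML(fm)  # refit the same model by ML for the exact ML estimates
deviance(fmML)       # -2 * log-likelihood at the ML estimates, for use in an LRT

REMLcrit(fm) should reproduce the REMLdeviance shown below (51.98) and
deviance(fmML) the MLdeviance of the ML fit (22.57).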
So the quantities labelled "MLdeviance" and "REMLdeviance" in the
show() or summary() output of an lmer model can be interpreted as
-2*log-likelihood and -2*log-restricted-likelihood for the model, not
for the particular parameter values being reported. If you estimated
the parameters by REML then the REMLdeviance is accurate and the
MLdeviance will be a slight over-estimate. If you estimated the
parameters by ML you get the opposite situation (an accurate
MLdeviance, a slight over-estimate of the REMLdeviance). The rest of
the output shows the parameter estimates for the criterion that you
used to fit the model; the parameter estimates under the other
criterion could be quite different.
Here's a rather extreme example using the PBIB data from the SASmixed package.
> (fm1 <- lmer(response ~ Treatment + (1|Block), PBIB, method = "REML"))
Linear mixed-effects model fit by REML
Formula: response ~ Treatment + (1 | Block)
Data: PBIB
AIC BIC logLik MLdeviance REMLdeviance
83.9849 117.4944 -25.99245 22.82831 51.98489
Random effects:
Groups Name Variance Std.Dev.
Block (Intercept) 0.046522 0.21569
Residual 0.085559 0.29250
number of obs: 60, groups: Block, 15
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.817523 0.166413 16.9309
Treatment10 -0.326461 0.222061 -1.4701
Treatment11 0.081177 0.227200 0.3573
Treatment12 0.235299 0.222061 1.0596
Treatment13 -0.199753 0.222061 -0.8995
Treatment14 -0.326211 0.222061 -1.4690
Treatment15 0.041711 0.222061 0.1878
Treatment2 -0.412208 0.222061 -1.8563
Treatment3 -0.362579 0.222061 -1.6328
Treatment4 -0.033692 0.222061 -0.1517
Treatment5 -0.012625 0.222061 -0.0569
Treatment6 0.093171 0.227200 0.4101
Treatment7 -0.028537 0.222061 -0.1285
Treatment8 -0.035917 0.222061 -0.1617
Treatment9 0.073789 0.222061 0.3323
> (fm1ML <- lmer(response ~ Treatment + (1|Block), PBIB, method = "ML"))
Linear mixed-effects model fit by maximum likelihood
Formula: response ~ Treatment + (1 | Block)
Data: PBIB
AIC BIC logLik MLdeviance REMLdeviance
54.57058 88.08009 -11.28529 22.57058 52.20403
Random effects:
Groups Name Variance Std.Dev.
Block (Intercept) 0.045030 0.21220
Residual 0.060371 0.24571
number of obs: 60, groups: Block, 15
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.822676 0.143563 19.6615
Treatment10 -0.327070 0.187925 -1.7404
Treatment11 0.067533 0.192717 0.3504
Treatment12 0.222437 0.187925 1.1836
Treatment13 -0.195125 0.187925 -1.0383
Treatment14 -0.323562 0.187925 -1.7218
Treatment15 0.034576 0.187925 0.1840
Treatment2 -0.416171 0.187925 -2.2146
Treatment3 -0.368036 0.187925 -1.9584
Treatment4 -0.057737 0.187925 -0.3072
Treatment5 -0.017386 0.187925 -0.0925
Treatment6 0.086646 0.192717 0.4496
Treatment7 -0.037247 0.187925 -0.1982
Treatment8 -0.035441 0.187925 -0.1886
Treatment9 0.076438 0.187925 0.4067
You can see that the MLdeviance at the ML estimates is lower than at
the REML estimates, but not a lot lower. Other quantities, like the
estimate of $\sigma^2$ and the standard errors of the fixed effects,
do change considerably between the two sets of estimates.
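To pull those quantities out directly, here is a small sketch in
today's lme4 interface (sigma() and vcov(); the 2006 version above
simply printed them):

library(lme4)
data(PBIB, package = "SASmixed")

fm1   <- lmer(response ~ Treatment + (1 | Block), PBIB)                # REML
fm1ML <- lmer(response ~ Treatment + (1 | Block), PBIB, REML = FALSE)  # ML
sigma(fm1)^2; sigma(fm1ML)^2  # sigma^2: 0.0856 under REML vs 0.0604 under ML
sqrt(diag(vcov(fm1)))[1:2]    # fixed-effect standard errors from the REML fit
sqrt(diag(vcov(fm1ML)))[1:2]  # the same under ML, roughly 15% smaller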
This example also shows why using likelihood ratio tests for terms in
the fixed-effects specification is not a good idea. A likelihood
ratio test for the Treatment effect would be either
> (fm2ML <- lmer(response ~ 1 + (1|Block), PBIB, method = "ML"))
Linear mixed-effects model fit by maximum likelihood
Formula: response ~ 1 + (1 | Block)
Data: PBIB
AIC BIC logLik MLdeviance REMLdeviance
50.15189 54.34058 -23.07595 46.15189 49.53125
Random effects:
Groups Name Variance Std.Dev.
Block (Intercept) 0.057544 0.23988
Residual 0.092444 0.30405
number of obs: 60, groups: Block, 15
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.736667 0.073328 37.321
> anova(fm2ML, fm1ML)
Data: PBIB
Models:
fm2ML: response ~ 1 + (1 | Block)
fm1ML: response ~ Treatment + (1 | Block)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
fm2ML 2 50.152 54.341 -23.076
fm1ML 16 54.571 88.080 -11.285 23.581 14 0.05144 .
or
> (fm2 <- lmer(response ~ 1 + (1|Block), PBIB))
Linear mixed-effects model fit by REML
Formula: response ~ 1 + (1 | Block)
Data: PBIB
AIC BIC logLik MLdeviance REMLdeviance
53.50553 57.69422 -24.75277 46.17836 49.50553
Random effects:
Groups Name Variance Std.Dev.
Block (Intercept) 0.063306 0.25161
Residual 0.092444 0.30405
number of obs: 60, groups: Block, 15
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.736667 0.075902 36.055
> anova(fm2, fm1)
Data: PBIB
Models:
fm2: response ~ 1 + (1 | Block)
fm1: response ~ Treatment + (1 | Block)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
fm2 2 50.178 54.367 -23.089
fm1 16 54.828 88.338 -11.414 23.35 14 0.05481 .
(The first is more accurate than the second.) However, both p-values
are anti-conservative relative to an empirical p-value obtained by
simulating data sets under the null hypothesis.
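A minimal sketch of such a simulation, written against today's lme4
interface (REML = FALSE, simulate() and refit() instead of the 2006
method argument used above); nsim and the seed are arbitrary choices:

library(lme4)
data(PBIB, package = "SASmixed")

full <- lmer(response ~ Treatment + (1 | Block), PBIB, REML = FALSE)
null <- lmer(response ~ 1 + (1 | Block), PBIB, REML = FALSE)
obs  <- as.numeric(2 * (logLik(full) - logLik(null)))  # observed LRT, 23.58 above

set.seed(1)   # arbitrary
nsim <- 1000
stat <- replicate(nsim, {
    y <- simulate(null)[[1]]  # a response vector generated under the null model
    2 * (logLik(refit(full, y)) - logLik(refit(null, y)))
})
mean(stat >= obs)  # empirical p-value; the point above is that it exceeds 0.0514

With 1000 simulations the Monte Carlo standard error of a p-value near
0.05 is about 0.007, so a larger nsim would be needed for a sharp
comparison.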