[R] Conservative "ANOVA tables" in lmer

Douglas Bates bates at stat.wisc.edu
Sat Sep 9 00:00:00 CEST 2006


On 9/7/06, Douglas Bates <bates at stat.wisc.edu> wrote:
> On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> > Martin Maechler <maechler at stat.math.ethz.ch> writes:
> >
> > > >>>>> "DB" == Douglas Bates <bates at stat.wisc.edu>
> > > >>>>>     on Thu, 7 Sep 2006 07:59:58 -0500 writes:
> > >
> > >     DB> Thanks for your summary, Hank.
> > >     DB> On 9/7/06, Martin Henry H. Stevens <hstevens at muohio.edu> wrote:
> > >     >> Dear lmer-ers,
> > >     >> My thanks for all of you who are sharing your trials and tribulations
> > >     >> publicly.
> > >
> > >     >> I was hoping to elicit some feedback on my thoughts on denominator
> > >     >> degrees of freedom for F ratios in mixed models. These thoughts and
> > >     >> practices result from my reading of previous postings by Doug Bates
> > >     >> and others.
> > >
> > >     >> - I start by assuming that the appropriate denominator degrees lies
> > >     >> between n - p and n - q, where n=number of observations, p=number
> > >     >> of fixed effects (rank of model matrix X), and q=rank of Z:X.
> > >
> > >     DB> I agree with this but the opinion is by no means universal.  Initially
> > >     DB> I misread the statement because I usually write the number of columns
> > >     DB> of Z as q.
> > >
> > >     DB> It is not easy to assess rank of Z:X numerically.  In many cases one
> > >     DB> can reason what it should be from the form of the model but a general
> > >     DB> procedure to assess the rank of a matrix, especially a sparse matrix,
> > >     DB> is difficult.
> > >
> > >     DB> An alternative which can be easily calculated is n - t where t is the
> > >     DB> trace of the 'hat matrix'.  The function 'hatTrace' applied to a
> > >     DB> fitted lmer model evaluates this trace (conditional on the estimates
> > >     DB> of the relative variances of the random effects).
> > >
> > >     >> - I then conclude that good estimates of P values on the F ratios lie
> > >     >>   between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
> > >     >>   -- I further surmise that the latter of these (1 - pf(F.ratio, numDF,
> > >     >>   n-q)) is the more conservative estimate.
> > >
> > > This assumes that the true distribution (under H0) of that "F ratio"
> > > *is*  F_{n1,n2}  for some (possibly non-integer)  n1 and n2.
> > > But AFAIU, this is only approximately true at best, and AFAIU,
> > > the quality of this approximation has only been investigated
> > > empirically for some situations.
> > > Hence, even your conservative estimate of the P value could be
> > > wrong (I mean "wrong on the wrong side" instead of just
> > > "conservatively wrong").  Consequently, such a P-value is only
> > > ``approximately conservative'' ...
> > > I agree, however, that in some situations, it might be a very
> > > useful "descriptive statistic" about the fitted model.
> >
> > I'm very wary of ANY attempt at guesswork in these matters.
> >
> > I may be understanding the post wrongly, but consider this case: Y_ij
> > = mu + z_i + eps_ij, i = 1..3, j=1..100
> >
> > I get rank(X)=1, rank(X:Z)=3,  n=300
> >
> > It is well known that the test for mu=0 in this case is obtained by
> > reducing the data to the group means, xbar_i, and then doing a one-sample t test,
> > the square of which is F(1, 2), but it seems to be suggested that
> > F(1, 297) is a conservative test???!
>
> It's a different test, isn't it?  Your test is based upon the between
> group sum of squares with 2 df.  I am proposing to use the within
> group sum of squares or its generalization.
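Before going further, a quick back-of-the-envelope calculation for
Peter's example (just a sketch, not taken from any session): under the
null the t statistic built from the three group means is exactly t
with 2 df, so referring it to a t with n - rank([Z:X]) = 297 df gives
a nominal 5% test an actual size of about 19%.

## size of a nominal 5% test when a 2-df t statistic is referred to 297 df
2 * pt(-qt(0.975, df = 297), df = 2)   # roughly 0.19 -- anything but conservative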

On closer examination I see that you are indeed correct.  I have heard
that "well-known" result many times and finally sat down to prove it
to myself.  For a balanced design the standard error of the intercept
using the REML estimates is the same as the standard error of the mean
calculated from the group means.
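
(In symbols: for a balanced design with k groups of size m, and
provided the estimate of sigma_b^2 is positive, the REML estimates of
sigma_b^2 and sigma^2 satisfy sigma_b^2 + sigma^2/m = MS_between/m,
which is just the sample variance of the k group means, so the
standard error of the intercept, sqrt((sigma_b^2 + sigma^2/m)/k), is
exactly the standard error of the mean of the group means.)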

> data(Rail, package = 'nlme')
> library(lme4)
> summary(fm1 <- lmer(travel ~ 1 + (1|Rail), Rail))
Linear mixed-effects model fit by REML
Formula: travel ~ 1 + (1 | Rail)
   Data: Rail
   AIC   BIC logLik MLdeviance REMLdeviance
 126.2 128.0 -61.09      128.6        122.2
Random effects:
 Groups   Name        Variance Std.Dev.
 Rail     (Intercept) 615.286  24.8050
 Residual              16.167   4.0208
number of obs: 18, groups: Rail, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)    66.50      10.17   6.538
> mns <- with(Rail, tapply(travel, Rail, mean)) # group means
> sd(mns)/sqrt(length(mns))  # standard error matches that from lmer
[1] 10.17104
> t.test(mns)

	One Sample t-test

data:  mns
t = 6.5382, df = 5, p-value = 0.001253
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 40.35452 92.64548
sample estimates:
mean of x
     66.5

> ctab <- summary(fm1)@coefs  # coefficient table
> ctab[,1] + c(-1,1) * qt(0.975, 15) * ctab[,2] # 95% conf. int.
[1] 44.82139 88.17861
> ## interval using df = # of obs - rank of [Z:X] is too narrow

So my proposal of using n minus either the trace of the hat matrix or
the rank of the combined model matrices as the denominator degrees of
freedom is not conservative.
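
For concreteness, here is how the candidate degrees of freedom line up
for the Rail fit (a sketch recomputing them from scratch; the hatTrace
line is commented out since it depends on that helper being available):

## candidate denominator degrees of freedom for the Rail example (n = 18, 6 rails)
data(Rail, package = "nlme")
g <- factor(Rail$Rail, ordered = FALSE)   # rail labels as a plain factor
X <- model.matrix(~ 1, Rail)              # fixed-effects model matrix, p = 1
Z <- model.matrix(~ 0 + g)                # 18 x 6 indicator matrix for the rails
q <- qr(cbind(Z, X))$rank                 # rank of [Z:X] = 6; the intercept is in the span of Z
c(n_minus_p = nrow(X) - ncol(X),          # 17
  n_minus_q = nrow(X) - q,                # 12
  group_means = nlevels(g) - 1)           # 5, the value matching the t test on the rail means
## n - hatTrace(fm1)                      # n - tr(hat matrix), conditional on the variance estimates

Even the smallest of the first two values gives a noticeably narrower
interval than the 5 df interval that matches the group-means analysis.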

However, look at the following

> set.seed(123454321)  # for reproducibility
> sm1 <- mcmcsamp(fm1, 50000)
> library(coda)
> HPDinterval(sm1)
                    lower      upper
(Intercept)     40.470663  92.608514
log(sigma^2)     2.060179   3.716326
log(Rail.(In))   5.371858   8.056897
deviance       128.567329 137.487455
attr(,"Probability")
[1] 0.95

The HPD interval calculated from an MCMC sample reproduces the interval
from the group means almost exactly.  This makes sense in that the
MCMC sample takes into account the variation in the estimates of the
variance components, just as intervals based on Student's t do.
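
As a cross-check (a sketch, assuming the sample above can be coerced
to a matrix of draws with as.matrix, as the coda 'mcmc' class allows),
simple quantiles of the intercept draws give an equal-tail interval to
set beside the HPD interval:

draws <- as.matrix(sm1)                            # one column per parameter
quantile(draws[, "(Intercept)"], c(0.025, 0.975))  # central 95% interval
mean(draws[, "(Intercept)"])                       # posterior mean of the intercept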

So in this case, where the estimate of the mean has a known
distribution, the correct degrees of freedom and the MCMC sample
produce similar answers.

This gives me more confidence in the results from the MCMC sample in
general cases.

The problem I have with trying to work out what the degrees of freedom
"should be" is that the rules seem rather arbitrary.  For example, the
"between-within" rule used in SAS PROC Mixed is popular (many accept
it as the "correct" answer) but it assumes that the degrees of freedom
associated with a random effect grouped by a factor with k levels is
always k - 1.  This value is used even when there is a random
intercept and a random slope for each group.  In fact you could have
an arbitrary number of random effects for each level of the grouping
factor and it would still apparently only cost you k - 1 degrees of
freedom.  That doesn't make sense to me.
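
For instance (a sketch assuming a current lme4, where getME() exposes
the random-effects model matrix, and using the sleepstudy data that
ships with the package):

library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
k  <- nlevels(sleepstudy$Subject)   # 18 subjects
ncol(getME(fm, "Z"))                # 36 random-effects columns: an intercept and a slope per subject
k - 1                               # 17: all the between-within rule ever charges for this factor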

Anyway, I thank you for pointing out the error of my ways, Peter.


