# [R] Conservative "ANOVA tables" in lmer

Douglas Bates bates at stat.wisc.edu
Thu Sep 7 17:32:06 CEST 2006

```
On 07 Sep 2006 17:20:29 +0200, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> Martin Maechler <maechler at stat.math.ethz.ch> writes:
>
> > >>>>> "DB" == Douglas Bates <bates at stat.wisc.edu>
> > >>>>>     on Thu, 7 Sep 2006 07:59:58 -0500 writes:
> >
> >     DB> Thanks for your summary, Hank.
> >     DB> On 9/7/06, Martin Henry H. Stevens <hstevens at muohio.edu> wrote:
> >     >> Dear lmer-ers,
> >     >> My thanks to all of you who are sharing your trials and tribulations
> >     >> publicly.
> >
> >     >> I was hoping to elicit some feedback on my thoughts on denominator
> >     >> degrees of freedom for F ratios in mixed models. These thoughts and
> >     >> practices result from my reading of previous postings by Doug Bates
> >     >> and others.
> >
> >     >> - I start by assuming that the appropriate denominator degrees of
> >     >> freedom lie between n - p and n - q, where n=number of observations,
> >     >> p=number of fixed effects (rank of model matrix X), and q=rank of Z:X.
> >
> >     DB> I agree with this but the opinion is by no means universal.  Initially
> >     DB> I misread the statement because I usually write the number of columns
> >     DB> of Z as q.
> >
> >     DB> It is not easy to assess rank of Z:X numerically.  In many cases one
> >     DB> can reason what it should be from the form of the model but a general
> >     DB> procedure to assess the rank of a matrix, especially a sparse matrix,
> >     DB> is difficult.
> >
> >     DB> An alternative which can be easily calculated is n - t where t is the
> >     DB> trace of the 'hat matrix'.  The function 'hatTrace' applied to a
> >     DB> fitted lmer model evaluates this trace (conditional on the estimates
> >     DB> of the relative variances of the random effects).
> >
> >     >> - I then conclude that good estimates of P values on the F ratios lie
> >     >>   between 1 - pf(F.ratio, numDF, n-p) and 1 - pf(F.ratio, numDF, n-q).
> >     >>   -- I further surmise that the latter of these (1 - pf(F.ratio, numDF,
> >     >>   n-q)) is the more conservative estimate.
> >
> > This assumes that the true distribution (under H0) of that "F ratio"
> > *is*  F_{n1,n2}  for some (possibly non-integer)  n1 and n2.
> > But AFAIU, this is only approximately true at best, and AFAIU,
> > the quality of this approximation has only been investigated
> > empirically for some situations.
> > Hence, even your conservative estimate of the P value could be
> > wrong (I mean "wrong on the wrong side" instead of just
> > "conservatively wrong").  Consequently, such a P-value is only
> > ``approximately conservative'' ...
> > I agree however that in some situations, it might be a very
> > useful "descriptive statistic" about the fitted model.
>
> I'm very wary of ANY attempt at guesswork in these matters.
>
> I may be understanding the post wrongly, but consider this case:
> Y_ij = mu + z_i + eps_ij, i = 1..3, j=1..100
>
> I get rank(X)=1, rank(X:Z)=3,  n=300
>
> It is well known that the test for mu=0 in this case is obtained by
> reducing data to group means, xbar_i, and then doing a one-sample t test,
> the square of which is F(1, 2), but it seems to be suggested that
> F(1, 297) is a conservative test???!

It's a different test, isn't it?  Your test is based upon the between
group sum of squares with 2 df.  I am proposing to use the within
group sum of squares or its generalization.

```
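Peter Dalgaard's three-group example can be tried out in a few lines of base R. The sketch below is only an illustration under stated assumptions: the seed, the true mu = 0, and the unit standard deviations for z_i and eps_ij are arbitrary choices that do not come from the thread. It computes the group-means t test he describes and then refers the same statistic to F(1, 2) and to F(1, 297), to show how strongly the p-value depends on the denominator degrees of freedom. It does not reproduce the F ratio that lmer reports, which, as Douglas Bates notes in his reply, is built on the within-group sum of squares and is therefore a different test.

```r
# Peter Dalgaard's example: Y_ij = mu + z_i + eps_ij, i = 1..3, j = 1..100
set.seed(1)                          # arbitrary seed, for reproducibility only
k <- 3; m <- 100; n <- k * m         # 3 groups of 100, so n = 300
g <- gl(k, m)                        # grouping factor
z <- rnorm(k)                        # random group effects (sd = 1 is an assumption)
y <- 0 + z[as.integer(g)] + rnorm(n) # true mu = 0, residual sd = 1 (assumptions)

# Test of mu = 0 from the group means: a one-sample t test on k = 3 values
xbar    <- tapply(y, g, mean)
t.stat  <- mean(xbar) / (sd(xbar) / sqrt(k))
F.ratio <- t.stat^2                  # squared t statistic, reference F(1, k - 1)

# Same statistic, two candidate denominator degrees of freedom
p.between <- pf(F.ratio, 1, k - 1, lower.tail = FALSE)  # F(1, 2)
p.within  <- pf(F.ratio, 1, n - k, lower.tail = FALSE)  # F(1, 297)

print(c(F = F.ratio, p.between = p.between, p.within = p.within))
```

For a fixed statistic and one numerator degree of freedom, the upper-tail probability shrinks as the denominator degrees of freedom grow, so `p.within` is always smaller than `p.between` here. `p.between` also agrees with `t.test(xbar)$p.value`, which is the one-sample t test on the group means.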