[R-sig-ME] F and Wald chi-square tests in mixed-effects models

Tue Oct 4 18:31:35 CEST 2011

Helios de Rosario <helios.derosario at ...> writes:

> 
> [I sent this question to the R-help list, but following Ben Bolker's
> advice, I re-send it here. Please excuse me for the duplication.]

  Sorry not to respond to this sooner. I keep meaning to.
> 
> I know that, except in specific cases like well-balanced designs, the F
> statistic that is usually calculated in ANOVA tables may be far from
> being distributed as a known, exact F distribution, and that's the
> reason why the anova method on "mer" objects (calculated by lmer) does
> not calculate the denominator df nor a p-value. --- See for instance
> Douglas Bates' long post on this topic, in:
> https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html 
> 
> However, Anova (car package) does calculate p-values from Wald
> chi-square tests for fixed terms in "mer" objects (as well as in "lme"
> objects). I suppose that the key to understand the logic for this is in
> Fox & Weisberg's commentary in "An R Companion to Applied Regression"
> (2nd edition, p. 272), where they say: "Likelihood ratio tests and F
> tests require fitting more than one model to the data, while Wald tests
> do not."
> 
> Unfortunately, that's too brief a commentary for me to understand why
> and how the Wald test can overcome the deficiencies of F-tests in
> mixed-effects models. The online appendix of "An R Companion..." about
> mixed-effects models does not comment on hypothesis tests either.
> 
> I would appreciate if someone can give some clues or references to read
> about this issue.

 First let's look and see what Anova() is actually doing in this case.
You have to know where to look -- the Anova() method for 'mer' objects
would be called Anova.mer(), so we look at its guts (they're not
exported from the namespace so we have to prepend car:::):

car:::Anova.mer

This shows that it further calls car:::Anova.II.mer or
car:::Anova.III.mer ... which shows that it is just calling
linearHypothesis (hurrah, a documented function!) to get the
chi-square statistic, which is

SSH <- as.vector(t(L %*% b - rhs) %*% solve(L %*% V %*% t(L)) %*% 
        (L %*% b - rhs))

where V is the variance-covariance matrix, b are the parameter
estimates, L is the linear combination of the parameters to test,
and rhs is the hypothesized value (typically 0). This is just
the extension of the expression of the sum of squares of a particular
combination of parameters, scaled by the appropriate variances.

   So far this may be obvious or opaque depending on your level
of math/stats preparation.  This is the key (I would put in 
boldface and italic if I could): this test incorporates only
the uncertainty due to the fixed effect parameters, and assumes
that the variance-covariance matrix is estimated without error --
equivalent to assuming a large residual df, or to doing a Z-test
rather than a t-test of the parameters for a single parameter.

  It's really up to you (and your reviewers etc.) whether this
approximate test of the null hypothesis about a parameter or
combination of parameters is better than nothing, or whether you want
to go to the trouble of computing something better (via MCMC sampling,
parametric bootstrapping, or some approximation such as
Kenward-Roger).  Pinheiro and Bates (2000, pp. 87ff) compare and
contrast likelihood ratio tests (i.e., analogous to the Wald tests
under discussion here) and conditional F or t tests that (attempt to)
correct for the finite size of the sample.

  So I would say the bottom line is that the Wald test does not
"overcome the deficiencies of F-tests in mixed-effects models",
**it just ignores them** ... if you can come up with a sensible
approximation of the relevant 'denominator df', you can always
calculate the relevant test yourself ... (I say that recognizing
that you would have to some work --- in particular, I *think*
you'd need to test X^2/(residual df) against F(m,n) ... but you'd
have to sit down and work that out.)

  I would also point out that for whatever it's worth, the
doBy package now provides a (development/experimental) function
to compute Kenward-Roger corrected df, for those who are
philosophically so inclined.