[R-sig-ME] glmer, p-values, and mcmcsamp

Thu Feb 10 18:57:55 CET 2011

On 02/10/2011 05:17 AM, Sandra Hamel wrote:
> Hello,
> 
> I'm aware of the problems related to p-values in linear mixed models.
> I'm working on a glmer model and was a bit surprise to get p-values
> because I was thinking there were also problems for glmer. From recent
> posts in December on the the validity of p-values in glmer, the
> conclusion was that p-values might actually suffer even worst problems
> for glmer than for lmer. One first (maybe naive) question is why are
> p-values provided in glmer if they suffer from similar issues than
> p-values in lmer (which are not provided for that reason)?

  Not a ridiculous or naive question.  I won't speak for Doug Bates, but
I think the reason is partly cultural/historical.
  The sources of uncertainty or approximation in the test statistics
differ between the LMM and GLMM case.

  I am going into a lot of detail here in part because some of this is
just now sinking in to my own brain, and it helps me to write it out.  I
would welcome corrections, especially from John Maindonald (who also
commented on these issues in the thread ending here
<http://article.gmane.org/gmane.comp.lang.r.lme4.devel/4894>.

  1. In the LMM case, for classical balanced designs we know that the
test statistic (ratio of mean squares) is not just asymptotically but
finitely (? better word?) correct, given that we have the right value
for the denominator degrees of freedom.  For non-classical (unbalanced,
crossed, correlated, weighted ...) cases we don't know that the test
statistic is F distributed, but there have been attempts (Satterthwaite
and Kenward-Roger [not Kenward-Rogers!] are the main ones) to correct
the test distribution, or in the case of the K-R approach also the test
statistic, based on information about the moments of the distribution of
the test statistic under the null hypothesis. The reason for using an F
statistic (rather than chi-squared, i.e. likelihood ratio test) in the
first place is that we have a test statistic that is essentially of the
form (deviance/scale parameter) [i.e. estimated model mean square/error
mean square]; the denominator degrees of freedom describes the
uncertainty in the estimate of the scale parameter.  For large
denominator degrees of freedom this uncertainty gets small and the F
test results would converge to that of the likelihood ratio test
(similarly t-tests would converge to Z-tests).
  (I don't know offhand how the F tests that lme/lmer implement
incorporate the classical concern of "which error term to divide by" ...)

 2. in GLMs (not GLMMs) where we don't estimate the scale parameter
(e.g. binomial and Poisson) we are not testing deviance
difference/estimated scale parameter, but just deviance difference
itself (usually the assumed scale parameter is 1.0).  In this case, the
badness of the likelihood ratio test approximation comes because the
deviance difference itself is only approximately chi-squared
distributed. There are some papers by Cordeiro from the 1980s on
corrections to the chi-squared assumption that at the level of a crude
analogy are the same as the Satterthwaite/K-R approaches; find a
correction term to adjust the reference distribution based on something
we know about its moments. (I don't know if these approaches have
actually been implemented in any stats package ...) There is also some
stuff in McCullagh and Nelder (which I need to bite the bullet and buy)
on this topic.

3. GLMMs without estimated scale parameters have the problems listed in
#2, but compounded by the problem that (depending how they are
calculated) the terms in the deviance may not be independent -- we don't
know what the 'effective degrees of freedom' are.  Someone with
sufficiently good math stats chops might be able to work out an
extension of the Cordeiro/Satterthwaite/K-R approaches that would apply
in the GLMM case ...

4. If we were to use quasi-likelihood approaches we would then be back
in a situation where we had both (all three? deviance !~ chi-square,
error in the estimated scale parameter, unknown "effective denominator
df") sources of error.

  Historically/culturally, it seems that in the GLM context people
mostly ignore/don't worry about the fact that LRT is only asymptotically
correct. (Faraway 2006: "For every GLM except the Gaussian, an
approximate null distribution must be used whose accuracy may be in
doubt particularly for smaller samples.")  This is probably what leads
to Z-test statistics being quoted in the GLMM case.  Perhaps they
shouldn't be.

> In linear models, I use mcmcsamp to get CI or p-values, but I understood
> that this function is not yet implemented for glmer models. I was
> wondering then what is the best option to get either CI or p-values that
> would be "valid" for glmer models? I have seen an old post (2008) from
> Ben Bolker where he mentioned a "glmersim.R" function he had written
> (and a following post from Douglas Bates where he mentioned that this
> function could be included in lme4). Would "glmersim.R" allow me to get
> something similar to mcmcsamp? Are there still plans to include the
> possibility to allow glmer to run in mcmcsamp?

  I don't know about plans for mcmcsamp with glmer.
  I am working to incorporate glmersim into lme4 in the reasonably near
future (subject to review/approval by the other lme4 authors ...).  This
will give a reasonably simple (albeit computationally slow) way to get
p-values for terms by doing 'parametric bootstrap' for specific model
comparisons (see <http://glmm.wikidot.com> for examples: I would also
hope to include some simple examples in the documentation).  In
principle I think it could also be used to get confidence intervals but
would need to think about it some more.
> 
> Thanks for your help,
> Sandra
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models