[R-sig-ME] glmer, p-values, and mcmcsamp
john.maindonald at anu.edu.au
Fri Feb 11 02:07:54 CET 2011
I have interspersed comments below.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 11/02/2011, at 4:57 AM, Ben Bolker wrote:
> On 02/10/2011 05:17 AM, Sandra Hamel wrote:
>> I'm aware of the problems related to p-values in linear mixed models.
>> I'm working on a glmer model and was a bit surprised to get p-values,
>> because I was thinking there were also problems for glmer. From recent
>> posts in December on the validity of p-values in glmer, the
>> conclusion was that p-values might actually suffer even worse problems
>> for glmer than for lmer. One first (maybe naive) question is why are
>> p-values provided in glmer if they suffer from similar issues to
>> p-values in lmer (which are not provided for that reason)?
> Not a ridiculous or naive question. I won't speak for Doug Bates, but
> I think the reason is partly cultural/historical.
I think it is pretty much entirely cultural/historical. The only difference is
that in a very limited range of special cases the denominator is known
(at all events, this is what the analyst tells glmer!), i.e., the binomial
or Poisson assumptions that lead to this variance can be trusted.
But even if the binomial or Poisson assumptions can be trusted,
the relevant denominator variance is commonly the theoretical
variance plus something that arises from the mixed model part of the model.
> The sources of uncertainty or approximation in the test statistics
> differ between the LMM and GLMM case.
> I am going into a lot of detail here in part because some of this is
> just now sinking in to my own brain, and it helps me to write it out. I
> would welcome corrections, especially from John Maindonald (who also
> commented on these issues in the thread ending here
> 1. In the LMM case, for classical balanced designs we know that the
> test statistic (ratio of mean squares) is not just asymptotically but
> exactly correct in finite samples, given that we have the right value
> for the denominator degrees of freedom. For non-classical (unbalanced,
> crossed, correlated, weighted ...) cases we don't know that the test
> statistic is F distributed, but there have been attempts (Satterthwaite
> and Kenward-Roger [not Kenward-Rogers!] are the main ones) to correct
> the test distribution, or in the case of the K-R approach also the test
> statistic, based on information about the moments of the distribution of
> the test statistic under the null hypothesis. The reason for using an F
> statistic (rather than chi-squared, i.e. likelihood ratio test) in the
> first place is that we have a test statistic that is essentially of the
> form (deviance/scale parameter) [i.e. estimated model mean square/error
> mean square]; the denominator degrees of freedom describes the
> uncertainty in the estimate of the scale parameter. For large
> denominator degrees of freedom this uncertainty gets small and the F
> test results would converge to that of the likelihood ratio test
> (similarly t-tests would converge to Z-tests).
> (I don't know offhand how the F tests that lme/lmer implement
> incorporate the classical concern of "which error term to divide by" ...)
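That convergence is easy to check numerically; here is a minimal base-R sketch (my own illustration, with an arbitrary statistic on 3 numerator df, not a calculation from the thread):

```r
## As the denominator df grow, the F-test p-value approaches the
## chi-squared (likelihood ratio) p-value for df1 * F on df1 df.
Fstat <- 2.5
df1 <- 3
for (df2 in c(5, 20, 100, 1000)) {
  pF   <- pf(Fstat, df1, df2, lower.tail = FALSE)
  pChi <- pchisq(df1 * Fstat, df1, lower.tail = FALSE)
  cat("df2 =", df2, "  F p =", round(pF, 4),
      "  chisq p =", round(pChi, 4), "\n")
}
```

With df2 = 5 the two p-values differ noticeably; by df2 = 1000 they agree to three or four decimal places.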
Even in balanced cases where the denominator is a combination of
two components of variance, there is an issue. Consider, for example,
the kiwishade data in the DAAG package, which has a block/plot/subplot
3/4/4 structure. The appropriate mean square for comparing treatments
is, according to the conventional wisdom, the plot mean square. The
relevant mean square is then 20.93 on 6 degrees of freedom.
If one could argue that the plot component was zero (that the apparent
plot component is just a result of chance), the error mean square reduces
to 13.96 on 36 df.
The conventional wisdom makes sense in this case. But suppose that
the estimate of the plot component of variance is so small that the two
mean squares are quite similar. The conventional wisdom still assigns 6 df
to the variance estimate, even though it is a linear combination of
two variances, one of which is estimated on 36 df and accounts for most
of the mean square. So the use of 6 df is conservative. The problem is
that some kind of Bayesian prior is required in order to achieve a
compromise between the 6 df and the 36 df.
Some will want to test whether the plot component is "significantly"
greater than 0. If there is any reason at all to suspect that the plot
component, although small, ought to be greater than 0, this test is
anti-conservative. In a field-trial plot/subplot context, there is every
reason to expect a plot component greater than 0, unless plots have been
chosen inappropriately.
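For concreteness, the conventional multistratum analysis can be reproduced with aov() (a sketch, assuming the kiwishade data frame from DAAG with factors block, plot and shade, plots nested within blocks):

```r
library(DAAG)   # for the kiwishade data
kiwi.aov <- aov(yield ~ shade + Error(block/plot), data = kiwishade)
summary(kiwi.aov)
## The block:plot error stratum supplies the mean square (about 20.9,
## 6 df) against which shade is conventionally tested; the within-plot
## residual stratum gives the 13.96 mean square on 36 df.
```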
> 2. in GLMs (not GLMMs) where we don't estimate the scale parameter
> (e.g. binomial and Poisson) we are not testing deviance
> difference/estimated scale parameter, but just deviance difference
> itself (usually the assumed scale parameter is 1.0). In this case, the
> badness of the likelihood ratio test approximation comes because the
> deviance difference itself is only approximately chi-squared
> distributed. There are some papers by Cordeiro from the 1980s on
> corrections to the chi-squared assumption that at the level of a crude
> analogy are the same as the Satterthwaite/K-R approaches; find a
> correction term to adjust the reference distribution based on something
> we know about its moments. (I don't know if these approaches have
> actually been implemented in any stats package ...) There is also some
> stuff in McCullagh and Nelder (which I need to bite the bullet and buy)
> on this topic.
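The small-sample failure of the chi-squared approximation is easy to see by simulation; the following sketch (my own toy setup, not taken from any of the papers mentioned) checks the achieved size of a nominal 5% likelihood ratio test with only eight small binomial observations:

```r
set.seed(1)
n <- 8
x <- rep(0:1, each = n / 2)
lrt <- replicate(2000, {
  y  <- rbinom(n, size = 5, prob = 0.2)   # null model (no x effect) is true
  f0 <- glm(cbind(y, 5 - y) ~ 1, family = binomial)
  f1 <- glm(cbind(y, 5 - y) ~ x, family = binomial)
  deviance(f0) - deviance(f1)             # LR statistic, nominally chisq(1)
})
mean(lrt > qchisq(0.95, df = 1))          # achieved size of a nominal 5% test
```

The achieved size need not be close to 5% in samples this small, which is exactly the point about the crudeness of the chi-squared reference distribution.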
> 3. GLMMs without estimated scale parameters have the problems listed in
> #2, but compounded by the problem that (depending how they are
> calculated) the terms in the deviance may not be independent -- we don't
> know what the 'effective degrees of freedom' are. Someone with
> sufficiently good math stats chops might be able to work out an
> extension of the Cordeiro/Satterthwaite/K-R approaches that would apply
> in the GLMM case ...
> 4. If we were to use quasi-likelihood approaches we would then be back
> in a situation where we had both (all three? deviance !~ chi-square,
> error in the estimated scale parameter, unknown "effective denominator
> df") sources of error.
> Historically/culturally, it seems that in the GLM context people
> mostly ignore/don't worry about the fact that LRT is only asymptotically
> correct. (Faraway 2006: "For every GLM except the Gaussian, an
> approximate null distribution must be used whose accuracy may be in
> doubt particularly for smaller samples.") This is probably what leads
> to Z-test statistics being quoted in the GLMM case. Perhaps they
> shouldn't be.
The Z-statistics are of course not likelihood ratio statistics; they are
commonly known as Wald statistics. They can suffer from severe problems
that arise when the SEs (based on single-term Taylor series approximations)
change dramatically in the neighbourhood of the relevant point on the
likelihood surface. For binomial data this leads to the Hauck-Donner effect
(Modern Applied Statistics with S, 4th edn, p. 197), whereby the Z-statistics
decrease as one of the treatments being compared moves further into the tail
of the binomial (proportions closer to 0 or 1). There is a corresponding issue
for Poisson data, as the estimate of one of the means being compared moves
close to zero; see Maindonald & Braun, Data Analysis & Graphics
Using R, 2nd edn p. 267, 3rd edn p. 263.
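The effect is easy to reproduce with glm() (a toy example of mine, not the one from either book): hold group A at 10/20 successes and move group B towards 20/20.

```r
n <- 20
for (yB in c(14, 17, 19, 20)) {
  dat  <- data.frame(succ = c(10, yB), fail = c(n - 10, n - yB),
                     grp  = factor(c("A", "B")))
  fit  <- glm(cbind(succ, fail) ~ grp, family = binomial, data = dat)
  null <- glm(cbind(succ, fail) ~ 1,   family = binomial, data = dat)
  z    <- coef(summary(fit))["grpB", "z value"]
  lrt  <- deviance(null) - deviance(fit)
  cat("yB =", yB, "  Wald z =", round(z, 2),
      "  LR chisq =", round(lrt, 2), "\n")
}
## The LR statistic grows steadily as group B moves towards the boundary,
## but at yB = 20 (proportion 1) the Wald z collapses: the SE blows up
## faster than the coefficient does.
```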
>> In linear models, I use mcmcsamp to get CI or p-values, but I understood
>> that this function is not yet implemented for glmer models. I was
>> wondering then what is the best option to get either CI or p-values that
>> would be "valid" for glmer models? I have seen an old post (2008) from
>> Ben Bolker where he mentioned a "glmersim.R" function he had written
>> (and a following post from Douglas Bates where he mentioned that this
>> function could be included in lme4). Would "glmersim.R" allow me to get
>> something similar to mcmcsamp? Are there still plans to include the
>> possibility to allow glmer to run in mcmcsamp?
> I don't know about plans for mcmcsamp with glmer.
> I am working to incorporate glmersim into lme4 in the reasonably near
> future (subject to review/approval by the other lme4 authors ...). This
> will give a reasonably simple (albeit computationally slow) way to get
> p-values for terms by doing 'parametric bootstrap' for specific model
> comparisons (see <http://glmm.wikidot.com> for examples: I would also
> hope to include some simple examples in the documentation). In
> principle I think it could also be used to get confidence intervals but
> would need to think about it some more.
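In outline, the parametric bootstrap Ben describes works like this (a hedged sketch; simulate() and refit() are the names in current lme4 and may differ from what glmersim.R provides):

```r
library(lme4)
## Bootstrap p-value for comparing nested (g)lmer fits: fit0 (null), fit1.
pboot_p <- function(fit0, fit1, nsim = 500) {
  obs <- deviance(fit0) - deviance(fit1)       # observed LR statistic
  sim <- replicate(nsim, {
    y <- simulate(fit0)[[1]]                   # data generated under the null
    deviance(refit(fit0, y)) - deviance(refit(fit1, y))
  })
  mean(sim >= obs)                             # parametric bootstrap p-value
}
```

This sidesteps both the chi-squared approximation and the denominator-df question, at the cost of nsim refits per comparison.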
It would be good to be able to experiment with alternative priors, as a
check on the robustness of results.
>> Thanks for your help,
>> R-sig-mixed-models at r-project.org mailing list