[R-sig-ME] why would using p-values of GLMM for distr other than Gaussian be correct?

Tue Sep 24 16:39:57 CEST 2013

Pablo Inchausti <pablo.inchausti.f at ...> writes:

> 
> Hi Joshua,
> Thanks for your response.
> I tend to agree (intuitively) with you that when one has 50,000
> observations and 1,000 groups for the variable modelled as random effect,
> assuming a normal distribution for the Wald statistic = coef/SE(coef) of the
> fixed effects without bothering about degrees of freedom is reasonable. However,
> the overwhelming majority of analyses deal with tens or at most a hundred
> observations and with random effects defined by a factor with a small (but
> generally greater than 5) number of categories. It is in this (often
> encountered) context that the discussion of how to count the degrees of
> freedom for the random effects seems to be critical. This tally of the
> degrees of freedom lies between two extremes: as one (because only the
> variance of the normally distributed random effect is estimated) or as the
> number of categories minus one of the variable modelled as random effect.
> In many (most?) cases, the assumption regarding the counting of the degrees
> of freedom does make a difference for evaluating the significance of the
> fixed effects.
> The significance tests of the fixed effects require the degrees of
> freedom of the model, which is why the package lme4 does not provide
> p-values when family=Gaussian but does provide them whenever family !=
> Gaussian, which was the question I posed in my mail. Other programs (SAS,
> Statistica) take a position/assumption about the degrees of freedom of the
> random effects that is at the very least debatable. D. Bates and others
> recommend using Bayesian methods to estimate the p-values and the confidence
> intervals, but the commonly available R functions only work for GLMMs with
> family=Gaussian and with independent random slopes and intercepts.
> I hope that this mail helps clarify the questions I posed.
> Cheers
> Pablo
> 
> On 23 September 2013 18:50, Joshua Wiley <jwiley.psych at ...> wrote:
> 
> > Hi Pablo,
> >
> > I think it depends on the assumptions.  In theory with the right
> > degrees of freedom, you could fit linear mixed effects models on a
> > smaller sample reasonably.
> >
> > There are typically no degrees of freedom for GLMs, and GLMMs follow
> > suit.  Things like logistic regression rely on large-sample
> > theory: if you have a big enough sample, the degrees of freedom are
> > effectively infinite, the parameter estimates are approximately
> > normally distributed, and a z test is fine.  The same would hold for
> > linear mixed models.  If you had, say, 50000 observations from 1000
> > groups, p values assuming z = b/se ~ Gaussian are pretty sensible.
> >
> > Cheers,
> >
> > Joshua
> >

 [snip snip snip]

   Just to amplify Joshua's answer: I really think that the reason
that p values are shown for GLMMs and not LMMs is cultural. The
classic mixed model ANOVA world is (perhaps appropriately) somewhat
obsessed with degrees of freedom, which translates to wanting to 
know what the real units of replication are so that proper inference
can be done; the LMM concern inherits from this.  On the GLM(M) side,
the *culture* is to rely on asymptotic theory.  There is theory
about finite-size corrections for GLMs (without random effects),
under the rubric of "Bartlett corrections", but it's not very
widely known or used.  Thus, summary.lm (for example) reports
t statistics (finite-size-corrected) while summary.glm reports Z
statistics (asymptotic) for families with a fixed dispersion
parameter, such as binomial and Poisson ...
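
As a minimal sketch of the lm/glm difference, using simulated data (all
object names below are made up for illustration and do not come from the
thread), one can look at the coefficient tables that base R returns and
rebuild the reported p-values by hand. The last lines show how much the
reference distribution can matter for the same test statistic when the
residual degrees of freedom are small.

```r
# Simulated data; n, x, y_gauss, y_pois are illustrative assumptions only.
set.seed(1)
n <- 30
x <- rnorm(n)
y_gauss <- 1 + 0.5 * x + rnorm(n)
y_pois  <- rpois(n, lambda = exp(0.2 + 0.3 * x))

fit_lm  <- lm(y_gauss ~ x)
fit_glm <- glm(y_pois ~ x, family = poisson)

# The column names of the coefficient tables show which reference
# distribution each summary method uses.
lm_cols  <- colnames(coef(summary(fit_lm)))   # includes "t value"
glm_cols <- colnames(coef(summary(fit_glm)))  # includes "z value"

# Rebuild the reported p-values for the slope by hand.
t_stat <- coef(summary(fit_lm))["x", "t value"]
p_t <- 2 * pt(abs(t_stat), df = df.residual(fit_lm), lower.tail = FALSE)

z_stat <- coef(summary(fit_glm))["x", "z value"]
p_z <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

# Same statistic, different reference distributions: the t-based
# p-value approaches the Gaussian one as the df grow.
stat <- 2
p_by_df  <- sapply(c(5, 10, 100), function(df)
  2 * pt(stat, df = df, lower.tail = FALSE))
p_normal <- 2 * pnorm(stat, lower.tail = FALSE)
```

Printing `p_by_df` next to `p_normal` gives a feel for when "effectively
infinite" degrees of freedom is a harmless assumption; this says nothing
about how to count the df in a mixed model, which is the hard part.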

There's more discussion of this at http://glmm.wikidot.com/faq#df :
I might add a sentence or two explaining the cultural context.

  Ben