[R-sig-ME] single argument anova for GLMMs (really, glmer, or dispersion?)

Fri Dec 12 16:29:20 CET 2008

I appreciate your thoughts on this, John, and especially the example.
There is a tendency in discussions of the theory to ignore the
practical applications and in discussion of the applications to ignore
the theoretical foundations.  Effective statistical analysis should,
of course, involve both.

I include some comments below.

On Thu, Dec 11, 2008 at 11:29 PM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
> The approaches have the fundamental difference that the overdispersion model
> multiplies the theoretical variance by an amount that is constant (whether
> on the scale of the response [the binomial variance becomes \phi n p(1-p)],
> or on the scale of the linear predictor).

This highlights the problem that I have with the quasi families.  One
cannot choose the mean and variance separately in distributions like
the Bernoulli or Poisson.  (As the binomial is a sum of independent
Bernoulli distributions I will phrase this in terms of the Bernoulli
so I don't trip up on the n's.)  For a Gaussian distribution one has
the luxury of separately setting the mean and the variance.  For the
Bernoulli or Poisson you don't.  The choice of the mean completely
determines the distribution.  There is no such thing as a Bernoulli
distribution with variance \phi p(1-p) unless \phi is identically 1.

If you want to discuss generalizations of the Bernoulli or Poisson
distributions I could accept that but simply saying that we will base
a model on a distribution that is just like the Bernoulli except that
it has a tunable variance won't fly.  If we want to attribute
desirable statistical properties to estimates by saying, for example,
that they are maximum likelihood estimates or approximations to the
maximum likelihood estimates we must be able to define the likelihood
and to me that means that you must have a probability model.   That
is, you must be able to describe the probability distribution of the
response or, in the case of the generalized linear mixed models, the
conditional distribution of the response given the random effects.

If I can't define the probability distribution then how can I evaluate
the likelihood?  One can argue by analogy that the estimates based on
the Bernoulli have certain properties so we think that the estimates
from a hypothetical distribution that is just like the Bernoulli
except that it is overdispersed should have similar properties with an
extra parameter thrown in but that does not constitute a "sound
theoretical basis" and there is no justification for thinking that
such a procedure will allow for extensions to random effects.

> I have called overdispersion a model - actually it is not one model, but a
> range of possible models. I have no problem, in principle, with one fitting
> method that reflects multiple possible models once one gets down to detail.

> GLMMs add to the theoretical variance, on the scale of the linear predictor.
> For binomial models with the usual link functions (logit, probit, cloglog),
> the scale spreads out close to p=0 or close to p=1,   With the glmm models
> the variances then increase more, relatively to the overdispersion model, at
> the  extremes of the scale.   (For the Poisson with a log link, there is
> just one relevant extreme, at 0.)

Although you are not saying it explicitly it seems that you are
regarding the overdispersion and random effects approaches as a tool
for modeling the marginal variance of the responses.  It happens that
I don't think of the models that way, perhaps because I have
difficulty thinking of the marginal variance.  I'm enough of a
mathematician that I like to consider the simple cases which to me
means the unconditional distribution of the random effects and the
conditional distribution of the response.  In the way that I think of
the models those are both relatively simple distributions.

One of the things that trip people up when considering random effects
as ways of modifying the marginal variance structure of the response
is that techniques or modifications that will work for the Gaussian
distribution don't work for other distributions, because of the fact
that the mean and variance are separately specified in the Gaussian
but not in others.  In the nlme package Jose and I allowed for
correlation structures and variance functions for the response to be
specified separately from the random effects structure.  This is
legitimate in that we were only allowing models for which the
conditional distribution of Y given B = b is Gaussian.  Extending this
type of modeling to other families of conditional distribution is
conceptually very difficult.  Many people want to do this but, as that
noted philosopher Mick Jagger pointed out, "You can't always get what
you want.".  If you want to model binary or count data you have to
accept that the mean and the variance of the conditional distribution
cannot be modeled separately.  (Another noted philosopher, Ernie of
Sesame Street, pointed out that "You gotta put down the ducky if you
want to play the saxophone.")

> NB also, all variance assessments are conditional on getting the link right.
>  If the link is wrong in a way that matters, there will be apparent
> increases in variance in some parts of the scale that reflect biases that
> arise from the inappropriate choice of link.
>
> There may be cases where overdispersion gives too small a variance
> (relatively) at the extremes, while glmer gives too high a variance.  As
> there are an infinite number of possible ways in which the variance might
> vary with (in the binomial case) p, it would be surprising if (detectable
> with enough data, or enough historical experience), there were not such
> "intermediate" cases.
>
> There might in principle be subplot designs, with a treatment at the subplot
> level, where the overdispersion model is required at the subplot level in
> order to get the treatment comparisons correct at that level.
>
> As much of this discussion is focused around ecology, experience with
> fitting one or other model to large datasets is surely required that will
> help decide just how, in one or other practical context, 1) the variance is
> likely to change with p (or in the Poisson case, with the Poisson mean) and
> 2) what links seem preferable.
>
> The best way to give the flexibility required for modeling the variance, as
> it seems to me, would be the ability to make the variance of p a fairly
> arbitrary function of p, with other variance components added on the scale
> of the linear predictor.  More radically, all variance components might be
> functions of p.  I am not sure that going that far would be a good idea -
> there'd be too many complaints that model fits will not converge!
>
> The following shows a comparison that I did recently for a talk.  The p's
> are not sufficiently extreme to show much difference between the two models:
>
> The dataset cbpp is from the lme4 package.
> infect <- with(cbpp, cbind(incidence, size - incidence))
> (gm1 <- glmer(infect ~ period + (1 | herd),
> family = binomial, data = cbpp))
> Random effects:
> Groups Name Variance Std.Dev.
> herd (Intercept) 0.412 0.642
> Number of obs: 56, groups: herd, 15
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.399 0.228 -6.14 8.4e-10
> period2 -0.992 0.305 -3.25 0.00116
> period3 -1.129 0.326 -3.46 0.00054
> period4 -1.580 0.429 -3.69 0.00023
>
> Here, use the "sum" contrasts, and compare with the overall mean.
>                            glmer                quasibinomial
>                    Est  SE          z     Est SE (binomial SE)    t
> (Intercept) -2.32 0.22 -10.5   -2.33 0.21 (.14)             -11.3
> Period1     -0.66 0.32   -2.1   -0.72 0.45 (.31)               -1.6
> Period2      0.93 0.18    5.0     1.06 0.26 (.17)                4.2
> Period3     -0.07 0.23  -0.3    -0.11 0.34 (.23)               -0.3
> Period4     -0.20 0.25  -0.8    -0.24 0.36 (.24)               -0.7
>
> The SEs (really SEDs) are not much increased from the quasibinomial model.
> The estimates of treatment eﬀects (diﬀerences from the overall mean) are
> substantially reduced (pulled in towards the overall mean). The net eﬀect is
> that the z -statistic is smaller for the glmer model than the t for the
> quasibinomial model.
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
>
>
> On 12/12/2008, at 7:52 AM, Andrew Robinson wrote:
>
>> Echoing Murray's points here - nicely put, Murray - it seems to me
>> that the quasi-likelihood and the GLMM are different approaches to the
>> same problem.
>>
>> Can anyone provide a substantial example where random effects and
>> quasilikelihood have both been necessary?
>>
>> Best wishes,
>>
>> Andrew
>>
>>
>> On Fri, Dec 12, 2008 at 09:11:39AM +1300, Murray Jorgensen wrote:
>>>
>>> The following is how I think about this at the moment:
>>>
>>> The quasi-likelihood approach is an attempt at a model-free approach to
>>> the problem of overdispersion in non-Gaussian regression situations
>>> where standard distributional assumptions fail to provide the observed
>>> mean-variance relationship.
>>>
>>> The glmm approach, on the other hand, does not abandon models and
>>> likelihood but seeks to account for the observed mean-variance
>>> relationship by adding unobserved latent variables (random effects) to
>>> the model.
>>>
>>> Seeking to combine the two approaches by using both quasilikelihood
>>> *and* random effects would seem to be asking for trouble as being able
>>> to use two tools on one problem would give a lot of flexibility to the
>>> parameter estimation; probably leading to a very flat quasilikelihood
>>> surface and ill-determined optima.
>>>
>>> But all of the above is only thoughts without the benefit of either
>>> serious attempts at fitting real data or doing serious theory so I will
>>> defer to anyone who has done either!
>>>
>>> Philosophically, at least, there seems to be clash between the two
>>> approaches and I doubt that attempts to combine them will be successful.
>>>
>>> Murray Jorgensen
>>>
>>>
>>
>> --
>> Andrew Robinson
>> Department of Mathematics and Statistics            Tel: +61-3-8344-6410
>> University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
>> http://www.ms.unimelb.edu.au/~andrewpr
>> http://blogs.mbs.edu/fishing-in-the-bay/
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>