[R-sig-ME] single argument anova for GLMMs (really, glmer, or dispersion?)
maj at stats.waikato.ac.nz
Sat Dec 13 08:51:41 CET 2008
I should have replied earlier to the kind comments Andrew and Doug made
earlier on my contrast between quasilikelihood 'models' and GLMMs,
however I am being invaded by sons, daughters in and out of law and a
I tried to be neutral in this comparison but I will 'come out' here and
admit that my preference is for fully-explicit probability models for
the data. It seems to me that without this you don't have a clear
meaning for the parameters that you are estimating. John says "I have no
problem, in principle, with one fitting method that reflects multiple
possible models once one gets down to detail." This may be psychological
but I would have to say that I *do* have a problem with that.
Because data can be generated from fully-specified models it seems to me
that there is scope for a computer experiment comparing the results of
both fitting methods to data generated from a known model. Just because
the data would be generated from a known GLMM it need not mean that the
GLMM estimates from fitting the correct model would be superior. I
imagine that quasi approaches might do well in small samples.
Clearly the design of such a computer experiment would need to be
carefully thought out. It would probably better to design around a
particular real situation than to try for great generality.
Douglas Bates wrote:
> I appreciate your thoughts on this, John, and especially the example.
> There is a tendency in discussions of the theory to ignore the
> practical applications and in discussion of the applications to ignore
> the theoretical foundations. Effective statistical analysis should,
> of course, involve both.
> I include some comments below.
> On Thu, Dec 11, 2008 at 11:29 PM, John Maindonald
> <john.maindonald at anu.edu.au> wrote:
>> The approaches have the fundamental difference that the overdispersion model
>> multiplies the theoretical variance by an amount that is constant (whether
>> on the scale of the response [the binomial variance becomes \phi n p(1-p)],
>> or on the scale of the linear predictor).
> This highlights the problem that I have with the quasi families. One
> cannot choose the mean and variance separately in distributions like
> the Bernoulli or Poisson. (As the binomial is a sum of independent
> Bernoulli distributions I will phrase this in terms of the Bernoulli
> so I don't trip up on the n's.) For a Gaussian distribution one has
> the luxury of separately setting the mean and the variance. For the
> Bernoulli or Poisson you don't. The choice of the mean completely
> determines the distribution. There is no such thing as a Bernoulli
> distribution with variance \phi p(1-p) unless \phi is identically 1.
> If you want to discuss generalizations of the Bernoulli or Poisson
> distributions I could accept that but simply saying that we will base
> a model on a distribution that is just like the Bernoulli except that
> it has a tunable variance won't fly. If we want to attribute
> desirable statistical properties to estimates by saying, for example,
> that they are maximum likelihood estimates or approximations to the
> maximum likelihood estimates we must be able to define the likelihood
> and to me that means that you must have a probability model. That
> is, you must be able to describe the probability distribution of the
> response or, in the case of the generalized linear mixed models, the
> conditional distribution of the response given the random effects.
> If I can't define the probability distribution then how can I evaluate
> the likelihood? One can argue by analogy that the estimates based on
> the Bernoulli have certain properties so we think that the estimates
> from a hypothetical distribution that is just like the Bernoulli
> except that it is overdispersed should have similar properties with an
> extra parameter thrown in but that does not constitute a "sound
> theoretical basis" and there is no justification for thinking that
> such a procedure will allow for extensions to random effects.
>> I have called overdispersion a model - actually it is not one model, but a
>> range of possible models. I have no problem, in principle, with one fitting
>> method that reflects multiple possible models once one gets down to detail.
>> GLMMs add to the theoretical variance, on the scale of the linear predictor.
>> For binomial models with the usual link functions (logit, probit, cloglog),
>> the scale spreads out close to p=0 or close to p=1, With the glmm models
>> the variances then increase more, relatively to the overdispersion model, at
>> the extremes of the scale. (For the Poisson with a log link, there is
>> just one relevant extreme, at 0.)
> Although you are not saying it explicitly it seems that you are
> regarding the overdispersion and random effects approaches as a tool
> for modeling the marginal variance of the responses. It happens that
> I don't think of the models that way, perhaps because I have
> difficulty thinking of the marginal variance. I'm enough of a
> mathematician that I like to consider the simple cases which to me
> means the unconditional distribution of the random effects and the
> conditional distribution of the response. In the way that I think of
> the models those are both relatively simple distributions.
> One of the things that trip people up when considering random effects
> as ways of modifying the marginal variance structure of the response
> is that techniques or modifications that will work for the Gaussian
> distribution don't work for other distributions, because of the fact
> that the mean and variance are separately specified in the Gaussian
> but not in others. In the nlme package Jose and I allowed for
> correlation structures and variance functions for the response to be
> specified separately from the random effects structure. This is
> legitimate in that we were only allowing models for which the
> conditional distribution of Y given B = b is Gaussian. Extending this
> type of modeling to other families of conditional distribution is
> conceptually very difficult. Many people want to do this but, as that
> noted philosopher Mick Jagger pointed out, "You can't always get what
> you want.". If you want to model binary or count data you have to
> accept that the mean and the variance of the conditional distribution
> cannot be modeled separately. (Another noted philosopher, Ernie of
> Sesame Street, pointed out that "You gotta put down the ducky if you
> want to play the saxophone.")
>> NB also, all variance assessments are conditional on getting the link right.
>> If the link is wrong in a way that matters, there will be apparent
>> increases in variance in some parts of the scale that reflect biases that
>> arise from the inappropriate choice of link.
>> There may be cases where overdispersion gives too small a variance
>> (relatively) at the extremes, while glmer gives too high a variance. As
>> there are an infinite number of possible ways in which the variance might
>> vary with (in the binomial case) p, it would be surprising if (detectable
>> with enough data, or enough historical experience), there were not such
>> "intermediate" cases.
>> There might in principle be subplot designs, with a treatment at the subplot
>> level, where the overdispersion model is required at the subplot level in
>> order to get the treatment comparisons correct at that level.
>> As much of this discussion is focused around ecology, experience with
>> fitting one or other model to large datasets is surely required that will
>> help decide just how, in one or other practical context, 1) the variance is
>> likely to change with p (or in the Poisson case, with the Poisson mean) and
>> 2) what links seem preferable.
>> The best way to give the flexibility required for modeling the variance, as
>> it seems to me, would be the ability to make the variance of p a fairly
>> arbitrary function of p, with other variance components added on the scale
>> of the linear predictor. More radically, all variance components might be
>> functions of p. I am not sure that going that far would be a good idea -
>> there'd be too many complaints that model fits will not converge!
>> The following shows a comparison that I did recently for a talk. The p's
>> are not sufficiently extreme to show much difference between the two models:
>> The dataset cbpp is from the lme4 package.
>> infect <- with(cbpp, cbind(incidence, size - incidence))
>> (gm1 <- glmer(infect ~ period + (1 | herd),
>> family = binomial, data = cbpp))
>> Random effects:
>> Groups Name Variance Std.Dev.
>> herd (Intercept) 0.412 0.642
>> Number of obs: 56, groups: herd, 15
>> Fixed effects:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -1.399 0.228 -6.14 8.4e-10
>> period2 -0.992 0.305 -3.25 0.00116
>> period3 -1.129 0.326 -3.46 0.00054
>> period4 -1.580 0.429 -3.69 0.00023
>> Here, use the "sum" contrasts, and compare with the overall mean.
>> glmer quasibinomial
>> Est SE z Est SE (binomial SE) t
>> (Intercept) -2.32 0.22 -10.5 -2.33 0.21 (.14) -11.3
>> Period1 -0.66 0.32 -2.1 -0.72 0.45 (.31) -1.6
>> Period2 0.93 0.18 5.0 1.06 0.26 (.17) 4.2
>> Period3 -0.07 0.23 -0.3 -0.11 0.34 (.23) -0.3
>> Period4 -0.20 0.25 -0.8 -0.24 0.36 (.24) -0.7
>> The SEs (really SEDs) are not much increased from the quasibinomial model.
>> The estimates of treatment eﬀects (diﬀerences from the overall mean) are
>> substantially reduced (pulled in towards the overall mean). The net eﬀect is
>> that the z -statistic is smaller for the glmer model than the t for the
>> quasibinomial model.
>> John Maindonald email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
>> Centre for Mathematics & Its Applications, Room 1194,
>> John Dedman Mathematical Sciences Building (Building 27)
>> Australian National University, Canberra ACT 0200.
>> On 12/12/2008, at 7:52 AM, Andrew Robinson wrote:
>>> Echoing Murray's points here - nicely put, Murray - it seems to me
>>> that the quasi-likelihood and the GLMM are different approaches to the
>>> same problem.
>>> Can anyone provide a substantial example where random effects and
>>> quasilikelihood have both been necessary?
>>> Best wishes,
>>> On Fri, Dec 12, 2008 at 09:11:39AM +1300, Murray Jorgensen wrote:
>>>> The following is how I think about this at the moment:
>>>> The quasi-likelihood approach is an attempt at a model-free approach to
>>>> the problem of overdispersion in non-Gaussian regression situations
>>>> where standard distributional assumptions fail to provide the observed
>>>> mean-variance relationship.
>>>> The glmm approach, on the other hand, does not abandon models and
>>>> likelihood but seeks to account for the observed mean-variance
>>>> relationship by adding unobserved latent variables (random effects) to
>>>> the model.
>>>> Seeking to combine the two approaches by using both quasilikelihood
>>>> *and* random effects would seem to be asking for trouble as being able
>>>> to use two tools on one problem would give a lot of flexibility to the
>>>> parameter estimation; probably leading to a very flat quasilikelihood
>>>> surface and ill-determined optima.
>>>> But all of the above is only thoughts without the benefit of either
>>>> serious attempts at fitting real data or doing serious theory so I will
>>>> defer to anyone who has done either!
>>>> Philosophically, at least, there seems to be clash between the two
>>>> approaches and I doubt that attempts to combine them will be successful.
>>>> Murray Jorgensen
>>> Andrew Robinson
>>> Department of Mathematics and Statistics Tel: +61-3-8344-6410
>>> University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
>>> R-sig-mixed-models at r-project.org mailing list
>> R-sig-mixed-models at r-project.org mailing list
> R-sig-mixed-models at r-project.org mailing list
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz majorgensen at ihug.co.nz Fax 7 838 4155
Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 139 5862
More information about the R-sig-mixed-models