[R-sig-ME] single argument anova for GLMMs (really, glmer, or dispersion?)

Sat Dec 13 10:08:11 CET 2008

I certainly prefer fully specified models.  I will end up pretty much  
conceding Doug's point, but maybe not quite!

Equally, I am uneasy with glmer's restriction to models where the  
error family variance can only be modified by addition on the scale of  
the linear predictor.

There's an issue of what is a "fully specified model".  Sure one can  
sample from a negative binomial, but when one gets down to what  
processes might generate a negative binomial, there are several  
alternative mechanisms.  Rather than using a negative binomial as a  
way of modeling overdispersion, I'd prefer to be modeling the process  
at a more fundamental level -  maybe from applying a gamma mixing  
distribution to a Poisson rate.  That way, one can think about whether  
a gamma mixing distribution makes sense, whether something else might  
be more appropriate.  Direct entry into use of a negative binomial  
avoids exposure to such doubts.

I like to think of a quasibinomial with a dispersion equal to three as  
generated by a sequence of Bernoulli trials in which each event  
generates 3 repeats of itself.  This does not quite work (or I do not  
know how it works) if each event has to generate 2.5 repeats.

[I am reminded of Swedish mathematical educationalist Andrejs Dunkel's  
quip: "1.8 children per family! How can that be?" , with the reply  
"Statistics, my friends, statistics Seems to promote split  
personalities") Sadly, Andrejs is deceased]

It would I think be in the spirit of the direction in which lmer is  
moving, albeit for heterogeneous variances where the error is normal,  
to allow modeling at the level (a gamma mixture of Poissons) I have in  
mind.  Would this be enormously more computationally intensive than  
modeling the negative binomial or other such distributions?  I note,  
though, your comments Doug that suggest that this kind of thing is  
scarcely tractable.

I do think a lot more work is needed that looks at what seems to  
happen, in one or other type of application, with multiple real  
datasets from that area.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 13/12/2008, at 6:51 PM, Murray Jorgensen wrote:

> I should have replied earlier to the kind comments Andrew and Doug  
> made earlier on my contrast between quasilikelihood 'models' and  
> GLMMs, however I am being invaded by sons, daughters in and out of  
> law and a grandson.
>
> I tried to be neutral in this comparison but I will 'come out' here  
> and admit that my preference is for fully-explicit probability  
> models for the data. It seems to me that without this you don't have  
> a clear meaning for the parameters that you are estimating. John  
> says "I have no problem, in principle, with one fitting method that  
> reflects multiple possible models once one gets down to detail."  
> This may be psychological  but I would have to say that I *do* have  
> a problem with that.
>
> Because data can be generated from fully-specified models it seems  
> to me that there is scope for a computer experiment comparing the  
> results of both fitting methods to data generated from a known  
> model. Just because the data would be generated from a known GLMM it  
> need not mean that the GLMM estimates from fitting the correct model  
> would be superior. I imagine that quasi approaches might do well in  
> small samples.
>
> Clearly the design of such a computer experiment would need to be  
> carefully thought out. It would probably better to design around a  
> particular real situation than to try for great generality.
>
>
>
> Douglas Bates wrote:
>> I appreciate your thoughts on this, John, and especially the example.
>> There is a tendency in discussions of the theory to ignore the
>> practical applications and in discussion of the applications to  
>> ignore
>> the theoretical foundations.  Effective statistical analysis should,
>> of course, involve both.
>> I include some comments below.
>> On Thu, Dec 11, 2008 at 11:29 PM, John Maindonald
>> <john.maindonald at anu.edu.au> wrote:
>>> The approaches have the fundamental difference that the  
>>> overdispersion model
>>> multiplies the theoretical variance by an amount that is constant  
>>> (whether
>>> on the scale of the response [the binomial variance becomes \phi n  
>>> p(1-p)],
>>> or on the scale of the linear predictor).
>> This highlights the problem that I have with the quasi families.  One
>> cannot choose the mean and variance separately in distributions like
>> the Bernoulli or Poisson.  (As the binomial is a sum of independent
>> Bernoulli distributions I will phrase this in terms of the Bernoulli
>> so I don't trip up on the n's.)  For a Gaussian distribution one has
>> the luxury of separately setting the mean and the variance.  For the
>> Bernoulli or Poisson you don't.  The choice of the mean completely
>> determines the distribution.  There is no such thing as a Bernoulli
>> distribution with variance \phi p(1-p) unless \phi is identically 1.
>> If you want to discuss generalizations of the Bernoulli or Poisson
>> distributions I could accept that but simply saying that we will base
>> a model on a distribution that is just like the Bernoulli except that
>> it has a tunable variance won't fly.  If we want to attribute
>> desirable statistical properties to estimates by saying, for example,
>> that they are maximum likelihood estimates or approximations to the
>> maximum likelihood estimates we must be able to define the likelihood
>> and to me that means that you must have a probability model.   That
>> is, you must be able to describe the probability distribution of the
>> response or, in the case of the generalized linear mixed models, the
>> conditional distribution of the response given the random effects.
>> If I can't define the probability distribution then how can I  
>> evaluate
>> the likelihood?  One can argue by analogy that the estimates based on
>> the Bernoulli have certain properties so we think that the estimates
>> from a hypothetical distribution that is just like the Bernoulli
>> except that it is overdispersed should have similar properties with  
>> an
>> extra parameter thrown in but that does not constitute a "sound
>> theoretical basis" and there is no justification for thinking that
>> such a procedure will allow for extensions to random effects.
>>> I have called overdispersion a model - actually it is not one  
>>> model, but a
>>> range of possible models. I have no problem, in principle, with  
>>> one fitting
>>> method that reflects multiple possible models once one gets down  
>>> to detail.
>>> GLMMs add to the theoretical variance, on the scale of the linear  
>>> predictor.
>>> For binomial models with the usual link functions (logit, probit,  
>>> cloglog),
>>> the scale spreads out close to p=0 or close to p=1,   With the  
>>> glmm models
>>> the variances then increase more, relatively to the overdispersion  
>>> model, at
>>> the  extremes of the scale.   (For the Poisson with a log link,  
>>> there is
>>> just one relevant extreme, at 0.)
>> Although you are not saying it explicitly it seems that you are
>> regarding the overdispersion and random effects approaches as a tool
>> for modeling the marginal variance of the responses.  It happens that
>> I don't think of the models that way, perhaps because I have
>> difficulty thinking of the marginal variance.  I'm enough of a
>> mathematician that I like to consider the simple cases which to me
>> means the unconditional distribution of the random effects and the
>> conditional distribution of the response.  In the way that I think of
>> the models those are both relatively simple distributions.
>> One of the things that trip people up when considering random effects
>> as ways of modifying the marginal variance structure of the response
>> is that techniques or modifications that will work for the Gaussian
>> distribution don't work for other distributions, because of the fact
>> that the mean and variance are separately specified in the Gaussian
>> but not in others.  In the nlme package Jose and I allowed for
>> correlation structures and variance functions for the response to be
>> specified separately from the random effects structure.  This is
>> legitimate in that we were only allowing models for which the
>> conditional distribution of Y given B = b is Gaussian.  Extending  
>> this
>> type of modeling to other families of conditional distribution is
>> conceptually very difficult.  Many people want to do this but, as  
>> that
>> noted philosopher Mick Jagger pointed out, "You can't always get what
>> you want.".  If you want to model binary or count data you have to
>> accept that the mean and the variance of the conditional distribution
>> cannot be modeled separately.  (Another noted philosopher, Ernie of
>> Sesame Street, pointed out that "You gotta put down the ducky if you
>> want to play the saxophone.")
>>> NB also, all variance assessments are conditional on getting the  
>>> link right.
>>> If the link is wrong in a way that matters, there will be apparent
>>> increases in variance in some parts of the scale that reflect  
>>> biases that
>>> arise from the inappropriate choice of link.
>>>
>>> There may be cases where overdispersion gives too small a variance
>>> (relatively) at the extremes, while glmer gives too high a  
>>> variance.  As
>>> there are an infinite number of possible ways in which the  
>>> variance might
>>> vary with (in the binomial case) p, it would be surprising if  
>>> (detectable
>>> with enough data, or enough historical experience), there were not  
>>> such
>>> "intermediate" cases.
>>>
>>> There might in principle be subplot designs, with a treatment at  
>>> the subplot
>>> level, where the overdispersion model is required at the subplot  
>>> level in
>>> order to get the treatment comparisons correct at that level.
>>>
>>> As much of this discussion is focused around ecology, experience  
>>> with
>>> fitting one or other model to large datasets is surely required  
>>> that will
>>> help decide just how, in one or other practical context, 1) the  
>>> variance is
>>> likely to change with p (or in the Poisson case, with the Poisson  
>>> mean) and
>>> 2) what links seem preferable.
>>>
>>> The best way to give the flexibility required for modeling the  
>>> variance, as
>>> it seems to me, would be the ability to make the variance of p a  
>>> fairly
>>> arbitrary function of p, with other variance components added on  
>>> the scale
>>> of the linear predictor.  More radically, all variance components  
>>> might be
>>> functions of p.  I am not sure that going that far would be a good  
>>> idea -
>>> there'd be too many complaints that model fits will not converge!
>>>
>>> The following shows a comparison that I did recently for a talk.   
>>> The p's
>>> are not sufficiently extreme to show much difference between the  
>>> two models:
>>>
>>> The dataset cbpp is from the lme4 package.
>>> infect <- with(cbpp, cbind(incidence, size - incidence))
>>> (gm1 <- glmer(infect ~ period + (1 | herd),
>>> family = binomial, data = cbpp))
>>> Random effects:
>>> Groups Name Variance Std.Dev.
>>> herd (Intercept) 0.412 0.642
>>> Number of obs: 56, groups: herd, 15
>>> Fixed effects:
>>> Estimate Std. Error z value Pr(>|z|)
>>> (Intercept) -1.399 0.228 -6.14 8.4e-10
>>> period2 -0.992 0.305 -3.25 0.00116
>>> period3 -1.129 0.326 -3.46 0.00054
>>> period4 -1.580 0.429 -3.69 0.00023
>>>
>>> Here, use the "sum" contrasts, and compare with the overall mean.
>>>                           glmer                quasibinomial
>>>                   Est  SE          z     Est SE (binomial SE)    t
>>> (Intercept) -2.32 0.22 -10.5   -2.33 0.21 (.14)             -11.3
>>> Period1     -0.66 0.32   -2.1   -0.72 0.45 (.31)               -1.6
>>> Period2      0.93 0.18    5.0     1.06 0.26 (.17)                4.2
>>> Period3     -0.07 0.23  -0.3    -0.11 0.34 (.23)               -0.3
>>> Period4     -0.20 0.25  -0.8    -0.24 0.36 (.24)               -0.7
>>>
>>> The SEs (really SEDs) are not much increased from the  
>>> quasibinomial model.
>>> The estimates of treatment eﬀects (diﬀerences from the overall  
>>> mean) are
>>> substantially reduced (pulled in towards the overall mean). The  
>>> net eﬀect is
>>> that the z -statistic is smaller for the glmer model than the t  
>>> for the
>>> quasibinomial model.
>>>
>>> John Maindonald             email: john.maindonald at anu.edu.au
>>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>>> Centre for Mathematics & Its Applications, Room 1194,
>>> John Dedman Mathematical Sciences Building (Building 27)
>>> Australian National University, Canberra ACT 0200.
>>>
>>>
>>> On 12/12/2008, at 7:52 AM, Andrew Robinson wrote:
>>>
>>>> Echoing Murray's points here - nicely put, Murray - it seems to me
>>>> that the quasi-likelihood and the GLMM are different approaches  
>>>> to the
>>>> same problem.
>>>>
>>>> Can anyone provide a substantial example where random effects and
>>>> quasilikelihood have both been necessary?
>>>>
>>>> Best wishes,
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Fri, Dec 12, 2008 at 09:11:39AM +1300, Murray Jorgensen wrote:
>>>>> The following is how I think about this at the moment:
>>>>>
>>>>> The quasi-likelihood approach is an attempt at a model-free  
>>>>> approach to
>>>>> the problem of overdispersion in non-Gaussian regression  
>>>>> situations
>>>>> where standard distributional assumptions fail to provide the  
>>>>> observed
>>>>> mean-variance relationship.
>>>>>
>>>>> The glmm approach, on the other hand, does not abandon models and
>>>>> likelihood but seeks to account for the observed mean-variance
>>>>> relationship by adding unobserved latent variables (random  
>>>>> effects) to
>>>>> the model.
>>>>>
>>>>> Seeking to combine the two approaches by using both  
>>>>> quasilikelihood
>>>>> *and* random effects would seem to be asking for trouble as  
>>>>> being able
>>>>> to use two tools on one problem would give a lot of flexibility  
>>>>> to the
>>>>> parameter estimation; probably leading to a very flat  
>>>>> quasilikelihood
>>>>> surface and ill-determined optima.
>>>>>
>>>>> But all of the above is only thoughts without the benefit of  
>>>>> either
>>>>> serious attempts at fitting real data or doing serious theory so  
>>>>> I will
>>>>> defer to anyone who has done either!
>>>>>
>>>>> Philosophically, at least, there seems to be clash between the two
>>>>> approaches and I doubt that attempts to combine them will be  
>>>>> successful.
>>>>>
>>>>> Murray Jorgensen
>>>>>
>>>>>
>>>> --
>>>> Andrew Robinson
>>>> Department of Mathematics and Statistics            Tel:  
>>>> +61-3-8344-6410
>>>> University of Melbourne, VIC 3010 Australia         Fax:  
>>>> +61-3-8344-4599
>>>> http://www.ms.unimelb.edu.au/~andrewpr
>>>> http://blogs.mbs.edu/fishing-in-the-bay/
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> -- 
> Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
> Department of Statistics, University of Waikato, Hamilton, New Zealand
> Email: maj at waikato.ac.nz    majorgensen at ihug.co.nz      Fax 7 838 4155
> Phone  +64 7 838 4773 wk    Home +64 7 825 0441    Mobile 021 139 5862