[R-sig-ME] single argument anova for GLMMs (really, glmer, or dispersion?)
john.maindonald at anu.edu.au
Fri Dec 12 06:29:19 CET 2008
The approaches have the fundamental difference that the overdispersion
model multiplies the theoretical variance by an amount that is
constant (whether on the scale of the response [the binomial variance
becomes \phi n p(1-p)], or on the scale of the linear predictor).
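To make the "constant multiplier" concrete: one overdispersion model of this kind is the beta-binomial with a fixed intra-class correlation rho, whose variance is the binomial variance times the same factor phi = 1 + (n-1)*rho at every p. A base-R simulation sketch (the values of n and rho are illustrative, not from the post):

```r
## Beta-binomial: binomial variance multiplied by a constant phi,
## regardless of p.  Illustrative n and rho.
set.seed(1)
n <- 20; rho <- 0.1
theta <- (1 - rho) / rho                        # Beta "precision" parameter
phi_at <- function(p, nsim = 2e5) {
  pi <- rbeta(nsim, p * theta, (1 - p) * theta) # latent success probabilities
  y  <- rbinom(nsim, n, pi)                     # beta-binomial counts
  var(y) / (n * p * (1 - p))                    # estimated variance multiplier
}
phi_at(0.5)   # close to 1 + (n-1)*rho = 2.9
phi_at(0.1)   # close to the same value
```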
I have called overdispersion a model - actually it is not one model,
but a range of possible models. I have no problem, in principle, with
one fitting method that reflects multiple possible models once one
gets down to detail.
GLMMs add to the theoretical variance, on the scale of the linear
predictor. For binomial models with the usual link functions (logit,
probit, cloglog), the scale spreads out close to p=0 or close to
p=1. With the glmm models the variances then increase more,
relative to the overdispersion model, at the extremes of the
scale. (For the Poisson with a log link, there is just one relevant
extreme, at 0.)
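A quick base-R check of the contrast between the two mechanisms: the overdispersion model multiplies the binomial variance by a fixed phi, whereas the variance inflation implied by a normal random effect on the logit scale is not a constant multiple of the binomial variance, but changes with p. The sketch below computes the implied inflation factor by numerical integration (n and sigma are illustrative choices):

```r
## Inflation factor implied by a logit-normal random effect, relative to
## the binomial variance at the marginal mean.  Illustrative n and sigma.
n <- 50; sigma <- 1
inflation <- function(p) {
  m1 <- integrate(function(u) plogis(qlogis(p) + u) * dnorm(u, 0, sigma),
                  -Inf, Inf)$value              # E[pi]
  m2 <- integrate(function(u) plogis(qlogis(p) + u)^2 * dnorm(u, 0, sigma),
                  -Inf, Inf)$value              # E[pi^2]
  v  <- (m2 - m1^2) + (m1 - m2) / n             # Var(pi) + E[pi(1-pi)]/n
  v / (m1 * (1 - m1) / n)                       # relative to binomial variance
}
inflation(0.5)    # the inflation factor at a moderate p ...
inflation(0.02)   # ... is quite different from that near the extreme
```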
NB also, all variance assessments are conditional on getting the link
right. If the link is wrong in a way that matters, there will be
apparent increases in variance in some parts of the scale that reflect
biases that arise from the inappropriate choice of link.
There may be cases where overdispersion gives too small a variance
(relatively) at the extremes, while glmer gives too high a variance.
As there are infinitely many ways in which the variance might vary
with (in the binomial case) p, it would be surprising if there were
not such "intermediate" cases, detectable with enough data or enough
historical experience.
There might in principle be subplot designs, with a treatment at the
subplot level, where the overdispersion model is required at the
subplot level in order to get the treatment comparisons correct at
that level.
As much of this discussion is focused on ecology, experience with
fitting one or other model to large datasets is surely needed to help
decide, in one or another practical context, 1) how the variance is
likely to change with p (or, in the Poisson case, with the Poisson
mean) and 2) which links seem preferable.
The best way to give the flexibility required for modeling the
variance, as it seems to me, would be the ability to make the variance
of p a fairly arbitrary function of p, with other variance components
added on the scale of the linear predictor. More radically, all
variance components might be functions of p. I am not sure that going
that far would be a good idea - there'd be too many complaints that
model fits will not converge!
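The kind of flexibility described above can at least be written down directly. A base-R sketch of such a variance model, combining an arbitrary multiplier g(p) on the binomial variance with a component added on the logit scale (delta-method approximation); g, sigma and n here are illustrative assumptions, not a fitted or fittable model:

```r
## A flexible variance model for an observed proportion:
##   g(p) * binomial variance  +  logit-scale component (delta method).
## g, sigma and n are illustrative choices only.
g <- function(p) 1 + 2 * sqrt(p * (1 - p))      # arbitrary multiplier of the
                                                # binomial variance
var_p <- function(p, n = 50, sigma = 0.5) {
  g(p) * p * (1 - p) / n +                      # overdispersed binomial part
    sigma^2 * (p * (1 - p))^2                   # logit-scale random effect,
                                                # via (dp/deta)^2 = (p(1-p))^2
}
var_p(c(0.02, 0.5, 0.98))
```

Actually estimating such a g(p) from data is of course where the convergence complaints would start.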
The following shows a comparison that I did recently for a talk. The
p's are not sufficiently extreme to show much difference between the
two models. The dataset cbpp is from the lme4 package.
library(lme4)
infect <- with(cbpp, cbind(incidence, size - incidence))
(gm1 <- glmer(infect ~ period + (1 | herd),
              family = binomial, data = cbpp))
Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.412    0.642
Number of obs: 56, groups: herd, 15

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.399      0.228   -6.14  8.4e-10
period2       -0.992      0.305   -3.25  0.00116
period3       -1.129      0.326   -3.46  0.00054
period4       -1.580      0.429   -3.69  0.00023
Here, use the "sum" contrasts, and compare with the overall mean.

                     glmer              quasibinomial
              Est    SE      z     Est    SE  (binomial SE)    t
(Intercept) -2.32  0.22  -10.5   -2.33  0.21      (.14)    -11.3
Period1     -0.66  0.32   -2.1   -0.72  0.45      (.31)     -1.6
Period2      0.93  0.18    5.0    1.06  0.26      (.17)      4.2
Period3     -0.07  0.23   -0.3   -0.11  0.34      (.23)     -0.3
Period4     -0.20  0.25   -0.8   -0.24  0.36      (.24)     -0.7
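For reference, a hedged reconstruction of the two fits being compared. The original calls are not shown in the post; in particular, the quasibinomial model below (a fixed-effects GLM with period only) is an assumption about what was run:

```r
library(lme4)   # for the cbpp data and glmer (lme4 must be installed)
## "sum" contrasts so coefficients are deviations from the overall mean
options(contrasts = c("contr.sum", "contr.poly"))
infect <- with(cbpp, cbind(incidence, size - incidence))
## Quasibinomial GLM: binomial variance multiplied by a constant phi
## (assumed form of the comparison model; not shown in the original post)
gm.quasi <- glm(infect ~ period, family = quasibinomial, data = cbpp)
summary(gm.quasi)
## GLMM: herd-level random effect added on the logit scale
gm.glmm <- glmer(infect ~ period + (1 | herd),
                 family = binomial, data = cbpp)
summary(gm.glmm)
```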
The SEs (really SEDs) are not much increased from the quasibinomial
model. The estimates of treatment effects (differences from the
overall mean) are substantially reduced (pulled in towards the overall
mean). The net effect is that the z-statistic is smaller for the
glmer model than the t for the quasibinomial model.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 12/12/2008, at 7:52 AM, Andrew Robinson wrote:
> Echoing Murray's points here - nicely put, Murray - it seems to me
> that the quasi-likelihood and the GLMM are different approaches to the
> same problem.
> Can anyone provide a substantial example where random effects and
> quasilikelihood have both been necessary?
> Best wishes,
> On Fri, Dec 12, 2008 at 09:11:39AM +1300, Murray Jorgensen wrote:
>> The following is how I think about this at the moment:
>> The quasi-likelihood approach is an attempt at a model-free
>> approach to
>> the problem of overdispersion in non-Gaussian regression situations
>> where standard distributional assumptions fail to provide the
>> mean-variance relationship.
>> The glmm approach, on the other hand, does not abandon models and
>> likelihood but seeks to account for the observed mean-variance
>> relationship by adding unobserved latent variables (random effects)
>> to the model.
>> Seeking to combine the two approaches by using both quasilikelihood
>> *and* random effects would seem to be asking for trouble, as being
>> free to use two tools on one problem would give a lot of flexibility
>> to parameter estimation; probably leading to a very flat
>> quasilikelihood surface and ill-determined optima.
>> But all of the above is only thoughts without the benefit of either
>> serious attempts at fitting real data or doing serious theory so I
>> defer to anyone who has done either!
>> Philosophically, at least, there seems to be a clash between the two
>> approaches and I doubt that attempts to combine them will be
>> fruitful.
>> Murray Jorgensen
> Andrew Robinson
> Department of Mathematics and Statistics
> University of Melbourne, VIC 3010 Australia
> R-sig-mixed-models at r-project.org mailing list