[R-sig-ME] single argument anova for GLMMs (really, glmer, or dispersion?)

Sat Dec 13 12:53:47 CET 2008

On Sat, Dec 13, 2008 at 08:08:11PM +1100, John Maindonald wrote:
> I certainly prefer fully specified models.  I will end up pretty much  
> conceding Doug's point, but maybe not quite!
> 
> Equally, I am uneasy with glmer's restriction to models where the  
> error family variance can only be modified by addition on the scale of  
> the linear predictor.

Yes, me too.

> There's an issue of what is a "fully specified model".  Sure one can  
> sample from a negative binomial, but when one gets down to what  
> processes might generate a negative binomial, there are several  
> alternative mechanisms.  Rather than using a negative binomial as a  
> way of modeling overdispersion, I'd prefer to be modeling the process  
> at a more fundamental level, maybe by applying a gamma mixing
> distribution to a Poisson rate.  That way, one can think about whether  
> a gamma mixing distribution makes sense, whether something else might  
> be more appropriate.  Direct entry into use of a negative binomial  
> avoids exposure to such doubts.
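John's gamma-mixing construction is easy to check numerically. Below is a minimal sketch that assumes a gamma mixing distribution with shape k and mean mu (so scale mu / k) applied to a Poisson rate. The values of mu and k and the sample size are illustrative choices. The simulated counts keep mean mu, but their variance should sit near mu + mu^2 / k, which is the negative binomial variance, rather than the Poisson value mu.

```python
import math
import random
import statistics


def poisson_draw(rate, rng):
    """Draw one Poisson(rate) variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def gamma_poisson_sample(mu, k, n, seed=1):
    """Draw n counts: rate ~ Gamma(shape=k, scale=mu/k), then count ~ Poisson(rate)."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) takes shape alpha and scale beta (mean alpha * beta).
    return [poisson_draw(rng.gammavariate(k, mu / k), rng) for _ in range(n)]


mu, k = 4.0, 2.0
counts = gamma_poisson_sample(mu, k, n=100_000)
print("sample mean:", round(statistics.fmean(counts), 2))         # close to mu = 4
print("sample variance:", round(statistics.variance(counts), 2))  # close to mu + mu**2 / k = 12
print("Poisson variance would be:", mu)
```

Knuth's method keeps the sketch dependent on the standard library alone. It is slow for large rates, but that does not matter at these values.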

I think that this is fair enough and well put, John, but I'm going to
push back in the other direction with a hypothetical example.  Let's
say that you have your over-dispersed count data.  What do you lose if
you simply take some convenient and credible transformation of the
response variable and then use lme, paying close attention to your
conditional distribution plots?  

Let me pin that down a little: would you be reluctant to follow that
approach under any conditions?  If so, then under what conditions
would you be reluctant to follow that approach, and why?

(My experience has been that the judicious application of variance
models corrects for most of the variability problems that I have
encountered so far in practical work.)

> I like to think of a quasibinomial with a dispersion equal to three as  
> generated by a sequence of Bernoulli trials in which each event  
> generates 3 repeats of itself.  This does not quite work (or I do not  
> know how it works) if each event has to generate 2.5 repeats.

2 half the time and 3 half the time?
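John's repeated-Bernoulli picture, together with the fractional fix, can be made concrete with a small simulation. This is a minimal sketch that assumes each independent Bernoulli event is copied m times, with m drawn independently per event from the given probabilities. The function names are illustrative and do not come from any R package.

Conditional on the copy counts m_i, the total Y = sum m_i B_i has variance p(1 - p) sum m_i^2. Its dispersion relative to a binomial on N = sum m_i trials is therefore sum m_i^2 / sum m_i, which is roughly E[m^2] / E[m]. A constant m = 3 gives exactly 3. Drawing 2 or 3 with probability 1/2 per event gives 6.5 / 2.5 = 2.6, not 2.5. A dispersion of 2.5 requires half of the observed trials (rather than half of the events) to come from pairs, which means drawing m = 2 with probability 3/5.

```python
import random
import statistics


def dispersion_of_repeats(size_probs):
    """Large-sample dispersion E[m^2] / E[m] when each event is copied m times."""
    first = sum(m * prob for m, prob in size_probs.items())
    second = sum(m * m * prob for m, prob in size_probs.items())
    return second / first


def simulate_dispersion(size_probs, p=0.3, events=100, reps=10_000, seed=1):
    """Average Pearson ratio (y - N p)^2 / (N p (1 - p)) over simulated data sets.

    Each data set has `events` independent Bernoulli(p) events; each event is
    copied m times, with m drawn from `size_probs`, so N is the sum of the copies.
    """
    rng = random.Random(seed)
    sizes, weights = list(size_probs), list(size_probs.values())
    ratios = []
    for _ in range(reps):
        copies = rng.choices(sizes, weights=weights, k=events)
        n_trials = sum(copies)
        # A success in an event shows up once for each of its m copies.
        successes = sum(m for m in copies if rng.random() < p)
        ratios.append((successes - n_trials * p) ** 2 / (n_trials * p * (1 - p)))
    return statistics.fmean(ratios)


print(dispersion_of_repeats({3: 1.0}))                    # 3.0: every event repeated 3 times
print(round(dispersion_of_repeats({2: 0.5, 3: 0.5}), 3))  # 2.6: half of the events are pairs
print(round(dispersion_of_repeats({2: 0.6, 3: 0.4}), 3))  # 2.5: half of the trials come from pairs
print(round(simulate_dispersion({2: 0.5, 3: 0.5}), 2))    # simulated; should land near 2.6
```

The simulated ratio treats p as known. Its average is E[sum m_i^2 / sum m_i], so it does not pick up the extra variation that comes from N itself varying between data sets.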

> It would I think be in the spirit of the direction in which lmer is  
> moving, albeit for heterogeneous variances where the error is normal,  
> to allow modeling at the level (a gamma mixture of Poissons) I have in  
> mind.  Would this be enormously more computationally intensive than  
> modeling the negative binomial or other such distributions?  I note,  
> though, your comments, Doug, that suggest that this kind of thing is
> scarcely tractable.
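On the computational question, the gamma-mixture level and the negative binomial coincide, at least for a single observation. Integrating the Poisson probability over a gamma-distributed rate has a closed form, and that closed form is the negative binomial probability. The minimal sketch below assumes a gamma rate with shape k and scale mu / k (mean mu) and compares brute-force midpoint quadrature with the closed form. The quadrature settings are arbitrary. The sketch says nothing about the cost of such mixing once it sits alongside other random effects in a full mixed model.

```python
import math


def negbin_pmf(y, mu, k):
    """Negative binomial probability with mean mu and shape k (variance mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def gamma_poisson_pmf(y, mu, k, upper=60.0, steps=20_000):
    """P(Y = y) from integrating Poisson(y | lam) * Gamma(lam; shape=k, scale=mu/k) over lam.

    Uses the midpoint rule on (0, upper); the truncated tail is negligible for small mu.
    """
    scale = mu / k
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
        log_gamma = (k - 1) * math.log(lam) - lam / scale - math.lgamma(k) - k * math.log(scale)
        total += math.exp(log_pois + log_gamma)
    return total * h


mu, k = 4.0, 2.0
for y in range(6):
    # The two columns should agree to about six decimal places.
    print(y, round(negbin_pmf(y, mu, k), 6), round(gamma_poisson_pmf(y, mu, k), 6))
```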

On a side note, I believe that Lee and Nelder's approach to
mixed-effects modelling (as realized in Genstat) allows random effects
to come from certain non-normal distributions, including for example
the binomial-beta and the Poisson-gamma combinations.  See Table 6.2
of their book.  The underlying theory still has some gaps in it, I
think. 

Greatly enjoying this conversation,

Andrew

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/