[R-sig-ME] lme4, lme4a, and overdispersed distributions (again)

Thu Jun 24 19:59:58 CEST 2010

Since I am definitely *not* a mathematician, I am straying in over my head
here. 

I understand what you are saying: there is no likelihood function for the
quasi-binomial "distribution", and therefore there is no such
distribution.

What do you think of the suggestion that a beta-binomial mixture
distribution could be used to model overdispersed binomial data? 

Would this be a technically correct and logistically feasible solution?
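
To make the question concrete, here is a minimal sketch (Python, standard
library only) of the beta-binomial log-likelihood. The function names and
the (alpha, beta) parameterization are illustrative assumptions, not
anything taken from lme4 or SAS, and the sketch covers only the
distribution itself, with no fixed or random effects. The point is that,
unlike the quasi-binomial, the beta-binomial has a proper probability
function, with mean n*p and variance n*p*(1-p)*[1 + (n-1)*rho], where
p = alpha/(alpha+beta) and rho = 1/(alpha+beta+1):

```python
from math import lgamma, exp


def log_beta(a, b):
    """Log of the Beta function B(a, b), via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k, n, alpha, beta):
    """Log P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def betabinom_loglik(successes, trials, alpha, beta):
    """Log-likelihood of independent (successes, trials) pairs."""
    return sum(
        betabinom_logpmf(k, n, alpha, beta)
        for k, n in zip(successes, trials)
    )
```

Because this is a genuine log-likelihood, it could in principle be
maximized over alpha and beta and used in an information criterion;
whether that is practical inside a mixed model is exactly my question.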

-jeff

-----Original Message-----
From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
Bates
Sent: Thursday, June 24, 2010 1:25 PM
To: Jeffrey Evans
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions (again)

On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
<Jeffrey.Evans at dartmouth.edu> wrote:
> Like others, I have experienced trouble with estimation of the scale 
> parameter using the quasi-distributions in lme4, which is necessary to 
> calculate QAICc and rank overdispersed generalized linear mixed models.

> I had several exchanges with Ben Bolker about this early last year 
> after his TREE paper came out 
> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809%2900019-6),
> and I know it's been discussed on this list. Has
> there been or is there any potential resolution to this forthcoming in 
> future releases of
> lme4 or lme4a? I run into overdispersed binomial distributions 
> frequently and have had to use SAS to deal with them. SAS appears to 
> work, but it won't estimate the overdispersion parameter using Laplace
> estimation (only PQL). As I understand it, these pseudo-likelihoods
> can't be used for model ranking. I don't know why SAS can't/won't, but
> lme4 will run these quasi-binomial and quasi-Poisson distributions with
> Laplace estimation.

> Is there a workable way to use lme4 for modeling overdispersed 
> binomial data?

I have trouble discussing this because I come from a background as a
mathematician and am used to tracing derivations back to the original
definitions.  So when I think of a likelihood (or, equivalently, a
deviance) to be optimized it only makes sense to me if there is a
probability distribution associated with the model.  And for the
quasi-binomial and quasi-Poisson families, there isn't a probability
distribution.  To me that means that discussing maximum likelihood
estimators for such models is nonsense.  The models simply do not exist.
One can play tricks in the case of a generalized linear model to estimate a
"quasi-parameter" that isn't part of the probability distribution but it is
foolhardy to expect that the tricks will automatically carry over to a
generalized linear mixed model.
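
For concreteness, the trick usually meant here for a plain GLM is a moment
estimate: fit the ordinary binomial model, then divide the Pearson
chi-square statistic by the residual degrees of freedom. A minimal sketch
in Python follows; the function name and its inputs are hypothetical, and
the fitted probabilities are assumed to come from an already fitted
binomial GLM. Note that the result is computed after the fit and never
enters the quantity being optimized.

```python
def pearson_dispersion(successes, trials, fitted_probs, n_params):
    """Moment estimate of a dispersion factor for a fitted binomial GLM.

    Returns the Pearson chi-square statistic divided by the residual
    degrees of freedom (number of observations minus n_params).
    """
    chi_sq = 0.0
    for y, n, p in zip(successes, trials, fitted_probs):
        expected = n * p                 # binomial mean
        variance = n * p * (1.0 - p)     # binomial variance
        chi_sq += (y - expected) ** 2 / variance
    resid_df = len(successes) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    return chi_sq / resid_df
```

A value well above 1 suggests over-dispersion relative to the binomial,
but the estimate is a summary of the residuals, not a parameter of any
probability distribution.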

I am not denying that data that are over-dispersed with respect to the
binomial or Poisson distributions can and do occur.  But having data like
this and a desire to model it doesn't make the quasi families real.  In his
signature Thierry Onkelinx quotes:

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

I could and do plan to incorporate the negative binomial family but, without
a definition that I can understand of a quasi-binomial or quasi-Poisson
distribution and its associated probability function, I'm stuck. To me it's
a "build bricks without straw" situation - you can't find maximum likelihood
estimates for parameters that aren't part of the likelihood.