[R-sig-ME] lme4, lme4a, and overdispersed distributions (again)

Thu Jun 24 19:25:01 CEST 2010

On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
<Jeffrey.Evans at dartmouth.edu> wrote:
> Like others, I have experienced trouble with estimation of the scale
> parameter using the quasi-distributions in lme4, which is necessary to
> calculate QAICc and rank overdispersed generalized linear mixed models.

> I had several exchanges with Ben Bolker about this early last year after his
> TREE paper came out
> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809%2900019-6),
> and I know it's been discussed on this list. Has there been or is
> there any potential resolution to this forthcoming in future releases of
> lme4 or lme4a? I run into overdispersed binomial distributions frequently
> and have had to use SAS to deal with them. SAS appears to work, but it won't
> estimate the overdispersion parameter using Laplace estimation (only PQL).
> As I understand it, these pseudo-likelihoods can't be used for model
> ranking. I don't know why SAS can't/won't, but lme4 will run these
> quasi-binomial and quasi-poisson distributions with Laplace estimation.

> Is there a workable way to use lme4 for modeling overdispersed binomial
> data?

I have trouble discussing this because I come from a background as a
mathematician and am used to tracing derivations back to the original
definitions.  So when I think of a likelihood (or, equivalently, a
deviance) to be optimized it only makes sense to me if there is a
probability distribution associated with the model.  And for the
quasi-binomial and quasi-Poisson families, there isn't a probability
distribution.  To me that means that discussing maximum likelihood
estimators for such models is nonsense.  The models simply do not
exist.  One can play tricks in the case of a generalized linear model
to estimate a "quasi-parameter" that isn't part of the probability
distribution but it is foolhardy to expect that the tricks will
automatically carry over to a generalized linear mixed model.
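
To make the GLM "trick" concrete, here is a minimal sketch, assuming the
trick in question is the usual moment estimate of the scale (Pearson
chi-square divided by the residual degrees of freedom).  The function
name and arguments are hypothetical, and the example is an
intercept-only Poisson fit, where the ML estimate of the mean is simply
the sample mean.

```python
def pearson_dispersion(y, mu, n_params, variance=lambda m: m):
    """Moment estimate of the scale: Pearson chi-square / residual df.

    `variance` is the GLM variance function V(mu); the default V(mu) = mu
    is the Poisson case. All names here are hypothetical.
    """
    n = len(y)
    if n <= n_params:
        raise ValueError("need more observations than fitted parameters")
    # Pearson chi-square: squared residuals scaled by the model variance.
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / (n - n_params)


# Intercept-only Poisson fit: the ML estimate of the mean is the sample mean.
y = [0, 2, 4, 6, 8]
mu_hat = sum(y) / len(y)
phi_hat = pearson_dispersion(y, [mu_hat] * len(y), n_params=1)
print(phi_hat)  # 2.5
```

Note that the scale is computed from the residuals after the fit; it is
not obtained by maximizing anything.  In a mixed model it is not obvious
which fitted means or which degrees of freedom should play these roles.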

I am not denying that data that are over-dispersed with respect to the
binomial or Poisson distributions can and do occur.  But having data
like this and a desire to model it doesn't make the quasi families
real.  In his signature, Thierry Onkelinx quotes:

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

I could and do plan to incorporate the negative binomial family but,
without a definition that I can understand of a quasi-binomial or
quasi-Poisson distribution and its associated probability function,
I'm stuck. To me it's a "build bricks without straw" situation - you
can't find maximum likelihood estimates for parameters that aren't
part of the likelihood.
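
As an illustration of the difference, here is the negative binomial
probability function in one common parameterization (mean mu and shape
theta; the choice of parameterization is an assumption made for this
sketch, not a statement about what will be implemented):

```latex
P(Y = y) = \frac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!}
           \left(\frac{\theta}{\theta + \mu}\right)^{\theta}
           \left(\frac{\mu}{\theta + \mu}\right)^{y},
\qquad y = 0, 1, 2, \dots

\mathrm{E}[Y] = \mu, \qquad
\mathrm{Var}[Y] = \mu + \frac{\mu^{2}}{\theta}
```

Here theta appears in the probability function itself, so it is part of
the likelihood and can be estimated by maximizing it.  The quasi-Poisson
specification, by contrast, states only Var[Y] = phi * mu and supplies no
probability function in which phi appears.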