[R-sig-ME] lme4, lme4a, and overdispersed distributions (again)

Thu Jun 24 23:17:25 CEST 2010

On Thu, Jun 24, 2010 at 12:59 PM, Jeffrey Evans
<Jeffrey.Evans at dartmouth.edu> wrote:
> Since I am definitely *not* a mathematician, I am straying in over my head
> here.
>
> I understand what you are saying - that there isn't a likelihood function
> for the quasi-binomial "distribution". And therefore, there is no such
> distribution.
>
> What do you think of the suggestion that a beta-binomial mixture
> distribution could be used to model overdispersed binomial data?
>
> Would this be a technically correct and logistically feasible solution?

Technically correct, I imagine.  The feasibility would depend on
whether there is an IRLS type of algorithm for determining the
estimates of the coefficients in the linear predictor.
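To make the "technically correct" part concrete: unlike the quasi-binomial, the beta-binomial has a genuine probability mass function, so a log-likelihood can be written down. Below is a minimal sketch in plain Python, assuming the usual (a, b) shape parameterization of the beta mixing distribution. The function names are made up for illustration and are not taken from lme4 or any other package.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(y, n, a, b):
    """Log P(Y = y) for Y ~ BetaBinomial(n, a, b).

    The success probability p is drawn from Beta(a, b) and then
    Y | p ~ Binomial(n, p); integrating p out gives
    P(Y = y) = C(n, y) * B(y + a, n - y + b) / B(a, b).
    """
    log_choose = (math.lgamma(n + 1) - math.lgamma(y + 1)
                  - math.lgamma(n - y + 1))
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def betabinom_loglik(successes, trials, a, b):
    """Log-likelihood of independent beta-binomial counts (hypothetical helper)."""
    return sum(betabinom_logpmf(y, n, a, b)
               for y, n in zip(successes, trials))
```

With p = a / (a + b), the variance is n p (1 - p) [1 + (n - 1) / (a + b + 1)], which exceeds the binomial variance whenever n > 1. That is where the overdispersion comes from, and it is a property of a real distribution rather than of a quasi family.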

I see several packages mention the beta-binomial distribution and
fitting such models.  Without actually examining the code though I
wouldn't be able to guess how easily they could be adapted.  You have
to realize that it may be necessary to fit and re-fit the penalized,
generalized linear model many, many times, perhaps thousands of times,
during the fit of one GLMM. You can't be casual about the inner loop of
fitting the GLM.  Most of the package implementations that I have seen
eventually call optim().  The implementation of GLMs within glmer is
very different from calling a general-purpose optimizer.
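For readers who have not seen IRLS written out, here is a rough sketch of the idea for an ordinary (unpenalized) binomial GLM with a logit link and a single covariate, in plain Python. It is not the glmer code, which fits a penalized version of this problem; the function name and the two-coefficient setup are assumptions chosen to keep the example short. Each pass is a weighted least-squares solve, and that structure is what makes the inner loop cheap enough to repeat many times.

```python
import math


def irls_logistic(x, successes, trials, n_iter=25, tol=1e-10):
    """Fit logit(p_i) = b0 + b1 * x_i to binomial counts by IRLS.

    Illustrative only: no safeguards against separation or fitted
    probabilities of exactly 0 or 1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the weighted normal equations X'WX b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            eta = b0 + b1 * xi                  # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))   # inverse logit
            var = mu * (1.0 - mu)               # binomial variance function
            w = ni * var                        # IRLS weight
            z = eta + (yi / ni - mu) / var      # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares system in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

The update relies on the mean-variance relationship of the binomial family. Whether a comparably simple update exists for the beta-binomial is exactly the feasibility question; falling back on a general-purpose optimizer for every inner fit is the expensive alternative.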

> -----Original Message-----
> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
> Bates
> Sent: Thursday, June 24, 2010 1:25 PM
> To: Jeffrey Evans
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions (again)
>
> On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
> <Jeffrey.Evans at dartmouth.edu> wrote:
>> Like others, I have experienced trouble with estimation of the scale
>> parameter using the quasi-distributions in lme4, which is necessary to
>> calculate QAICc and rank overdispersed generalized linear mixed models.
>>
>> I had several exchanges with Ben Bolker about this early last year
>> after his TREE paper came out
>> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809%2900019-6),
>> and I know it's been discussed on this list. Has there been, or is
>> there forthcoming, any resolution to this in future releases of lme4
>> or lme4a? I run into overdispersed binomial distributions frequently
>> and have had to use SAS to deal with them. SAS appears to work, but it
>> won't estimate the overdispersion parameter using Laplace estimation
>> (only PQL). As I understand it, these pseudo-likelihoods can't be used
>> for model ranking. I don't know why SAS can't/won't, but lme4 will run
>> these quasi-binomial and quasi-Poisson distributions with Laplace
>> estimation.
>>
>> Is there a workable way to use lme4 for modeling overdispersed
>> binomial data?
>
> I have trouble discussing this because I come from a background as a
> mathematician and am used to tracing derivations back to the original
> definitions.  So when I think of a likelihood (or, equivalently, a
> deviance) to be optimized it only makes sense to me if there is a
> probability distribution associated with the model.  And for the
> quasi-binomial and quasi-Poisson families, there isn't a probability
> distribution.  To me that means that discussing maximum likelihood
> estimators for such models is nonsense.  The models simply do not exist.
> One can play tricks in the case of a generalized linear model to estimate a
> "quasi-parameter" that isn't part of the probability distribution but it is
> foolhardy to expect that the tricks will automatically carry over to a
> generalized linear mixed model.
>
> I am not denying that data that are over-dispersed with respect to the
> binomial or Poisson distributions can and do occur.  But having data like
> this and a desire to model it doesn't make the quasi families real.  In his
> signature Thierry Onkelinx quotes
>
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> I could and do plan to incorporate the negative binomial family but, without
> a definition that I can understand of a quasi-binomial or quasi-Poisson
> distribution and its associated probability function, I'm stuck. To me it's
> a "build bricks without straw" situation - you can't find maximum likelihood
> estimates for parameters that aren't part of the likelihood.
>
>
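For reference, the "quasi-parameter" trick mentioned in the quoted message is usually, for an ordinary GLM, the Pearson estimate of the dispersion: the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below (plain Python, hypothetical function name) takes fitted probabilities as given. Note that the estimate is computed after the fit from the residuals; it is not obtained by maximizing a likelihood, which is the point of the objection above.

```python
def pearson_dispersion(successes, trials, fitted_probs, n_params):
    """Pearson estimate of the dispersion for a binomial GLM.

    fitted_probs are the fitted success probabilities from an ordinary
    binomial fit; n_params is the number of estimated coefficients.
    Illustrative only: assumes 0 < p < 1 and more observations than
    parameters.
    """
    chi_sq = 0.0
    for y, n, p in zip(successes, trials, fitted_probs):
        # Squared Pearson residual: (observed - expected)^2 / binomial variance.
        chi_sq += (y - n * p) ** 2 / (n * p * (1.0 - p))
    resid_df = len(successes) - n_params
    return chi_sq / resid_df
```

A value near 1 is consistent with binomial variation and a value well above 1 suggests overdispersion. This is a moment-based estimate for a plain GLM; nothing here shows that it carries over to a GLMM.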