[R-sig-ME] Fwd: lme4, lme4a, and overdispersed distributions (again)

Fri Jun 25 01:11:56 CEST 2010

I think it is more accurate to say that, in general, there may be 
a class of distributions matching a given quasi-family, and therefore 
a possible multiplicity of likelihoods, and these need not be 
distributions of exponential form.  This is a PhD thesis asking to 
be done, or maybe someone has already done it.

Over-dispersed distributions, where it is entirely clear what the
distribution is, can be generated as a GLM + one random
effect per observation.  We have discussed this before.  This
seems to me the preferred way to go, if such a model seems to
fit the data.  I've not checked the current state of play re fitting
such models in lme4 or lme4a; in the past some versions have
allowed such a model.
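
To illustrate how a GLM plus one random effect per observation generates
over-dispersion, here is a minimal simulation sketch in Python.  It is not
lme4 code and fits nothing; the function names, seed, sample sizes and
sigma values are all illustrative assumptions.  It draws binomial counts
whose logit carries an independent N(0, sigma^2) effect for each
observation, then compares the sample variance with the variance a plain
binomial with the same mean would have.

```python
import math
import random
import statistics


def simulate_olre_binomial(n_obs, size, eta, sigma, rng):
    """Draw n_obs binomial counts, each with its own N(0, sigma^2)
    random effect added to the linear predictor eta on the logit scale."""
    counts = []
    for _ in range(n_obs):
        b = rng.gauss(0.0, sigma)                # one random effect per observation
        p = 1.0 / (1.0 + math.exp(-(eta + b)))   # inverse logit
        y = sum(rng.random() < p for _ in range(size))  # Binomial(size, p) draw
        counts.append(y)
    return counts


def dispersion_ratio(counts, size):
    """Sample variance divided by the variance of a plain binomial
    with the same mean; values near 1 indicate no over-dispersion."""
    p_hat = statistics.fmean(counts) / size
    return statistics.variance(counts) / (size * p_hat * (1.0 - p_hat))


rng = random.Random(2010)  # arbitrary seed, for reproducibility only
plain = simulate_olre_binomial(2000, 20, 0.0, 0.0, rng)  # sigma = 0: ordinary binomial
olre = simulate_olre_binomial(2000, 20, 0.0, 1.0, rng)   # sigma = 1: over-dispersed
print("plain binomial ratio:", round(dispersion_ratio(plain, 20), 2))
print("with per-observation effect:", round(dispersion_ratio(olre, 20), 2))
```

With sigma = 0 the ratio should sit close to 1; with sigma = 1 it should
be well above 1, even though the distribution of the data is still fully
specified.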

I like the simplicity of the one random effect per observation 
approach, as against what can seem the convoluted theoretical 
framework in which beta binomials live.
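
For comparison, the two models can be written out for a single
observation y_i out of n_i trials.  The notation (eta_i for the fixed part
of the linear predictor, sigma for the random-effect standard deviation,
alpha and beta for the beta-binomial shape parameters) is introduced here
for illustration and is not from the thread.  Both expressions are genuine
probability functions, which is what separates them from the quasi
families.

```latex
% One random effect per observation (logit-normal binomial):
P(Y_i = y_i) = \int_{-\infty}^{\infty} \binom{n_i}{y_i}\,
    p_i(b)^{y_i}\,\bigl(1 - p_i(b)\bigr)^{n_i - y_i}\,
    \frac{1}{\sigma\sqrt{2\pi}}\, e^{-b^2/(2\sigma^2)}\, db,
\qquad p_i(b) = \frac{1}{1 + e^{-(\eta_i + b)}}

% Beta-binomial:
P(Y_i = y_i) = \binom{n_i}{y_i}\,
    \frac{B(y_i + \alpha,\; n_i - y_i + \beta)}{B(\alpha, \beta)}
```

The first form only adds one normally distributed term to the linear
predictor, so it stays within the usual GLMM structure.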

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

> 
> On 25/06/2010, at 3:59 AM, Jeffrey Evans wrote:
> 
>> Since I am definitely *not* a mathematician, I am straying in over my head
>> here. 
>> 
>> I understand what you are saying - that there isn't a likelihood function
>> for the quasi-binomial "distribution". And therefore, there is no such
>> distribution.
>> 
>> What do you think of the suggestion that a beta-binomial mixture
>> distribution could be used to model overdispersed binomial data? 
>> 
>> Would this be a technically correct and logistically feasible solution?
>> 
>> -jeff
>> 
>> -----Original Message-----
>> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
>> Bates
>> Sent: Thursday, June 24, 2010 1:25 PM
>> To: Jeffrey Evans
>> Cc: r-sig-mixed-models at r-project.org
>> Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions (again)
>> 
>> On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
>> <Jeffrey.Evans at dartmouth.edu> wrote:
>>> Like others, I have experienced trouble with estimation of the scale 
>>> parameter using the quasi-distributions in lme4, which is necessary to 
>>> calculate QAICc and rank overdispersed generalized linear mixed models.
>> 
>>> I had several exchanges with Ben Bolker about this early last year 
>>> after his TREE paper came out 
>>> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809%2900019-6),
>>> and I know it's been discussed on this list. Has there been, or is 
>>> there forthcoming, any resolution to this in future releases of lme4 
>>> or lme4a? I run into overdispersed binomial distributions frequently 
>>> and have had to use SAS to deal with them. SAS appears to work, but 
>>> it won't estimate the overdispersion parameter using Laplace 
>>> estimation (only PQL). As I understand it, these pseudo-likelihoods 
>>> can't be used for model ranking. I don't know why SAS can't/won't, but 
>>> lme4 will run these quasi-binomial and quasi-Poisson distributions with
>>> Laplace estimation.
>> 
>>> Is there a workable way to use lme4 for modeling overdispersed 
>>> binomial data?
>> 
>> I have trouble discussing this because I come from a background as a
>> mathematician and am used to tracing derivations back to the original
>> definitions.  So when I think of a likelihood (or, equivalently, a
>> deviance) to be optimized it only makes sense to me if there is a
>> probability distribution associated with the model.  And for the
>> quasi-binomial and quasi-Poisson families, there isn't a probability
>> distribution.  To me that means that discussing maximum likelihood
>> estimators for such models is nonsense.  The models simply do not exist.
>> One can play tricks in the case of a generalized linear model to estimate a
>> "quasi-parameter" that isn't part of the probability distribution but it is
>> foolhardy to expect that the tricks will automatically carry over to a
>> generalized linear mixed model.
>> 
>> I am not denying that data that are over-dispersed with respect to the
>> binomial or Poisson distributions can and do occur.  But having data like
>> this and a desire to model it doesn't make the quasi families real.  In his
>> signature Thierry Onkelinx quotes
>> 
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of data.
>> ~ John Tukey
>> 
>> I could and do plan to incorporate the negative binomial family but, without
>> a definition that I can understand of a quasi-binomial or quasi-Poisson
>> distribution and its associated probability function, I'm stuck. To me it's
>> a "build bricks without straw" situation - you can't find maximum likelihood
>> estimates for parameters that aren't part of the likelihood.
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
