[R-sig-ME] overdispersion and the one random effect per observation approach

Sat Jun 26 08:15:22 CEST 2010

Dear All,

it has been recently discussed on this list (e.g. see below, as well as 
http://glmm.wikidot.com/faq) that overdispersed distributions can be 
modelled by using an observation-level random effect (i.e. one random effect 
per observation). I am wondering if anyone knows a good reference for this 
approach. John Maindonald kindly pointed me to an example in the new edition 
of his book:

> There is an example in Section 10.5 of the 3rd edition of Data Analysis & 
> Graphics Using R, which is just now out.

Does anyone know other refs? Thanks in advance for your help!
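
To make the idea concrete, here is a minimal simulation sketch (plain Python, standard library only). The function name simulate_counts, the sample sizes, and sigma = 1 are illustrative assumptions of mine, not taken from the FAQ or the book. It draws binomial counts with and without a normal observation-level effect on the logit scale and compares the sample variance with the nominal binomial variance n p (1 - p).

```python
import math
import random
import statistics


def simulate_counts(n_obs, n_trials, p, sigma, rng):
    """Draw n_obs binomial counts; each observation gets its own
    normal random effect (standard deviation sigma) on the logit scale."""
    eta0 = math.log(p / (1.0 - p))  # logit of the baseline probability
    counts = []
    for _ in range(n_obs):
        b = rng.gauss(0.0, sigma)  # observation-level random effect
        p_i = 1.0 / (1.0 + math.exp(-(eta0 + b)))  # inverse logit
        # Binomial draw as a sum of n_trials Bernoulli trials
        counts.append(sum(rng.random() < p_i for _ in range(n_trials)))
    return counts


rng = random.Random(2010)  # fixed seed so the sketch is reproducible
n_trials, p = 20, 0.3      # illustrative values only

plain = simulate_counts(2000, n_trials, p, 0.0, rng)  # sigma = 0: pure binomial
olre = simulate_counts(2000, n_trials, p, 1.0, rng)   # sigma = 1: overdispersed

binom_var = n_trials * p * (1.0 - p)  # nominal binomial variance
var_plain = statistics.variance(plain)
var_olre = statistics.variance(olre)

print("nominal binomial variance:", binom_var)
print("sample variance, no random effect:", var_plain)
print("sample variance, one random effect per observation:", var_olre)
```

With sigma = 0 the sample variance should sit close to the nominal value, while with sigma > 0 it should be clearly larger. In lme4 syntax the same structure is usually written as a random-intercept term such as (1 | obs), where obs (a hypothetical name) is a factor with one level per row of the data.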

Cheers,

Luca

-------------------
Luca Börger, PhD
Postdoctoral Research Fellow
Department of Integrative Biology
University of Guelph
Guelph, Ontario, Canada N1G 2W1

office +1 519 824 4120 ext. 52975
lab     +1 519 824 4120 ext. 53594
fax:     +1 519 767 1656

email: lborger at uoguelph.ca
www.researcherid.com/rid/C-6003-2008
http://uoguelph.academia.edu/LucaBorger
--------------------------------------------------------------------

> ----- Original Message ----- From: "John Maindonald" 
> <john.maindonald at anu.edu.au>
> To: <r-sig-mixed-models at r-project.org>
> Sent: Thursday, June 24, 2010 7:11 PM
> Subject: [R-sig-ME] Fwd: lme4, lme4a, and overdispersed distributions (again)
>
>
>> I think it more accurate to say that, in general, there may be
>> a class of distributions, and therefore a possible multiplicity
>> of likelihoods, not necessarily for distributions of exponential
>> form.  This is a PhD thesis asking to be done, or maybe
>> someone has already done it.
>>
>> Over-dispersed distributions, where it is entirely clear what the
>> distribution is, can be generated as GLM model +  one random
>> effect per observation.  We have discussed this before.  This
>> seems to me the preferred way to go, if such a model seems to
>> fit the data.  I've not checked the current state of play re fitting
>> such models in lme4 or lme4a; in the past some versions have
>> allowed such a model.
>>
>> I like the simplicity of the one random effect per observation
>> approach, as against what can seem the convoluted theoretical
>> framework in which beta binomials live.
>>
>> John Maindonald             email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>> Centre for Mathematics & Its Applications, Room 1194,
>> John Dedman Mathematical Sciences Building (Building 27)
>> Australian National University, Canberra ACT 0200.
>> http://www.maths.anu.edu.au/~johnm
>>
>>>
>>> On 25/06/2010, at 3:59 AM, Jeffrey Evans wrote:
>>>
>>>> Since I am definitely *not* a mathematician, I am straying in over my head
>>>> here.
>>>>
>>>> I understand what you are saying - that there isn't a likelihood function
>>>> for the quasi-binomial "distribution", and therefore there is no such
>>>> distribution.
>>>>
>>>> What do you think of the suggestion that a beta-binomial mixture
>>>> distribution could be used to model overdispersed binomial data?
>>>>
>>>> Would this be a technically correct and logistically feasible solution?
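
As a small illustration of why the beta-binomial is a genuine probability distribution (and so has a likelihood), here is a hedged sketch in plain Python. The function name betabinom_pmf and the parameter values are illustrative assumptions. It evaluates the probability mass function on the log scale with math.lgamma, then checks numerically that the probabilities sum to one and that the variance equals the binomial variance inflated by the factor 1 + (n - 1) rho, with rho = 1 / (a + b + 1).

```python
import math


def betabinom_pmf(k, n, a, b):
    """P(Y = k) when Y | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    # log B(k + a, n - k + b)
    log_beta_num = (math.lgamma(k + a) + math.lgamma(n - k + b)
                    - math.lgamma(n + a + b))
    # log B(a, b)
    log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_choose + log_beta_num - log_beta_den)


n, a, b = 20, 3.0, 7.0  # illustrative values; mean probability a / (a + b) = 0.3

pmf = [betabinom_pmf(k, n, a, b) for k in range(n + 1)]
total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

p = a / (a + b)
rho = 1.0 / (a + b + 1.0)        # intra-class correlation
binom_var = n * p * (1.0 - p)    # variance of a plain binomial
expected_var = binom_var * (1.0 + (n - 1) * rho)

print("sum of probabilities:", total)
print("mean:", mean)
print("variance:", var, "vs. inflated binomial variance:", expected_var)
```

Because every term is an explicit probability, a log-likelihood is just the sum of the logs of betabinom_pmf over the observations. Whether this is feasible inside a mixed model is a separate question that the sketch does not address.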
>>>>
>>>> -jeff
>>>>
>>>> -----Original Message-----
>>>> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
>>>> Bates
>>>> Sent: Thursday, June 24, 2010 1:25 PM
>>>> To: Jeffrey Evans
>>>> Cc: r-sig-mixed-models at r-project.org
>>>> Subject: Re: [R-sig-ME] lme4, lme4a, and overdispersed distributions 
>>>> (again)
>>>>
>>>> On Thu, Jun 24, 2010 at 11:54 AM, Jeffrey Evans
>>>> <Jeffrey.Evans at dartmouth.edu> wrote:
>>>>> Like others, I have experienced trouble with estimation of the scale
>>>>> parameter using the quasi-distributions in lme4, which is necessary to
>>>>> calculate QAICc and rank overdispersed generalized linear mixed models.
>>>>
>>>>> I had several exchanges with Ben Bolker about this early last year
>>>>> after his TREE paper came out
>>>>> (http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347%2809%2900019-6),
>>>>> and I know it's been discussed on this list. Has
>>>>> there been or is there any potential resolution to this forthcoming in
>>>>> future releases of
>>>>> lme4 or lme4a? I run into overdispersed binomial distributions
>>>>> frequently and have had to use SAS to deal with them. SAS appears to
>>>>> work, but it won't estimate the overdispersion parameter using Laplace
>>>>> estimation (only PQL). As I understand it, these pseudo-likelihoods
>>>>> can't be used for model ranking. I don't know why SAS can't/won't, but
>>>>> lme4 will run these quasi-binomial and quasi-Poisson distributions with
>>>>> Laplace estimation.
>>>>
>>>>> Is there a workable way to use lme4 for modeling overdispersed
>>>>> binomial data?
>>>>
>>>> I have trouble discussing this because I come from a background as a
>>>> mathematician and am used to tracing derivations back to the original
>>>> definitions.  So when I think of a likelihood (or, equivalently, a
>>>> deviance) to be optimized it only makes sense to me if there is a
>>>> probability distribution associated with the model.  And for the
>>>> quasi-binomial and quasi-Poisson families, there isn't a probability
>>>> distribution.  To me that means that discussing maximum likelihood
>>>> estimators for such models is nonsense.  The models simply do not exist.
>>>> One can play tricks in the case of a generalized linear model to estimate
>>>> a "quasi-parameter" that isn't part of the probability distribution, but
>>>> it is foolhardy to expect that the tricks will automatically carry over
>>>> to a generalized linear mixed model.
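
The contrast drawn here can be written out as follows. This is a sketch in my own notation (Y_i, n_i, p_i, phi, b_i, sigma are symbols I am introducing, not taken from the message). The quasi-binomial family specifies only a mean-variance relationship, whereas the model with one random effect per observation specifies a full probability model and hence a marginal likelihood.

```latex
% Quasi-binomial: only the first two moments are specified;
% this does not define a probability mass function for Y_i.
\[
  \mathrm{E}[Y_i] = n_i p_i, \qquad
  \mathrm{Var}(Y_i) = \phi \, n_i p_i (1 - p_i)
\]

% One random effect per observation: a complete probability model.
\[
  Y_i \mid b_i \sim \mathrm{Binomial}\bigl(n_i, p_i(b_i)\bigr), \qquad
  p_i(b) = \frac{1}{1 + \exp\{-(x_i^\top \beta + b)\}}, \qquad
  b_i \sim N(0, \sigma^2)
\]

% Marginal likelihood: the random effect is integrated out.
\[
  L(\beta, \sigma) = \prod_i \int_{-\infty}^{\infty}
    \binom{n_i}{y_i} \, p_i(b)^{y_i} \, \{1 - p_i(b)\}^{n_i - y_i} \,
    \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{b^2}{2\sigma^2}\right) db
\]
```

In the second formulation sigma is a parameter of the likelihood itself, so estimating it by maximum likelihood is well defined. The dispersion phi in the first formulation has no such role.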
>>>>
>>>> I am not denying that data that are over-dispersed with respect to the
>>>> binomial or Poisson distributions can and do occur.  But having data like
>>>> this and a desire to model it doesn't make the quasi families real.  In
>>>> his signature Thierry Onkelinx quotes
>>>>
>>>> The combination of some data and an aching desire for an answer does not
>>>> ensure that a reasonable answer can be extracted from a given body of
>>>> data.
>>>> ~ John Tukey
>>>>
>>>> I could and do plan to incorporate the negative binomial family but,
>>>> without a definition that I can understand of a quasi-binomial or
>>>> quasi-Poisson distribution and its associated probability function, I'm
>>>> stuck. To me it's a "build bricks without straw" situation - you can't
>>>> find maximum likelihood estimates for parameters that aren't part of the
>>>> likelihood.
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>