[R] overdispersion + GAM

Peter Dalgaard p.dalgaard at biostat.ku.dk
Tue Feb 12 09:05:50 CET 2008


Gavin Simpson wrote:
> On Mon, 2008-02-11 at 14:46 -0500, Ravi Varadhan wrote:
>   
>> No.  Binomial data can indeed be overdispersed.  See McCullagh & Nelder
>> (1989, section 4.5).  Accounting for over(under)dispersion in binomial and
>> Poisson distributions is, in fact, one of the original impetuses for GEE-type
>> developments.  See also a nice paper by Liang & McCullagh (Biometrics 1993,
>> p. 623-630), which discusses numerous examples of overdispersion in binary
>> data.  
>>
>> Ravi.
>>     
>
> Hi Ravi,
>
> I was very careful to say "Bernoulli" rather than "binomial". I
> understand that overdispersion can be present in Poisson or binomial
> (M>1), hence the need for a quasibinomial family function. I was,
> however, always led to believe that overdispersion in binary data was
> not possible, and that was how I interpreted the OP's statement about
> presence/absence data.
>
>   
Yep. A qualification that one should probably include is that the claim 
refers to independently and identically sampled data. The point is that you 
cannot have a distribution on {0, 1} where the variance is anything but 
p(1-p), where p is the mean: since Y^2 = Y for a 0/1 variable, Var(Y) = 
E(Y) - E(Y)^2. If you put a distribution on p and integrate it out, you 
still end up with a variance of that form, with p replaced by its mean.
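
The integrating-out argument can be checked with exact arithmetic. The following is only a minimal Python sketch: the helper name mixed_bernoulli_moments and the mixing distribution on p are made up for illustration. It computes the variance of a 0/1 variable via the law of total variance and compares it with mean * (1 - mean).

```python
from fractions import Fraction

def mixed_bernoulli_moments(p_values, weights):
    """Exact mean and variance of a 0/1 variable Y whose success probability
    p is itself drawn from a discrete distribution (hypothetical helper)."""
    total = sum(weights)
    probs = [Fraction(w) / total for w in weights]
    # E[Y] = E[p]
    mean = sum(q * p for p, q in zip(p_values, probs))
    # Law of total variance: Var(Y) = E[p(1-p)] + Var(p)
    within = sum(q * p * (1 - p) for p, q in zip(p_values, probs))
    between = sum(q * (p - mean) ** 2 for p, q in zip(p_values, probs))
    return mean, within + between

# Made-up mixing distribution on p, for illustration only.
p_values = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
weights = [1, 2, 1]
mean, var = mixed_bernoulli_moments(p_values, weights)
print(mean, var, mean * (1 - mean))
```

Whatever mixing distribution is supplied, the last two printed values agree, so the extra variation in p leaves no trace in the variance of a single binary observation.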

Correlation structures can still be present and may lead to both over- 
and underdispersion of the total counts or proportions. (E.g., the total 
number of blacksmiths in a county in olden days would typically equal 
the number of villages, which is underdispersion, whereas group phenomena, 
such as when either everyone or no one in a school class does something, 
lead to overdispersion of the overall proportion.)
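
The two examples can be put into numbers. This is a rough Python sketch with entirely hypothetical figures (a class of 20, a county of 500 households in 10 villages) and a hypothetical helper, dispersion_ratio, which divides the variance of a total by the variance n*p*(1-p) that the same total would have under independent sampling.

```python
from fractions import Fraction

def dispersion_ratio(pmf, n):
    """Var(total) divided by the binomial variance n*p*(1-p), where p = E[total]/n.
    pmf maps each possible total (0..n) to its probability (hypothetical helper)."""
    mean = sum(k * pr for k, pr in pmf.items())
    var = sum(pr * (k - mean) ** 2 for k, pr in pmf.items())
    p = mean / n
    return var / (n * p * (1 - p))

# School class of 20: with probability 3/10 everyone does it, otherwise no one.
school = {0: Fraction(7, 10), 20: Fraction(3, 10)}
# 500 households in 10 villages; the county total of blacksmiths is nearly always 10.
blacksmiths = {9: Fraction(1, 10), 10: Fraction(8, 10), 11: Fraction(1, 10)}

print(dispersion_ratio(school, 20))        # ratio above 1: overdispersion
print(dispersion_ratio(blacksmiths, 500))  # ratio below 1: underdispersion
```

A ratio of 1 corresponds to independent binary observations; the all-or-none class gives a ratio equal to the class size, while the nearly constant blacksmith count gives a ratio far below 1.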

> This appears to have been discussed recently on the R-help list:
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/91242.html is a reply to
> a posting by Peter Dalgaard (in response to an original question on
> R-help - apologies, I can't seem to get to the email archives on tolstoy
> to find the start of the thread).
>
> My response was in the same vein as Peter's ">> There is no such thing
> as overdispersion for binary data." (quoted from his response to the
> OP). To be fair (for those not going to look at the thread), Peter then
> follows this up later in the thread saying (in reply to John
> Maindonald's posting) "I don't really disagree, of course. I was mainly
> being provocative."
>
> The two messages from Peter and John in that thread are very
> interesting; I'm not sure I fully understand what they are going on
> about, but I get the gist.
>
> And of course, I would be more than happy to be corrected and pointed in
> the direction of something not too technical (I'm an ecologist, not a
> statistician or mathematician) that discusses this. To that end, I'll be
> hunting out McCullagh & Nelder and the Biometrics paper you cite, Ravi,
> but if you or anyone can point to other literature, I'd be most
> grateful.
>
> All the best,
>
> G
>
>   
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Gavin Simpson
>> Sent: Monday, February 11, 2008 12:37 PM
>> To: anna banana
>> Cc: r-help at r-project.org
>> Subject: Re: [R] overdispersion + GAM
>>
>> On Mon, 2008-02-11 at 07:35 -0800, anna banana wrote:
>>     
>>> Hi,
>>>
>>> there are a lot of messages dealing with overdispersion, but I couldn't find
>>> anything about how to test for overdispersion. I applied a GAM with binomial
>>> distribution on my presence/absence data, and would like to check for
>>> overdispersion. Does anyone know the command?
>>>       
>> Bernoulli data (presence/absence of single species say) can't be
>> overdispersed, so there is no need to test or correct for it.
>>
>> G
>>
>>     
>>> Many thanks,
>>>
>>> Anna
>>>       


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


