[R] Correcting for overdispersion

peter dalgaard pdalgd at gmail.com
Mon Jul 9 22:34:35 CEST 2012


On Jul 9, 2012, at 21:08 , Lawrence, Adaku wrote:

> Hello,
> Thanks for getting back to me. I was of the impression that once the res. var. is larger than the df then the data was overdispersed and as such the model was not a best fit. Is this true?

Not without qualification. There are various schools of thought, but if you ask me, overdispersion models are used a bit too often without proper attention to what they actually mean. Sometimes the effect is (unwittingly) to paper over systematic lack of fit in the model (judging by your residuals, that's not likely the case here, though).

To use such models you should have evidence of lack of fit and/or a plausible reason for the extra variation. 

Re. evidence, you have a deviance of 7.31 on 4 df which corresponds to a p value of 0.12 in the asymptotic chi-square distribution. So, not exactly convincing; also, you need to consider whether the expected counts are large enough for the asymptotics to hold.
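
For reference, the p value quoted above can be reproduced directly from the residual deviance and df in the summary output. This is only a sketch of the asymptotic chi-square calculation; the variable names are mine, and it says nothing about whether the asymptotics are trustworthy for these counts.

```r
# Residual deviance and residual df, copied from the summary() output
dev <- 7.311
df  <- 4

# Upper-tail probability in the asymptotic chi-square distribution
p_value <- pchisq(dev, df = df, lower.tail = FALSE)
round(p_value, 2)   # about 0.12

# Crude deviance-based dispersion estimate (deviance / df)
dev / df
```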

Re. plausibility, you should ask yourself whether there is good reason to have an extra random effect operating at the level of individual binomial distributions. This could be the case if you have an experiment of the sort where you give, say, a dose of pesticide to containers of 50 flies, and count the dead ones. In that case, there could be effects of getting the dose slightly wrong, the temperature of the container, and whatnot. If, on the other hand, you inject a batch of rats with a dose from a randomly chosen vial, each of which contains a carefully and individually measured-out dose, then it could be quite hard to think of a reason for something increasing or decreasing the probability for all rats at the same dose.
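
To make the fly-container scenario concrete, here is a small simulation sketch. Everything in it is made up for illustration (the doses, the coefficients, and the size of the container-level noise are assumptions, not anything from your data). It compares plain binomial counts with counts where each container gets its own random shift on the logit scale.

```r
set.seed(1)

# Hypothetical design: 200 containers of 50 flies, 10 dose levels
n_containers <- 200
n_flies      <- 50
dose         <- rep(seq(1, 5, length.out = 10), each = 20)
eta          <- -3 + 1 * dose            # assumed true linear predictor

# Case 1: pure binomial variation
dead_pure <- rbinom(n_containers, n_flies, plogis(eta))

# Case 2: extra container-level random effect (dose error, temperature, ...)
eta_noisy <- eta + rnorm(n_containers, sd = 0.5)
dead_od   <- rbinom(n_containers, n_flies, plogis(eta_noisy))

fit_pure <- glm(cbind(dead_pure, n_flies - dead_pure) ~ dose, family = binomial)
fit_od   <- glm(cbind(dead_od,   n_flies - dead_od)   ~ dose, family = binomial)

# Residual deviance per df: near 1 without the random effect, clearly larger with it
ratio <- c(pure          = deviance(fit_pure) / df.residual(fit_pure),
           overdispersed = deviance(fit_od)   / df.residual(fit_od))
ratio
```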

That being said, as far as I can tell, there's no problem in principle with using dose.p on an overdispersed model, because it only depends on vcov(obj). The most worrying bit is that the overdispersion parameter would be estimated from only 4 df, so it is itself very imprecise.
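
A sketch of what that means in practice, using invented data since I don't have yours (conc, n and dead below are hypothetical): refitting with family = quasibinomial leaves the coefficients alone and multiplies vcov() by the estimated dispersion, so the dose.p standard errors grow by the square root of that factor.

```r
library(MASS)   # for dose.p

# Hypothetical dose-response data, NOT the data from this thread
conc <- 10^(3:8)
n    <- rep(20, 6)
dead <- c(2, 5, 9, 12, 16, 19)

fit_b <- glm(cbind(dead, n - dead) ~ log(conc), family = binomial)
fit_q <- glm(cbind(dead, n - dead) ~ log(conc), family = quasibinomial)

# Pearson-based dispersion estimate used by the quasibinomial fit
phi <- summary(fit_q)$dispersion

# Same point estimates, covariance matrix scaled by phi
all.equal(coef(fit_b), coef(fit_q))
all.equal(vcov(fit_q), phi * vcov(fit_b))

# LD50 on the log(conc) scale: same estimate, SE scaled by sqrt(phi)
ld_b <- dose.p(fit_b, p = 0.5)
ld_q <- dose.p(fit_q, p = 0.5)
se_ratio <- as.numeric(attr(ld_q, "SE")) / as.numeric(attr(ld_b, "SE"))
c(se_ratio = se_ratio, sqrt_phi = sqrt(phi))
```

Note that the doses come out on the log(conc) scale, so any interval should be built there and then passed through exp().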

-pd

> Here is an example of the output from R:
> Call:
> glm(formula = y ~ log(conc), family = binomial)
> Deviance Residuals: 
>        1         2         3         4         5         6  
>  0.54568   1.08474   0.04561  -2.00959   0.05772   1.33891  
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept) -5.52815    0.85916  -6.434 1.24e-10 ***
> log(conc)    0.40457    0.05938   6.813 9.56e-12 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> (Dispersion parameter for binomial family taken to be 1)
>     Null deviance: 78.811  on 5  degrees of freedom
> Residual deviance:  7.311  on 4  degrees of freedom
> AIC: 30.45
> Number of Fisher Scoring iterations: 4
> > 
> > xv<-seq(min(log(conc)-1),max(log(conc)+1),0.01)
> > lines(xv,predict(model,list(conc=exp(xv)),type="response"))
> > 
> > dose.p(model,p=c(0.10,0.25,0.5,0.75,0.90))
>                Dose        SE
> p = 0.10:  8.233179 0.9810446
> p = 0.25: 10.948665 0.6580127
> p = 0.50: 13.664152 0.4703530
> p = 0.75: 16.379638 0.5720159
> p = 0.90: 19.095125 0.8665399
> > exp(13.664152)
> [1] 859539.4
> > exp(13.664152+(1.96*0.4703530))
> [1] 2160918
> > exp(13.664152-(1.96*0.04703530))
> [1] 783842
> BW
> Adaku
> ________________________________________
> From: peter dalgaard [pdalgd at gmail.com]
> Sent: 09 July 2012 20:03
> To: Lawrence, Adaku
> Cc: r-help at r-project.org
> Subject: Re: [R] Correcting for overdispersion
> 
> On Jul 9, 2012, at 20:23 , Lawrence, Adaku wrote:
> 
> > Hello,
> >
> > I am trying to determine LD50 and LD95 using dose.p in MASS however some of the Residual variance is larger than the degrees of freedom. Please can anyone help with any advice as to how i can correct for this?
> 
> Er, in what sense is that a problem? Your code is not reproducible, at least some output to look at might help.
> 
> -pd
> 
> >
> > Here is the model as inputted into R
> >
> >
> >
> > y<-cbind(dead,n-dead)
> >
> > model<-glm(y~log(conc),binomial)
> > summary(model)
> >
> > xv<-seq(min(log(conc)-1),max(log(conc)+1),0.01)
> > lines(xv,predict(model,list(conc=exp(xv)),type="response"))
> >
> > dose.p(model,p=c(0.10,0.25,0.5,0.75,0.90,0.95))
> >
> >
> >
> > Thanks
> >
> > Adaku
> >
> >
> >
> 
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



