[R] Overdispersion with binomial distribution
Ben Bolker
bolker at ufl.edu
Tue Feb 17 18:53:42 CET 2009
Jessica L Hite/hitejl/O/VCU <hitejl <at> vcu.edu> writes:
> I am attempting to run a glm with a binomial model to analyze proportion
> data.
> I have been following Crawley's book closely and am wondering if there is
> an accepted standard for how much is too much overdispersion? (e.g. change
> in AIC has an accepted standard of 2).
In principle, in the null case (i.e. data are really binomial)
the deviance is chi-squared distributed with the df equal
to the residual df.
For example:
example(glm)
deviance(glm.D93) ## 5.13
summary(glm.D93)$dispersion ## 1 (by definition)
dfr <- df.residual(glm.D93)
deviance(glm.D93)/dfr ## 1.28
d2 <- sum(residuals(glm.D93,"pearson")^2) ## 5.17
(disp2 <- d2/dfr) ## 1.293
gg2 <- update(glm.D93,family=quasipoisson)
summary(gg2)$dispersion ## 1.293, same as above
pchisq(d2,df=dfr,lower.tail=FALSE)
all.equal(coef(glm.D93),coef(gg2)) ## TRUE
se1 <- coef(summary(glm.D93))[,"Std. Error"]
se2 <- coef(summary(gg2))[,"Std. Error"]
se2/se1
# (Intercept) outcome2 outcome3 treatment2 treatment3
# 1.137234 1.137234 1.137234 1.137234 1.137234
sqrt(disp2)
# [1] 1.137234
> My code and output are below, given the example in the book, these data are
> WAY overdispersed .....do I mention this and go on or does this signal the
> need to try a different model? If so, any suggestions on the type of
> distribution (gamma or negative binomial ?)?
Way overdispersed may indicate model lack of fit. Have
you examined residuals/data for outliers etc.?
quasibinomial should be fine, or you can try beta-binomial
(see the aod package) ...
> attach(Clutch2)
> y<-cbind(Total,Size-Total)
> glm1<-glm(y~Pred,"binomial")
> summary(glm1)
>
> Call:
> glm(formula = y ~ Pred, family = "binomial")
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -9.1022 -2.7899 -0.4781 2.6058 8.4852
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 1.35095 0.06612 20.433 < 2e-16 ***
> PredF -0.34811 0.11719 -2.970 0.00297 **
> PredSN -3.29156 0.10691 -30.788 < 2e-16 ***
> PredW -1.46451 0.12787 -11.453 < 2e-16 ***
> PredWF -0.56412 0.13178 -4.281 1.86e-05 ***
> ---
> #### the output for residual deviance and df does not change even when I
> use quasibinomial, is this ok? #####
That's as expected.
> library(MASS)
you don't really need MASS for quasibinomial.
> > glm2<-glm(y~Pred,"quasibinomial")
> > summary(glm2)
>
> Call:
> glm(formula = y ~ Pred, family = "quasibinomial")
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -9.1022 -2.7899 -0.4781 2.6058 8.4852
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 1.3510 0.2398 5.633 1.52e-07 ***
> PredF -0.3481 0.4251 -0.819 0.41471
> PredSN -3.2916 0.3878 -8.488 1.56e-13 ***
> PredW -1.4645 0.4638 -3.157 0.00208 **
> PredWF -0.5641 0.4780 -1.180 0.24063
> ---
> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> (Dispersion parameter for quasibinomial family taken to be 13.15786)
>
> Null deviance: 2815.5 on 108 degrees of freedom
> Residual deviance: 1323.5 on 104 degrees of freedom
> (3 observations deleted due to missingness)
> AIC: NA
>
> Number of Fisher Scoring iterations: 5
>
> > anova(glm2,test="F")
> Analysis of Deviance Table
>
> Model: quasibinomial, link: logit
>
> Response: y
>
> Terms added sequentially (first to last)
>
> Df Deviance Resid. Df Resid. Dev F Pr(>F)
> NULL 108 2815.5
> Pred 4 1492.0 104 1323.5 28.349 6.28e-16 ***
> ---
> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> > model1<-update(glm2,~.-Pred)
> > anova(glm2,model1,test="F")
> Analysis of Deviance Table
>
> Model 1: y ~ Pred
> Model 2: y ~ 1
> Resid. Df Resid. Dev Df Deviance F Pr(>F)
> 1 104 1323.5
> 2 108 2815.5 -4 -1492.0 28.349 6.28e-16 ***
> ---
> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> > coef(glm2)
> (Intercept) PredF PredSN PredW PredWF
> 1.3509550 -0.3481096 -3.2915601 -1.4645097 -0.5641223
>
> Thanks
> Jessica
> hitejl <at> vcu.edu
More information about the R-help
mailing list