[R] When to use quasipoisson instead of poisson family

Achim Zeileis Achim.Zeileis at wu-wien.ac.at
Tue Apr 10 09:11:42 CEST 2007

On Tue, 10 Apr 2007, ronggui wrote:

> It seems that MASS suggest to judge on the basis of
> sum(residuals(mode,type="pearson"))/df.residual(mode). My question: Is
> there any rule of thumb of the cutpoiont value?
> The paper "On the Use of Corrections for Overdispersion"  suggests
> overdispersion exists if the deviance is at least twice the number of
> degrees of freedom.

There are also formal tests for over-dispersion. I've implemented one for
a package which is not yet on CRAN (code/docs attached), another one is
implemented in odTest() in package "pscl". The latter also contains
further count data regression models which can deal with both
over-dispersion and excess zeros in count data. A vignette explaining the
tools is about to be released.


> Are there any further hints? Thanks.
> --
> Ronggui Huang
> Department of Sociology
> Fudan University, Shanghai, China
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-------------- next part --------------
\title{Dispersion Test}
 Tests the null hypothesis of equidispersion in Poisson GLMs against
 the alternative of overdispersion and/or underdispersion.
dispersiontest(object, trafo = NULL, alternative = c("greater", "two.sided", "less"))
  \item{object}{a fitted Poisson GLM of class \code{"glm"} as fitted
    by \code{\link{glm}} with family \code{\link{poisson}}.}
 \item{trafo}{a specification of the alternative (see also details),
   can be numeric or a (positive) function or \code{NULL} (the default).}
 \item{alternative}{a character string specifying the alternative hypothesis:
   \code{"greater"} corresponds to overdispersion, \code{"less"} to
   underdispersion and \code{"two.sided"} to either one.}
The standard Poisson GLM models the (conditional) mean
\eqn{\mathsf{E}[y] = \mu}{E[y] = mu} which is assumed to be equal to the
variance \eqn{\mathsf{VAR}[y] = \mu}{VAR[y] = mu}. \code{dispersiontest}
assesses the hypothesis that this assumption holds (equidispersion) against
the alternative that the variance is of the form:
  \deqn{\mathsf{VAR}[y] \quad = \quad \mu \; + \; \alpha \cdot \mathrm{trafo}(\mu).}{VAR[y] = mu + alpha * trafo(mu).}
Overdispersion corresponds to \eqn{\alpha > 0}{alpha > 0} and underdispersion to
\eqn{\alpha < 0}{alpha < 0}. The coefficient \eqn{\alpha}{alpha} can be estimated
by an auxiliary OLS regression and tested with the corresponding t (or z) statistic
which is asymptotically standard normal under the null hypothesis.

Common specifications of the transformation function \eqn{\mathrm{trafo}}{trafo} are
\eqn{\mathrm{trafo}(\mu) = \mu^2}{trafo(mu) = mu^2} or \eqn{\mathrm{trafo}(\mu) = \mu}{trafo(mu) = mu}.
The former corresponds to a negative binomial (NB) model with quadratic variance function
(called NB2 by Cameron and Trivedi, 2005), the latter to a NB model with linear variance
function (called NB1 by Cameron and Trivedi, 2005) or quasi-Poisson model with dispersion 
parameter, i.e.,
  \deqn{\mathsf{VAR}[y] \quad = \quad (1 + \alpha) \cdot \mu = \mathrm{dispersion} \cdot \mu.}{VAR[y] = (1 + alpha) * mu = dispersion * mu.}
By default, for \code{trafo = NULL}, the latter dispersion formulation is used in
\code{dispersiontest}. Otherwise, if \code{trafo} is specified, the test is formulated
in terms of the parameter \eqn{\alpha}{alpha}. The transformation \code{trafo} can either
be specified as a function or an integer corresponding to the function \code{function(x) x^trafo},
such that \code{trafo = 1} and \code{trafo = 2} yield the linear and quadratic formulations

\value{An object of class \code{"htest"}.}

Cameron, A.C. and Trivedi, P.K. (1990). Regression-based Tests for Overdispersion in the Poisson Model.  
\emph{Journal of Econometrics}, \bold{46}, 347--364.

Cameron, A.C. and Trivedi, P.K. (1998). \emph{Regression Analysis of Count Data}. 
Cambridge: Cambridge University Press.

Cameron, A.C. and Trivedi, P.K. (2005). \emph{Microeconometrics: Methods and Applications}.
Cambridge: Cambridge University Press.

\seealso{\code{\link{glm}}, \code{\link{poisson}}, \code{\link[MASS]{glm.nb}}}

fm <- glm(trips ~ ., data = RecreationDemand, family = poisson)

## linear specification (in terms of dispersion)
## linear specification (in terms of alpha)
dispersiontest(fm, trafo = 1)
## quadratic specification (in terms of alpha)
dispersiontest(fm, trafo = 2)
dispersiontest(fm, trafo = function(x) x^2)

## further examples
fm <- glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson)

fm <- glm(ofp ~ health + age + gender + married + faminc + privins, data = DebTrivedi, family = poisson)
-------------- next part --------------
dispersiontest <- function(object, trafo = NULL, alternative = c("greater", "two.sided", "less"))
  if(!inherits(object, "glm") || family(object)$family != "poisson")
    stop("only Poisson GLMs can be tested")
  alternative <- match.arg(alternative)
  otrafo <- trafo
  if(is.numeric(otrafo)) trafo <- function(x) x^otrafo
  y <- if(is.null(object$y)) model.response(model.frame(object)) else object$y
  yhat <- fitted(object)
  aux <- ((y - yhat)^2 - y)/yhat
  if(is.null(trafo)) {
    STAT <- sqrt(length(aux)) * mean(aux)/sd(aux)
    NVAL <- c(dispersion = 1)
    EST <- c(dispersion = mean(aux) + 1)    
  } else {
    auxreg <- lm(aux ~ 0 + I(trafo(yhat)/yhat))
    STAT <- as.vector(summary(auxreg)$coef[1,3])
    NVAL <- c(alpha = 0)
    EST <- c(alpha = as.vector(coef(auxreg)[1]))
  rval <- list(statistic = c(z = STAT),
               p.value = switch(alternative,
                           "greater" = pnorm(STAT, lower.tail = FALSE),
		           "two.sided" = pnorm(abs(STAT), lower.tail = FALSE)*2,
		           "less" = pnorm(STAT)),
	       estimate = EST,
	       null.value = NVAL,
	       alternative = alternative,
	       method = switch(alternative,
	                  "greater" = "Overdispersion test",
			  "two.sided" = "Dispersion test",
			  "less" = "Underdispersion test"),
	       data.name = deparse(substitute(object)))
  class(rval) <- "htest"

More information about the R-help mailing list