[R] When to use quasipoisson instead of poisson family
Achim Zeileis
Achim.Zeileis at wu-wien.ac.at
Tue Apr 10 09:11:42 CEST 2007
On Tue, 10 Apr 2007, ronggui wrote:
> It seems that MASS suggest to judge on the basis of
> sum(residuals(mode,type="pearson"))/df.residual(mode). My question: Is
> there any rule of thumb of the cutpoiont value?
>
> The paper "On the Use of Corrections for Overdispersion" suggests
> overdispersion exists if the deviance is at least twice the number of
> degrees of freedom.
There are also formal tests for over-dispersion. I've implemented one for
a package which is not yet on CRAN (code/docs attached), another one is
implemented in odTest() in package "pscl". The latter also contains
further count data regression models which can deal with both
over-dispersion and excess zeros in count data. A vignette explaining the
tools is about to be released.
hth,
Z
> Are there any further hints? Thanks.
>
> --
> Ronggui Huang
> Department of Sociology
> Fudan University, Shanghai, China
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
-------------- next part --------------
\name{dispersiontest}
\alias{dispersiontest}
\title{Dispersion Test}
\description{
Tests the null hypothesis of equidispersion in Poisson GLMs against
the alternative of overdispersion and/or underdispersion.
}
\usage{
dispersiontest(object, trafo = NULL, alternative = c("greater", "two.sided", "less"))
}
\arguments{
\item{object}{a fitted Poisson GLM of class \code{"glm"} as fitted
by \code{\link{glm}} with family \code{\link{poisson}}.}
\item{trafo}{a specification of the alternative (see also details),
can be numeric or a (positive) function or \code{NULL} (the default).}
\item{alternative}{a character string specifying the alternative hypothesis:
\code{"greater"} corresponds to overdispersion, \code{"less"} to
underdispersion and \code{"two.sided"} to either one.}
}
\details{
The standard Poisson GLM models the (conditional) mean
\eqn{\mathsf{E}[y] = \mu}{E[y] = mu} which is assumed to be equal to the
variance \eqn{\mathsf{VAR}[y] = \mu}{VAR[y] = mu}. \code{dispersiontest}
assesses the hypothesis that this assumption holds (equidispersion) against
the alternative that the variance is of the form:
\deqn{\mathsf{VAR}[y] \quad = \quad \mu \; + \; \alpha \cdot \mathrm{trafo}(\mu).}{VAR[y] = mu + alpha * trafo(mu).}
Overdispersion corresponds to \eqn{\alpha > 0}{alpha > 0} and underdispersion to
\eqn{\alpha < 0}{alpha < 0}. The coefficient \eqn{\alpha}{alpha} can be estimated
by an auxiliary OLS regression and tested with the corresponding t (or z) statistic
which is asymptotically standard normal under the null hypothesis.
Common specifications of the transformation function \eqn{\mathrm{trafo}}{trafo} are
\eqn{\mathrm{trafo}(\mu) = \mu^2}{trafo(mu) = mu^2} or \eqn{\mathrm{trafo}(\mu) = \mu}{trafo(mu) = mu}.
The former corresponds to a negative binomial (NB) model with quadratic variance function
(called NB2 by Cameron and Trivedi, 2005), the latter to a NB model with linear variance
function (called NB1 by Cameron and Trivedi, 2005) or quasi-Poisson model with dispersion
parameter, i.e.,
\deqn{\mathsf{VAR}[y] \quad = \quad (1 + \alpha) \cdot \mu = \mathrm{dispersion} \cdot \mu.}{VAR[y] = (1 + alpha) * mu = dispersion * mu.}
By default, for \code{trafo = NULL}, the latter dispersion formulation is used in
\code{dispersiontest}. Otherwise, if \code{trafo} is specified, the test is formulated
in terms of the parameter \eqn{\alpha}{alpha}. The transformation \code{trafo} can either
be specified as a function or an integer corresponding to the function \code{function(x) x^trafo},
such that \code{trafo = 1} and \code{trafo = 2} yield the linear and quadratic formulations
respectively.
}
\value{An object of class \code{"htest"}.}
\references{
Cameron, A.C. and Trivedi, P.K. (1990). Regression-based Tests for Overdispersion in the Poisson Model.
\emph{Journal of Econometrics}, \bold{46}, 347--364.
Cameron, A.C. and Trivedi, P.K. (1998). \emph{Regression Analysis of Count Data}.
Cambridge: Cambridge University Press.
Cameron, A.C. and Trivedi, P.K. (2005). \emph{Microeconometrics: Methods and Applications}.
Cambridge: Cambridge University Press.
}
\seealso{\code{\link{glm}}, \code{\link{poisson}}, \code{\link[MASS]{glm.nb}}}
\examples{
data("RecreationDemand")
fm <- glm(trips ~ ., data = RecreationDemand, family = poisson)
## linear specification (in terms of dispersion)
dispersiontest(fm)
## linear specification (in terms of alpha)
dispersiontest(fm, trafo = 1)
## quadratic specification (in terms of alpha)
dispersiontest(fm, trafo = 2)
dispersiontest(fm, trafo = function(x) x^2)
## further examples
data("DoctorVisits")
fm <- glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson)
dispersiontest(fm)
data("DebTrivedi")
fm <- glm(ofp ~ health + age + gender + married + faminc + privins, data = DebTrivedi, family = poisson)
dispersiontest(fm)
}
\keyword{htest}
-------------- next part --------------
dispersiontest <- function(object, trafo = NULL, alternative = c("greater", "two.sided", "less"))
{
if(!inherits(object, "glm") || family(object)$family != "poisson")
stop("only Poisson GLMs can be tested")
alternative <- match.arg(alternative)
otrafo <- trafo
if(is.numeric(otrafo)) trafo <- function(x) x^otrafo
y <- if(is.null(object$y)) model.response(model.frame(object)) else object$y
yhat <- fitted(object)
aux <- ((y - yhat)^2 - y)/yhat
if(is.null(trafo)) {
STAT <- sqrt(length(aux)) * mean(aux)/sd(aux)
NVAL <- c(dispersion = 1)
EST <- c(dispersion = mean(aux) + 1)
} else {
auxreg <- lm(aux ~ 0 + I(trafo(yhat)/yhat))
STAT <- as.vector(summary(auxreg)$coef[1,3])
NVAL <- c(alpha = 0)
EST <- c(alpha = as.vector(coef(auxreg)[1]))
}
rval <- list(statistic = c(z = STAT),
p.value = switch(alternative,
"greater" = pnorm(STAT, lower.tail = FALSE),
"two.sided" = pnorm(abs(STAT), lower.tail = FALSE)*2,
"less" = pnorm(STAT)),
estimate = EST,
null.value = NVAL,
alternative = alternative,
method = switch(alternative,
"greater" = "Overdispersion test",
"two.sided" = "Dispersion test",
"less" = "Underdispersion test"),
data.name = deparse(substitute(object)))
class(rval) <- "htest"
return(rval)
}
More information about the R-help
mailing list