[Rd] [RfC] Family dispersion
Luis Carvalho
lexcarvalho at gmail.com
Thu Jun 2 04:25:28 CEST 2016
Hi,
I'd like to hear your opinion about the following proposal to make the
computation of dispersion in GLMs more flexible. Dispersion is used in
summary.glm; the relevant code chunk with the dispersion calculation is listed
below (from glm.R):
summary.glm <- function(object, dispersion = NULL,
correlation = FALSE, symbolic.cor = FALSE, ...)
{
est.disp <- FALSE
df.r <- object$df.residual
if(is.null(dispersion)) # calculate dispersion if needed
dispersion <-
if(object$family$family %in% c("poisson", "binomial")) 1
else if(df.r > 0) {
est.disp <- TRUE
if(any(object$weights==0))
warning("observations with zero weight not used for calculating dispersion")
sum((object$weights*object$residuals^2)[object$weights > 0])/ df.r
} else {
est.disp <- TRUE
NaN
}
# ...
}
Many exponential families have unit dispersion, or can be cast to have unit
dispersion, e.g. hypergeometric, negative binomial, and so on. However,
summary.glm only assigns unit dispersion to Poisson and binomial families, as
the code above indicates. My suggestion is to make this check more general by
having a 'dispersion' slot in the family class; for instance, we would have
poisson(...)$dispersion = 1 and binomial(...)$dispersion = 1. The updated
summary.glm would be:
default.dispersion <- function (object, ...) {
df.r <- object$df.residual
if (df.r > 0) {
if (any(object$weights == 0))
warning("observations with zero weight not used for calculating dispersion")
sum((object$weights * object$residuals ^ 2)[object$weights > 0]) / df.r
}
else NaN
}
summary.glm <- function(object, dispersion = default.dispersion,
correlation = FALSE, symbolic.cor = FALSE, ...)
{
if (!is.null(object$family$dispersion)) # use family dispersion?
dispersion <- object$family$dispersion
est.disp <- is.function(dispersion)
dispersion <- if (est.disp) dispersion(object, ...) else dispersion
df.r <- object$df.residual
# ... (unchanged code below)
}
Note that 'dispersion' can be a function taking a glm object or a number
(e.g. 1). Here are some examples:
R> library(MASS)
R> gm <- glm(formula, family=Gamma())
R> summary(gm, dispersion = gamma.dispersion) # ML estimate of dispersion
R> set.dispersion <- function (fam, disp) # update family dispersion
R> structure(within(unclass(fam), dispersion <- disp), class = "family")
R> gm <- glm(formula, family=set.dispersion(Gamma(), gamma.dispersion))
R> summary(gm) # use family dispersion
R> Exp <- function (...) set.dispersion(Gamma(...), 1)
Thanks in advance for the feedback.
Cheers,
Luis
--
Computers are useless. They can only give you answers.
-- Pablo Picasso
--
Luis Carvalho
Associate Professor
Dept. of Mathematics and Statistics
Boston University
http://math.bu.edu/people/lecarval
More information about the R-devel
mailing list