[R] Overdispersed poisson - negative observation
David Firth
david.firth at nuffield.oxford.ac.uk
Thu Jan 16 22:06:03 CET 2003
Peter
You are right to suppose that negative observations should not be a
problem for a quasi-Poisson fit (though negative *fitted* values would
be, in a log-linear model).
The problem is that the "quasipoisson" function is overly restrictive,
in requiring non-negativity of the y variable. (commenting out the
error trap is not enough to solve the problem). There are two places
where "quasipoisson" has problems:
dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y ==
0, 1, y/mu)) - (y - mu))
fails for negative values of y, while
initialize <- expression({
if (any(y < 0)) stop(paste("Negative values not allowed for",
"the quasiPoisson family"))
n <- rep(1, nobs)
mustart <- y + 0.1
})
would (without the error trap) return negative initial values for
means, which won't work with a log link.
The first problem is the hard one: the deviance isn't defined when y is
negative. But an alternative is to use the Pearson X^2 statistic,
which is defined for all values of y (and makes more sense anyway for
quasi-likelhood models -- it's what you use to determine the dispersion
constant).
The second problem is easy to fix: for example, just take as starting
values the global mean (corresponding to zero-valued effects in the
model). I often find that this is a good starting point anyway --
better than the default choice, which typically is not in the model
space (even when there are no negative values of y).
I give below an amended "quasipoisson" which seems to work, ie it fits
the model by the quasi-likelihood method with variance proportional to
mean. It makes only the two changes just described. A *slightly*
unsatisfactory aspect is that quantities reported as "deviance" are in
fact Pearson X^2 statistics. But no other choice really makes sense
there.
David
quasipoisson <- function (link = "log")
## Amended by David Firth, 2003.01.16, at points labelled ###
## to cope with negative y values
##
## Computes Pearson X^2 rather than Poisson deviance
##
## Starting values are all equal to the global mean
{
linktemp <- substitute(link)
if (!is.character(linktemp)) {
linktemp <- deparse(linktemp)
if (linktemp == "link")
linktemp <- eval(link)
}
if (any(linktemp == c("log", "identity", "sqrt")))
stats <- make.link(linktemp)
else stop(paste(linktemp, "link not available for poisson",
"family; available links are", "\"identity\", \"log\" and
\"sqrt\""))
variance <- function(mu) mu
validmu <- function(mu) all(mu > 0)
dev.resids <- function(y, mu, wt) wt*(y-mu)^2/mu ###
aic <- function(y, n, mu, wt, dev) NA
initialize <- expression({
n <- rep(1, nobs)
mustart <- rep(mean(y), length(y)) ###
})
structure(list(family = "quasipoisson", link = linktemp,
linkfun = stats$linkfun, linkinv = stats$linkinv, variance =
variance,
dev.resids = dev.resids, aic = aic, mu.eta = stats$mu.eta,
initialize = initialize, validmu = validmu, valideta =
stats$valideta),
class = "family")
}
On Thursday, Jan 16, 2003, at 15:45 Europe/London,
peter.fledelius at wgo.royalsun.com wrote:
> Dear R users
>
> I have been looking for functions that can deal with overdispersed
> poisson
> models. Some (one) of the observations are negative. According to
> actuarial
> literature (England & Verall, Stochastic Claims Reserving in General
> Insurance , Institute of Actiuaries 2002) this can be handled through
> the
> use of quasi likelihoods instead of normal likelihoods. The presence of
> negatives is not normal in a poisson model, however, we see them
> frequently
> in this type of data, and we would like to be able to fit the model
> anyway.
>
> At the moment R is complaining about negative values and the link
> function
> = log.
>
> My code looks like this. Do any of you know if this problem can be
> solved
> in R? Any suggestions are welcomed.
>
> Kind regards,
>
> Peter Fledelius (new R user)
>
> *********** Code ************
> paym <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172,
> 106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535,
> 3410, 5582, 4881, 2268, 2594, 3479, 649, 603,
> 5655, 5900, 4211, 5500, 2159, 2658, 984,
> 1092, 8473, 6271, 6333, 3786, 225,
> 1513, 4932, 5257, 1233, 2917,
> 557, 3463, 6956, 1368,
> 1351, 5596, 6165,
> 3133, 2262,
> 2063)
> alpha <- factor(c(1,1,1,1,1,1,1,1,1,1,
> 2,2,2,2,2,2,2,2,2,
> 3,3,3,3,3,3,3,3,
> 4,4,4,4,4,4,4,
> 5,5,5,5,5,5,
> 6,6,6,6,6,
> 7,7,7,7,
> 8,8,8,
> 9,9,
> 10))
> beta <- factor(c(1,2,3,4,5,6,7,8,9,10,
> 1,2,3,4,5,6,7,8,9,
> 1,2,3,4,5,6,7,8,
> 1,2,3,4,5,6,7,
> 1,2,3,4,5,6,
> 1,2,3,4,5,
> 1,2,3,4,
> 1,2,3,
> 1,2,
> 1))
> d.AD <- data.frame(paym, alpha, beta)
> glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson())
> glm.qD93
> ************ Code end ***************
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
More information about the R-help
mailing list