# [R-sig-eco] zero-truncated Poisson

Ben Bolker bbolker at gmail.com
Sun Apr 10 16:38:00 CEST 2011

```On 11-04-09 03:07 AM, Peter Solymos wrote:
> Jeff,
>
> The zero truncated Poisson distribution can be described as a
> probability function conditional on Y>0:
>
> Pr(Y=y|Y>0) = Pr(Y=y) / (1-Pr(Y=0)), y=1,2,3,...
>
> This leads to this function 'nll' for the negative log-likelihood:
>
> nll <- function(theta, y, X) {
>     lambda <- exp(drop(X %*% theta))
>     -sum(dpois(y, lambda, log=TRUE) - log(1-exp(-lambda)))
> }
>
> where theta is the vector of parameters, y is the observation vector,
> X is the design matrix.
>
> A simple simulation is useful to check that the theory works. I used
> 'optim' with the default Nelder-Mead simplex algorithm, initial values
> are based in non-truncated Poisson estimates to speed up convergence:
>
> theta <- c(1, 2)
> X <- model.matrix(~rnorm(100))
> y <- rpois(100,exp(X %*% theta))
> X <- X[y>0,]
> y <- y[y>0]
> inits <- coef(glm(y ~ X-1, family=poisson))
> res <- optim(par=inits, fn=nll, y=y, X=X)
> cbind(theta=theta, theta.hat=res\$par)
>
> Here I used the data you provided, note that 'hessian=TRUE' for some
> subsequent SE calculations:
>
> inits <- coef(glm(taken ~ mass + year, blja, family=poisson))
> y <- blja\$taken
> X <- model.matrix(~ mass + year, blja)
> res <- optim(par=inits, fn=nll, y=y, X=X, hessian=TRUE)
>
> Vector of parameters that minimize 'nll' (maximize the log-likelihood)
> given the data:
>
> cf <- res\$par
>
> Standard error (note that due to minimization, I used the Hessian
>
> se <- sqrt(diag(solve(res\$hessian)))
>
> z and p values and the final output table:
>
> zstat <- cf/se
> pval <- 2 * pnorm(-abs(zstat))
> out <- cbind(cf, se, zstat, pval)
> colnames(out) <- c("Estimate", "Std. Error", "z value", "Pr(>|z|)")
>
> printCoefmat(out, digits = 4, signif.legend = FALSE)
>
> For your data set, this gives:
>
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.76392    0.10659  -7.167 7.66e-13 ***
> mass         0.24008    0.01262  19.028  < 2e-16 ***
> yeartwo     -0.29006    0.11974  -2.423   0.0154 *
>
> Cheers,
>
> Peter
>
> Péter Sólymos
> Alberta Biodiversity Monitoring Institute
> and Boreal Avian Modelling project
> Department of Biological Sciences
> CW 405, Biological Sciences Bldg
> University of Alberta
> Edmonton, Alberta, T6G 2E9, Canada
> Phone: 780.492.8534
> Fax: 780.492.7635
> email <- paste("solymos", "ualberta.ca", sep = "@")
> http://www.abmi.ca
> http://www.borealbirds.ca
>

I agree.  I would just like to point out that, if you define the
truncated Poisson distribution function appropriately, you can then do
this very compactly via the mle2 function in the bbmle package.

dtruncpois <- function(x,lambda,log=FALSE) {
r <- ifelse(x==0,-Inf,dpois(x,lambda,log=TRUE)-
log(1-dpois(0,lambda,log=FALSE)))
if (log) r else exp(r)
}

library(bbmle)
(m1 <- mle2(taken~dtruncpois(exp(logmu)),
parameters=list(logmu~mass+year),
data=blja,
start=list(logmu=1)))
coef(m1)
summary(m1)  ## gives a table almost equivalent to Peter's
confint(m1)  ## profile confidence intervals

After a bit of digging it seems that VGAM::vglm fails when it hits an
NaN -- I haven't figured out more than that it gets this on the *second*
step, not the first, of the optimization. When it works VGAM should be
faster than the sort of generalized ML estimation that Peter and I are
suggesting, but I have fairly often found it to be not very robust when
I try it.

If you plot the data (with a regular Poisson GLM superimposed):

library(ggplot2)
ggplot(blja,aes(x=mass,y=taken,colour=year))+
stat_sum(alpha=0.5)+geom_smooth(method="glm",family="poisson")+
theme_bw()

You see that there are two observations with very large masses.
It turns out that removing these two observations allows vglm() to work:

glm1 <- vglm(taken ~ mass + year,family=pospoisson,
data=blja,subset=mass<10)

m2 <- mle2(taken~dtruncpois(exp(logmu)),
parameters=list(logmu~mass+year),
data=subset(blja,mass<10),
start=list(logmu=1))

It also changes the estimate of the mass effect considerably:

confint(m1)
confint(m2)

Ben Bolker

```