# [R] Poisson Maximum Likelihood Estimation

David Winsemius dwinsemius at comcast.net
Fri Jan 25 15:21:03 CET 2008

```"Paul Sweeting" <mail at paulsweeting.co.uk> wrote in
news:000101c85f4e\$c3d25b50\$4b7711f0\$@co.uk:

> Hi
>
> I am trying to carry out some maximum likelihood estimation and I'm
> not making much headway, and I'm hoping that someone will be able to
> point me in the right direction.
>
> I am modelling mortality statistics.  One way to do this is to model
> the mortality rate (or, more accurately, log of the mortality rate,
> log_m) as (say) a constant plus a proportion of age, plus time, so:
>
> r_1 <- lm(formula=log_m ~ age + time)
> summary(r_1)
>
> However, an alternative approach is to use try and estimate the
> number of deaths from the poisson mean mortality rate, and the
> number of people, with the poisson mean being defined in terms of
> age and time (And a constant). Conceptually I can see how this
> should work, in terms of linking the poisson probabilities together
> at each age and optimising the coefficients on age and time to
> maximise the log likelihood, but I cannot work out how to express
> this in R.  Hopefully this outlines what I'm trying to do - any

The most straightforward approach would be to use glm() with
family="poisson", a.k.a Poisson regression.  Don't ask me why
Gamma is capitalized as an R family but poisson is not, I don't know.
Since the link with family=poisson is log() by default, you model the
rate (or count) without prior transformation. You can model either the
rates or the counts. If counts are modeled, you would add
offset=log(person.years) to the covariate side of the formula to
adjust for varying intensity of exposure. The help examples inglm() don't
have much that directly illustrates PR. Here is a link to a nice worked
example:
<http://www.faculty.sbc.edu/bkirk/Biostatistics/course%20documents%20for%202006/Poisson%20Regression.doc>

Some examples from an epi course at UCB's SPH:
<http://www.medepi.net/selvin/chap11job.txt>

A third approach would be to use count as the dependent variable and
log(expected) as the offset. This would give you an estimate of the SMR
for various covariate strata.

--
David Winsemius

```