[R] Confidence intervals for relative risk
Terry Therneau
therneau at mayo.edu
Mon Nov 13 16:54:04 CET 2006
Wolfgang,
It is common to handle relative risk problems using Poisson regression.
In your example you have 8 events out of 508 tries, and 0/500 in the second
data set.
> tdata <- data.frame(y=c(8,0), n=c(508,500), group=1:0)
> fit <- glm(y ~ group + offset(log(n)), data=tdata, family=poisson)
Because of the zero, the standard beta/se(beta) confidence intervals don't
work. In fact, the actual MLE for the ratio is infinity, which the above
glm decides is exp(33.85) = 5 x 10^14 --- which is close enough to infinity
for me. What you want is the lower limit of the interval, which you can
find with a profile likelihood. That is, look at the curve of deviance vs
beta, draw a horizontal line 3.84 units above the minimum (the 95% point of
a chi-square on 1 df), and where it intersects the curve is your confidence
limit.
For the above, since it is 2 groups with 2 coefficients, the deviance of
the full model happens to be 0. Your approximation gave a lower limit of
.97 which is about exp(0), so I'll guess a solution between -2 and 2.
> xx <- seq(-2, 2, length=21)
> for (i in 1:21) {
+     fit <- glm(y ~ offset(group*xx[i] + log(n)), family=poisson, data=tdata)
+     print(c(xx[i], fit$deviance))
+ }
[1] -2.00000 33.80736
[1] -1.80000 31.02994
[1] -1.60000 28.33138
[1] -1.40000 25.72327
[1] -1.20000 23.21769
[1] -1.00000 20.82691
[1] -0.80000 18.56281
[1] -0.60000 16.43629
[1] -0.40000 14.45668
[1] -0.20000 12.63108
[1] 0.00000 10.96387
[1] 0.20000 9.45639
[1] 0.400000 8.106805
[1] 0.600000 6.910274
[1] 0.800000 5.859303
[1] 1.000000 4.944278
[1] 1.200000 4.154088
[1] 1.400000 3.476757
[1] 1.60000 2.90003
[1] 1.80000 2.41186
[1] 2.000000 2.000785
So the "exact" lower limit is somewhere between exp(1.2) and exp(1.4), which
is around 3.6. One can easily refine this by tucking the fit into an
iterative root search, or just resetting xx to a narrower range.
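One sketch of such an iterative search, using uniroot() (one option among several) and assuming the tdata frame from above:

```r
# Refine the profile-likelihood lower limit with uniroot().
# 3.84 is qchisq(0.95, 1); the full-model deviance here is 0.
tdata <- data.frame(y = c(8, 0), n = c(508, 500), group = 1:0)
devdiff <- function(beta) {
    fit <- glm(y ~ offset(group * beta + log(n)), family = poisson, data = tdata)
    fit$deviance - qchisq(0.95, 1)   # crosses zero at the confidence limit
}
lower <- uniroot(devdiff, interval = c(1.2, 1.4))$root
exp(lower)   # lower confidence limit for the rate ratio, about 3.6
```

The bracketing interval c(1.2, 1.4) comes straight from the table above, where the deviance passes through 3.84.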
The offset(log(n)) part of the model, BTW, is a well-known trick in Poisson
models. It comes from the fact that the likelihood is in terms of the
number of events, but we want coefficients in terms of rates, and
E(y) = rate * n, so log E(y) = log(rate) + log(n).
Adding an offset of xx[i] * group fits a model with the group effect fixed at
xx[i], but allows the intercept to vary.
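A quick illustration of the offset trick (with made-up counts for the second group, since both groups need events for a finite estimate): the exponentiated group coefficient recovers the observed rate ratio exactly, because a 2-parameter model on 2 observations is saturated.

```r
# Hypothetical counts: 8/508 events in group 1, 2/500 in group 0.
demo <- data.frame(y = c(8, 2), n = c(508, 500), group = 1:0)
fit <- glm(y ~ group + offset(log(n)), family = poisson, data = demo)
exp(coef(fit)["group"])   # the observed rate ratio (8/508) / (2/500), about 3.94
```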
For more conceptual depth, you could look up 'rate regression' or
'standardized mortality ratio' in the Encyclopedia of Biostatistics --- it is
in the latter computation that this approach is quite common. The idea of
adding a fraction is found in Anscombe (1948), "The transformation of Poisson,
binomial and negative-binomial data," Biometrika, vol 35, p246-254, but for
each single estimate, not for the ratio.
Terry Therneau
Mayo Clinic