[R] offset and poisson regression
Renaud Scheifler
renaud.scheifler at univ-fcomte.fr
Mon Jul 27 16:27:31 CEST 2009
Not sure that the list is the best place for this question, but we are
going mad with this... We are trying to fit a Poisson regression to
count data, e.g. the number of fledged young of blue tits (NPe) as a
function of the clutch size (GPc) and other environmental variables. Here
are the original data (dumped); we have simply omitted the environmental
variables to keep things simple:
tab<-
structure(list(NPe = c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
NA, 8L, 7L, 5L, 0L), GPc = c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
6L, 7L, 5L, 8L, 7L, 7L, 6L)), .Names = c("NPe", "GPc"), class =
"data.frame", row.names = c(NA,
-127L))
It seems logical to enter clutch size as an offset term, since we are
actually interested in the ratio of fledged young to clutch size. However,
the final results are quite surprising:
modsr0<-glm(NPe~offset(GPc),family="poisson",data=tab)
If we compute the predictions, we get numbers that look like gross
overestimates of reality (e.g. 14.6, 39.7, etc.), including the
implication that one can have more fledged young than eggs:
round(exp(predict(modsr0)),1)
 [1]  0.7  2.0  2.0  5.4  0.7  2.0  0.7  2.0  0.7  0.7  2.0  2.0  2.0  0.3  0.1  0.7  2.0
[18]  0.1  0.7  2.0  0.7  0.7  0.7  0.3  0.7  2.0  0.7  2.0  0.7  5.4  2.0  0.3  5.4 14.6
[35]  5.4  5.4  5.4  0.7  5.4  2.0  0.7  2.0 14.6  5.4  2.0  0.7  5.4  2.0  2.0  5.4  2.0
[52]  2.0  2.0  5.4  0.7  0.7 14.6 14.6  5.4  5.4  2.0  5.4  2.0  0.7  5.4 14.6  2.0  5.4
[69]  5.4  0.7  5.4  0.7 39.7  0.7  0.3  5.4  2.0  2.0  0.7 14.6  0.7  5.4  2.0  5.4  5.4
[86]  2.0  5.4 14.6  5.4  5.4  2.0
By contrast, if clutch size is entered as a covariate (and not as an
offset), the predictions are much more realistic, with no extreme values:
modsr0<-glm(NPe~GPc,family="poisson",data=tab)
round(exp(predict(modsr0)),1)
 [1] 3.2 3.7 3.7 4.4 3.2 3.7 3.2 3.7 3.2 3.2 3.7 3.7 3.7 2.7 2.2 3.2 3.7 2.2 3.2 3.7 3.2 3.2
[23] 3.2 2.7 3.2 3.7 3.2 3.7 3.2 4.4 3.7 2.7 4.4 5.3 4.4 4.4 4.4 3.2 4.4 3.7 3.2 3.7 5.3 4.4
[45] 3.7 3.2 4.4 3.7 3.7 4.4 3.7 3.7 3.7 4.4 3.2 3.2 5.3 5.3 4.4 4.4 3.7 4.4 3.7 3.2 4.4 5.3
[67] 3.7 4.4 4.4 3.2 4.4 3.2 6.2 3.2 2.7 4.4 3.7 3.7 3.2 5.3 3.2 4.4 3.7 4.4 4.4 3.7 4.4 5.3
[89] 4.4 4.4 3.7
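One pattern can be read directly off the offset predictions: they take only
the values 0.1, 0.3, 0.7, 2.0, 5.4, 14.6 and 39.7, each roughly 2.72 times
the previous one. That is what a log link produces when GPc itself is added
to the linear predictor, since each extra egg then multiplies the fitted
mean by e. Below is a minimal sketch of that reading, assuming the default
log link of the Poisson family. It uses only the first eight complete rows
of tab, copied into a hypothetical data frame d so that it runs on its own.
The offset(log(GPc)) variant is the usual way to model a ratio under a log
link; it is shown for comparison only and has not been fitted to the full
data here.

```r
# First eight complete rows of `tab`, copied so the sketch is self-contained.
# `d` is a hypothetical name used only for this illustration.
d <- data.frame(NPe = c(3L, 5L, 2L, 6L, 4L, 4L, 4L, 3L),
                GPc = c(5L, 6L, 6L, 7L, 5L, 6L, 5L, 6L))

# Offset on the raw scale: log(mu) = b0 + GPc, so mu = exp(b0) * exp(GPc).
m_raw <- glm(NPe ~ offset(GPc), family = poisson, data = d)

# Offset on the log scale: log(mu) = b0 + log(GPc), so mu / GPc = exp(b0).
m_log <- glm(NPe ~ offset(log(GPc)), family = poisson, data = d)

# Fitted means on the response scale; fitted() already includes the offset.
mu_raw <- fitted(m_raw)
mu_log <- fitted(m_log)

# Ratio of fitted means between a 6-egg clutch (row 2) and a 5-egg clutch (row 1).
ratio_raw <- unname(mu_raw[2] / mu_raw[1])  # equals exp(1) under the raw offset
ratio_log <- unname(mu_log[2] / mu_log[1])  # equals 6/5 under the log offset

# Estimated number of fledged young per egg under the log offset.
rate_per_egg <- unname(exp(coef(m_log)[1]))

print(c(ratio_raw = ratio_raw, ratio_log = ratio_log, rate_per_egg = rate_per_egg))
```

With an intercept-only model and a log(GPc) offset, the estimated rate per
egg is simply the total number of fledged young divided by the total number
of eggs in the rows used, so the fitted mean grows in proportion to clutch
size instead of exponentially.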
Can any sound statistician provide a hint about what to do or how to
interpret this?
Thanks in advance,
Renaud and Patrick