[R] offset and poisson regression

Renaud Scheifler renaud.scheifler at univ-fcomte.fr
Mon Jul 27 16:27:31 CEST 2009


Not sure that the list is the best place for this question, but we are 
going mad with this... We are trying to fit a Poisson regression to 
count data, e.g. the number of fledged young of blue tits (NPe) as a 
function of the clutch size (GPc) and other environmental variables. Here 
are the original data (dumped); we have omitted the environmental 
variables to simplify:

tab<-
structure(list(NPe = c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
NA, 8L, 7L, 5L, 0L), GPc = c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
6L, 7L, 5L, 8L, 7L, 7L, 6L)), .Names = c("NPe", "GPc"), class = 
"data.frame", row.names = c(NA,
-127L))

It seems logical to include clutch size as an offset term, since we are 
actually interested in the ratio fledged young / clutch size. However, 
the final results are quite surprising:

modsr0<-glm(NPe~offset(GPc),family="poisson",data=tab)

If we compute the predictions, some of them look like gross 
overestimates (e.g. 14.6, 39.7), which would imply that one can have 
more fledged young than eggs:

 [1]  0.7  2.0  2.0  5.4  0.7  2.0  0.7  2.0  0.7  0.7  2.0  2.0  2.0  
0.3  0.1  0.7  2.0
[18]  0.1  0.7  2.0  0.7  0.7  0.7  0.3  0.7  2.0  0.7  2.0  0.7  5.4  
2.0  0.3  5.4 14.6
[35]  5.4  5.4  5.4  0.7  5.4  2.0  0.7  2.0 14.6  5.4  2.0  0.7  5.4  
2.0  2.0  5.4  2.0
[52]  2.0  2.0  5.4  0.7  0.7 14.6 14.6  5.4  5.4  2.0  5.4  2.0  0.7  
5.4 14.6  2.0  5.4
[69]  5.4  0.7  5.4  0.7 39.7  0.7  0.3  5.4  2.0  2.0  0.7 14.6  0.7  
5.4  2.0  5.4  5.4
[86]  2.0  5.4 14.6  5.4  5.4  2.0
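
For readers following along, here is a minimal sketch of how an offset 
enters a Poisson GLM with the default log link. It is not taken from the 
data above: the data frame `toy` and the object names `m_raw`, `m_log`, 
`fit_raw`, `fit_log` and `ratio` are made up for illustration. The offset 
is added to the linear predictor on the log scale, so offset(GPc) gives a 
mean proportional to exp(GPc), whereas offset(log(GPc)) gives a mean 
proportional to GPc, which is the usual way to model a ratio.

```r
# Hypothetical toy data (NOT the data from this post): 8 nests
toy <- data.frame(
  NPe = c(3, 5, 2, 6, 4, 0, 5, 7),  # number of fledged young
  GPc = c(5, 6, 6, 7, 5, 6, 6, 8)   # clutch size
)

# Offset on the raw scale: log(mu) = b0 + GPc, so mu = exp(b0) * exp(GPc)
m_raw <- glm(NPe ~ offset(GPc), family = poisson, data = toy)

# Offset on the log scale: log(mu) = b0 + log(GPc), so mu = exp(b0) * GPc
m_log <- glm(NPe ~ offset(log(GPc)), family = poisson, data = toy)

# Fitted means on the response scale (same as exp() of the link-scale fit)
fit_raw <- predict(m_raw, type = "response")
fit_log <- predict(m_log, type = "response")

# With the log offset, exp(intercept) is the estimated fledged/egg ratio
ratio <- exp(coef(m_log)[["(Intercept)"]])

print(round(cbind(GPc = toy$GPc, fit_raw, fit_log), 2))
print(round(ratio, 3))
```

In this sketch the fitted values from `m_raw` grow by a factor of exp(1) 
for each extra egg, which is the kind of pattern seen in the predictions 
above, while those from `m_log` are a constant fraction of the clutch 
size and stay below it as long as the estimated ratio is below 1.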

By contrast, if clutch size is entered as a covariate (and not as an 
offset), the predictions are much more realistic, with no extreme values:

modsr0<-glm(NPe~GPc,family="poisson",data=tab)
round(exp(predict(modsr0)),1)
 [1] 3.2 3.7 3.7 4.4 3.2 3.7 3.2 3.7 3.2 3.2 3.7 3.7 3.7 2.7 2.2 3.2 3.7 
2.2 3.2 3.7 3.2 3.2
[23] 3.2 2.7 3.2 3.7 3.2 3.7 3.2 4.4 3.7 2.7 4.4 5.3 4.4 4.4 4.4 3.2 4.4 
3.7 3.2 3.7 5.3 4.4
[45] 3.7 3.2 4.4 3.7 3.7 4.4 3.7 3.7 3.7 4.4 3.2 3.2 5.3 5.3 4.4 4.4 3.7 
4.4 3.7 3.2 4.4 5.3
[67] 3.7 4.4 4.4 3.2 4.4 3.2 6.2 3.2 2.7 4.4 3.7 3.7 3.2 5.3 3.2 4.4 3.7 
4.4 4.4 3.7 4.4 5.3
[89] 4.4 4.4 3.7

Can any sound statistician provide a hint about what to do or how to 
interpret this?

Thanks in advance,

Renaud and Patrick





