[R] offset and poisson regression
Renaud Lancelot
renaud.lancelot at gmail.com
Mon Jul 27 17:46:48 CEST 2009
You should use offset(log(Gpc)) instead of offset(Gpc)
> options(width = 65)
> fm <- glm(NPe ~ 1 + offset(log(GPc)), family = poisson,data = tab)
> fitted(fm)
1 2 3 4 6 7 8
3.181818 3.818182 3.818182 4.454545 3.181818 3.818182 3.181818
9 12 13 14 15 16 18
3.818182 3.181818 3.181818 3.818182 3.818182 3.818182 2.545455
20 21 22 23 24 27 28
1.909091 3.181818 3.818182 1.909091 3.181818 3.818182 3.181818
29 30 31 32 33 34 35
3.181818 3.181818 2.545455 3.181818 3.818182 3.181818 3.818182
36 38 39 40 41 42 44
3.181818 4.454545 3.818182 2.545455 4.454545 5.090909 4.454545
45 46 48 51 52 54 58
4.454545 4.454545 3.181818 4.454545 3.818182 3.181818 3.818182
59 62 64 68 71 75 76
5.090909 4.454545 3.818182 3.181818 4.454545 3.818182 3.818182
81 82 83 85 86 87 88
4.454545 3.818182 3.818182 3.818182 4.454545 3.181818 3.181818
89 90 91 92 93 94 95
5.090909 5.090909 4.454545 4.454545 3.818182 4.454545 3.818182
96 98 99 100 101 102 103
3.181818 4.454545 5.090909 3.818182 4.454545 4.454545 3.181818
104 106 107 108 109 110 111
4.454545 3.181818 5.727273 3.181818 2.545455 4.454545 3.818182
112 113 114 115 116 117 118
3.818182 3.181818 5.090909 3.181818 4.454545 3.818182 4.454545
119 121 122 124 125 126 127
4.454545 3.818182 4.454545 5.090909 4.454545 4.454545 3.818182
All the best,
Renaud
2009/7/27 Renaud Scheifler <renaud.scheifler at univ-fcomte.fr>:
> Not sure that the list is the best place for this question, but we are going
> mad with this... We are trying to fit a poisson regression to count data, eg
> the number of fledged youngs of blue tits (NPe) as a function of the clutch
> size (GPc) and other environment variables. Here are the original data
> (dumped) (we just omit the environment variables to simplify):
>
> tab<-
> structure(list(NPe = c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
> NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
> 6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
> NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
> 0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
> 0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
> 0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
> 6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
> NA, 8L, 7L, 5L, 0L), GPc = c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
> 6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
> 5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
> 7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
> 7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
> 6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
> 8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
> 6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
> 6L, 7L, 5L, 8L, 7L, 7L, 6L)), .Names = c("NPe", "GPc"), class =
> "data.frame", row.names = c(NA,
> -127L))
>
> It seems logical to insert "clutch size" as an offset term, since we are
> actually interested in the ratio fledged youngs/clutch size. However, the
> final results are quite surprising:
>
> modsr0<-glm(NPe~offset(GPc),family="poisson",data=tab)
>
> if we compute the predictions, we get numbers which looks like a gross
> overestimation of the reality (eg 14.6, 39.7, etc...) -including the fact
> that it implies that one can have more fledged youngs than eggs !:
>
> [1] 0.7 2.0 2.0 5.4 0.7 2.0 0.7 2.0 0.7 0.7 2.0 2.0 2.0 0.3
> 0.1 0.7 2.0
> [18] 0.1 0.7 2.0 0.7 0.7 0.7 0.3 0.7 2.0 0.7 2.0 0.7 5.4 2.0
> 0.3 5.4 14.6
> [35] 5.4 5.4 5.4 0.7 5.4 2.0 0.7 2.0 14.6 5.4 2.0 0.7 5.4 2.0
> 2.0 5.4 2.0
> [52] 2.0 2.0 5.4 0.7 0.7 14.6 14.6 5.4 5.4 2.0 5.4 2.0 0.7 5.4
> 14.6 2.0 5.4
> [69] 5.4 0.7 5.4 0.7 39.7 0.7 0.3 5.4 2.0 2.0 0.7 14.6 0.7 5.4
> 2.0 5.4 5.4
> [86] 2.0 5.4 14.6 5.4 5.4 2.0
>
> Otherwise, if clutch size is inserted as a variable (and not as an offset),
> predictions are much more realistic, with no extreme values :
>
> modsr0<-glm(NPe~GPc,family="poisson",data=tab)
> round(exp(predict(modsr0)),1)
> [1] 3.2 3.7 3.7 4.4 3.2 3.7 3.2 3.7 3.2 3.2 3.7 3.7 3.7 2.7 2.2 3.2 3.7 2.2
> 3.2 3.7 3.2 3.2
> [23] 3.2 2.7 3.2 3.7 3.2 3.7 3.2 4.4 3.7 2.7 4.4 5.3 4.4 4.4 4.4 3.2 4.4 3.7
> 3.2 3.7 5.3 4.4
> [45] 3.7 3.2 4.4 3.7 3.7 4.4 3.7 3.7 3.7 4.4 3.2 3.2 5.3 5.3 4.4 4.4 3.7 4.4
> 3.7 3.2 4.4 5.3
> [67] 3.7 4.4 4.4 3.2 4.4 3.2 6.2 3.2 2.7 4.4 3.7 3.7 3.2 5.3 3.2 4.4 3.7 4.4
> 4.4 3.7 4.4 5.3
> [89] 4.4 4.4 3.7
>
> Can any sound statistician provide a hint about what to do or how to
> interprete this ?
>
> Thanks in advance,
>
> Renaud and Patrick
>
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
Renaud Lancelot
EDEN Project, coordinator
http://www.eden-fp6project.net/
<< EDEN International Conference, Montpellier, 10-12 May 2010 >>
<< http://international-conference2010.eden-fp6project.net/ >>
UMR CIRAD-INRA "Contrôle des maladies animales exotiques et émergentes"
Joint research unit "Control of emerging and exotic animal diseases"
CIRAD, Campus International de Baillarguet TA A-DIR / B
F34398 Montpellier
http://www.cirad.fr http://bluetongue.cirad.fr/
Tel. +33 4 67 59 37 17 - Fax +33 4 67 59 37 95
Secr. +33 4 67 59 37 37 - Cell. +33 6 77 52 08 69
More information about the R-help
mailing list