[R] second try; writing user-defined GLM link function
Jessi Brown
jessilbrown at gmail.com
Fri Apr 21 19:46:10 CEST 2006
Ok, I persuaded predict.glm to "work" with the default type ("link")
instead of "response." However, I don't think it did what I expected
it to do, as the output (which should be log-odds, right?) leads to
non-sensical final daily survival rates (daily survival rate =
probability of surviving that particular interval). For example, for a
nest of age 67 days, the log-odds of survival are -53.88, leading to a
daily survival rate of essentially 0 - not at all what was observed!
I see two possibilities here. Either the predict function is not
working as expected (still not accounting for interval length), or
possibly something is off in my nest age temporal trend modeling
(predicted log-odds values are reasonable up to MeanAge=44 or so).
Anyone have any thoughts here?
> glm.24.pred<-predict(glm.24, newdata=vc.new)
> glm.24.pred
1 2 3 4 5
6 7
6.0097208 5.5175350 5.0938348 4.7348110 4.4366542
4.1955553 4.0077050
8 9 10 11 12
13 14
3.8692940 3.7765131 3.7255530 3.7126045 3.7338582
3.7855051 3.8637357
15 16 17 18 19
20 21
3.9647409 4.0847113 4.2198379 4.3663111 4.5203220
4.6780610 4.8357191
22 23 24 25 26
27 28
4.9894870 5.1355554 5.2701150 5.3893566 5.4894710
5.5666488 5.6170809
29 30 31 32 33
34 35
5.6369580 5.6224707 5.5698100 5.4751664 5.3347309
5.1446940 4.9012465
36 37 38 39 40
41 42
4.6005793 4.2388830 3.8123484 3.3171662 2.7495272
2.1056221 1.3816416
43 44 45 46 47
48 49
0.5737766 -0.3217823 -1.3088443 -2.3912186 -3.5727145
-4.8571413 -6.2483083
50 51 52 53 54
55 56
-7.7500246 -9.3660995 -11.1003423 -12.9565623 -14.9385687
-17.0501707 -19.2951776
57 58 59 60 61
62 63
-21.6773987 -24.2006432 -26.8687203 -29.6854394 -32.6546097
-35.7800404 -39.0655408
64 65 66 67
-42.5149201 -46.1319876 -49.9205526 -53.8844243
On 4/20/06, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> > glm.24.pred<-predict(glm.24,newdata=nestday, type="response", SE.fit=T)
>
> What is SE.fit? The help says se.fit. That _may_ be a problem.
>
> However, I think the real problem is that the link function argument
> includes a reference to vc.apfa$days that is appropriate for fitting, not
> prediction. One way out might be (untested)
>
> attach(va.apfa)
> glm.24 <- glm(formula = Success ~ NestHtZ + MeanAge + I(MeanAge^2) +
> I(MeanAge^3), family = logexposure(ExposureDays = days), data = vc.apfa)
> detach()
> attach(nestday)
> glm.24.pred<-predict(glm.24,newdata=nestday, type="response", SE.fit=T)
> detach()
>
> so that 'days' refers to the appropriate dataset both when fitting and
> predicting.
>
> (This is bending glm() to do something it was not designed to do, so some
> hoop-jumping is needed.)
>
>
> On Thu, 20 Apr 2006, Jessi Brown wrote:
>
> > An update for all:
> >
> > Using the combined contributions from Mark and Dr. Ripley, I've been
> > (apparently) successfully formulating both GLM's and GLMM's (using
> > the MASS function glmmPQL) analyzing my nest success data. The beta
> > parameter estimates look reasonable and the top models resemble those
> > from earlier analyses using a different nest survival analysis
> > approach.
> >
> > However, I've now run into problems when trying to predict the daily
> > survival rates from fitted models. For example, for a model
> > considering nest height (NestHtZ) and nest age effects (MeanAge and
> > related terms; there is an overall cubic time trend in this model), I
> > tried to predict the daily survival rate for each day out of a 67 day
> > nest cycle (so MeanAge is a vector of 1 to 67) with mean nest height
> > (also a vector 67 rows in length; both comprise the matrix "nestday").
> > Here's what happens:
> >
> >> summary(glm.24)
> >
> > Call:
> > glm(formula = Success ~ NestHtZ + MeanAge + I(MeanAge^2) + I(MeanAge^3),
> > family = logexposure(ExposureDays = vc.apfa$days), data = vc.apfa)
> >
> > Deviance Residuals:
> > Min 1Q Median 3Q Max
> > -3.3264 -1.2341 0.6712 0.8905 1.5569
> >
> > Coefficients:
> > Estimate Std. Error z value Pr(>|z|)
> > (Intercept) 6.5742015 1.7767487 3.700 0.000215 ***
> > NestHtZ 0.6205444 0.2484583 2.498 0.012504 *
> > MeanAge -0.6018978 0.2983656 -2.017 0.043662 *
> > I(MeanAge^2) 0.0380521 0.0152053 2.503 0.012330 *
> > I(MeanAge^3) -0.0006349 0.0002358 -2.693 0.007091 **
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > (Dispersion parameter for binomial family taken to be 1)
> >
> > Null deviance: 174.86 on 136 degrees of freedom
> > Residual deviance: 233.82 on 132 degrees of freedom
> > AIC: 243.82
> >
> > Number of Fisher Scoring iterations: 13
> >
> >> glm.24.pred<-predict(glm.24,newdata=nestday, type="response", SE.fit=T)
> > Warning message:
> > longer object length
> > is not a multiple of shorter object length in: plogis(eta)^days
> >
> >
> > Can anyone tell me what I'm doing wrong?
> >
> > cheers, Jessi Brown
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
More information about the R-help
mailing list