[R] Predicted values from glm() when linear predictor is NA.
David Winsemius
dw|n@em|u@ @end|ng |rom comc@@t@net
Thu Jul 28 03:25:23 CEST 2022
On 7/27/22 17:26, Rolf Turner wrote:
> I have a data frame with a numeric ("TrtTime") and a categorical
> ("Lifestage") predictor.
>
> Level "L1" of Lifestage occurs only with a single value of TrtTime,
> namely 12, whence it is not possible to estimate a TrtTime "slope"
> when Lifestage is "L1".
>
> Indeed, when I fitted the model
>
> fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
> data=demoDat)
>
> I got:
>
>> as.matrix(coef(fit))
>>                                   [,1]
>> (Intercept)                -0.91718302
>> TrtTime                     0.88846195
>> LifestageEgg + L1         -45.36420974
>> LifestageL1                14.27570572
>> LifestageL1 + L2           -0.30332697
>> LifestageL3                -3.58672631
>> TrtTime:LifestageEgg + L1   8.10482459
>> TrtTime:LifestageL1                 NA
>> TrtTime:LifestageL1 + L2    0.05662651
>> TrtTime:LifestageL3         1.66743472
> That is, TrtTime:LifestageL1 is NA, as expected.
>
> I would have thought that fitted or predicted values corresponding to
> Lifestage = "L1" would thereby be NA, but this is not the case:
>
>> predict(fit)[demoDat$Lifestage=="L1"]
>>       26       65      131
>> 24.02007 24.02007 24.02007
>>
>> fitted(fit)[demoDat$Lifestage=="L1"]
>>  26  65 131
>>   1   1   1
> That is, the predicted values on the scale of the linear predictor are
> large and positive, rather than being NA.
>
> What this amounts to, it seems to me, is saying that if the linear
> predictor in a Binomial glm is NA, then "success" is a certainty.
> This strikes me as being a dubious proposition. My gut feeling is that
> misleading results could be produced.
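A quick arithmetic check (an illustration added for clarity, not part of either post). It assumes the default logit link of family=binomial. The 24.02007 above is reproduced from the printed coefficients if the NA interaction term is simply left out, i.e. treated as zero. The variable names below are illustrative only.

```r
## Coefficients copied from the as.matrix(coef(fit)) output above.
## For a Lifestage "L1" case only the intercept, TrtTime and LifestageL1
## terms apply; the aliased TrtTime:LifestageL1 coefficient (NA) is left out.
b_intercept <- -0.91718302
b_trttime   <-  0.88846195
b_L1        <- 14.27570572
trt_time    <- 12  # the only TrtTime value observed with Lifestage "L1"

eta <- b_intercept + b_trttime * trt_time + b_L1
print(eta)          # 24.02007, as reported by predict(fit) for the L1 rows
print(plogis(eta))  # inverse logit: 1 - about 3.7e-11, which prints as 1
```

So the fitted value of (essentially) 1 comes entirely from the estimated, non-NA coefficients; the NA term contributes nothing.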
The NA is most likely caused by aliasing: some other combination of
factors is a perfect surrogate for every case with that level of the
interaction. The `predict.glm` function always requires a complete set
of values to construct a case, so in practice the aliased (NA)
coefficient is simply dropped, which amounts to treating it as zero.
Whether the apparent incremental linear prediction of that interaction
term is large or small will then depend on the degree of independent
contribution of the surrogate levels of the other variables.
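A minimal, self-contained sketch of the aliasing point, using made-up data. The data frame `dat` and its columns are hypothetical, not Rolf's demoDat. It shows that the link-scale predictions from a rank-deficient glm equal the model matrix times the coefficient vector with the NA (aliased) entry set to zero.

```r
## Hypothetical toy data (not demoDat): level "B" of g occurs only at x = 3,
## so the x:gB column equals 3 * gB and the interaction is aliased.
dat <- data.frame(
  x     = c(1, 2, 3, 4, 5, 6, 3, 3, 3),
  g     = factor(c("A", "A", "A", "A", "A", "A", "B", "B", "B")),
  dead  = c(1, 2, 3, 5, 6, 8, 4, 5, 6),
  alive = c(9, 8, 7, 5, 4, 2, 6, 5, 4)
)
fit <- glm(cbind(dead, alive) ~ x * g, family = binomial, data = dat)
print(coef(fit))  # the "x:gB" entry is NA

## Rebuild the linear predictor by hand with the NA coefficient set to 0.
X <- model.matrix(fit)
b <- coef(fit)
b[is.na(b)] <- 0
eta_manual <- drop(X %*% b)

## Compare with what predict() returns on the link scale.
print(all.equal(eta_manual, predict(fit), check.attributes = FALSE))  # TRUE
```

When newdata is supplied, predict.glm hands the link-scale computation to predict.lm. In R versions of that period, predict.lm warns that "prediction from a rank-deficient fit may be misleading". Without newdata, as in the calls quoted above, no such warning is given.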
David.
>
> Can anyone explain to me a rationale for this behaviour pattern?
> Is there some justification for it that I am not currently seeing?
> Any other comments? (Please omit comments to the effect of "You are as
> thick as two short planks!". :-) )
>
> I have attached the example data set in a file "demoDat.txt", should
> anyone want to experiment with it. The file was created using dput() so
> you should access it (if you wish to do so) via something like
>
> demoDat <- dget("demoDat.txt")
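For readers who have not used the dput()/dget() pair, a minimal round trip might look like this. `toy` and the temporary file are hypothetical stand-ins for demoDat and demoDat.txt.

```r
## Hypothetical stand-in for demoDat; column names mirror the thread.
toy <- data.frame(
  TrtTime   = c(12, 24, 36),
  Lifestage = factor(c("L1", "L2", "L3")),
  Dead      = c(3L, 5L, 8L),
  Alive     = c(7L, 5L, 2L)
)

path <- tempfile(fileext = ".txt")  # stands in for "demoDat.txt"
dput(toy, file = path)              # write an R expression that rebuilds toy
toy2 <- dget(path)                  # parse and evaluate that expression

print(all.equal(toy, toy2))  # TRUE
```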
>
> Thanks for any enlightenment.
>
> cheers,
>
> Rolf Turner
>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.