[R] Predicted values from glm() when linear predictor is NA.

Thu Jul 28 03:25:23 CEST 2022

On 7/27/22 17:26, Rolf Turner wrote:
> I have a data frame with a numeric ("TrtTime") and a categorical
> ("Lifestage") predictor.
>
> Level "L1" of Lifestage occurs only with a single value of TrtTime,
> explicitly 12, whence it is not possible to estimate a TrtTime "slope"
> when Lifestage is "L1".
>
> Indeed, when I fitted the model
>
>      fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
>                 data=demoDat)
>
> I got:
>
>> as.matrix(coef(fit))
>>                                    [,1]
>> (Intercept)                -0.91718302
>> TrtTime                     0.88846195
>> LifestageEgg + L1         -45.36420974
>> LifestageL1                14.27570572
>> LifestageL1 + L2           -0.30332697
>> LifestageL3                -3.58672631
>> TrtTime:LifestageEgg + L1   8.10482459
>> TrtTime:LifestageL1                 NA
>> TrtTime:LifestageL1 + L2    0.05662651
>> TrtTime:LifestageL3         1.66743472
> That is, TrtTime:LifestageL1 is NA, as expected.
>
> I would have thought that fitted or predicted values corresponding to
> Lifestage = "L1" would thereby be NA, but this is not the case:
>
>> predict(fit)[demoDat$Lifestage=="L1"]
>>        26       65      131
>> 24.02007 24.02007 24.02007
>>
>> fitted(fit)[demoDat$Lifestage=="L1"]
>>   26  65 131
>>    1   1   1
> That is, the predicted values on the scale of the linear predictor are
> large and positive, rather than being NA.
>
> What this amounts to, it seems to me, is saying that if the linear
> predictor in a Binomial glm is NA, then "success" is a certainty.
> This strikes me as being a dubious proposition.  My gut feeling is that
> misleading results could be produced.

The NA is most likely caused by aliasing: some other combination of
factors is a perfect surrogate for every case with that level of the
interaction. The `predict.glm` function always requires a complete set
of values to construct a case. Whether the apparent incremental linear
prediction from that interaction term is large or small will depend on
the degree of independent contribution of the surrogate levels of the
other variables.
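
A minimal sketch of this, using made-up toy data rather than the attached demoDat (the names `toy`, `x`, `g`, `dead` and `alive` are hypothetical). Level "B" of `g` occurs only at x = 12, so the `x:gB` column is a multiple of the `gB` column and its coefficient comes back NA. Rebuilding the linear predictor by hand with that NA treated as 0 should reproduce what `predict()` returns, which suggests the aliased term is simply dropped and the `gB` main effect carries the whole group:

```r
# Hypothetical toy data: level "B" of g is observed only at x = 12,
# so no separate x slope can be estimated for it.
toy <- data.frame(
  x     = c(4, 8, 12, 16, 12, 12, 12),
  g     = factor(c("A", "A", "A", "A", "B", "B", "B")),
  dead  = c(2, 5, 9, 14, 6, 8, 7),
  alive = c(18, 15, 11, 6, 14, 12, 13)
)

fit <- glm(cbind(dead, alive) ~ x * g, family = binomial, data = toy)
coef(fit)  # the x:gB entry is NA (aliased with gB)

# Rebuild the linear predictor by hand, treating the NA coefficient as 0.
X  <- model.matrix(fit)
b  <- coef(fit)
b0 <- ifelse(is.na(b), 0, b)
eta_manual <- drop(X %*% b0)

# Compare with the linear predictor reported by predict().
all.equal(unname(eta_manual), unname(predict(fit, type = "link")))

# Fitted probabilities for group B; these should match the pooled
# proportion of deaths in that group, 21/60.
fitted(fit)[toy$g == "B"]
```

In this sketch the fitted values for "B" are sensible at x = 12, the only place that level was observed. Nothing in the fit says how "B" responds at any other value of x.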

David.

>
> Can anyone explain to me a rationale for this behaviour pattern?
> Is there some justification for it that I am not currently seeing?
> Any other comments?  (Please omit comments to the effect of "You are as
> thick as two short planks!". :-) )
>
> I have attached the example data set in a file "demoDat.txt", should
> anyone want to experiment with it.  The file was created using dput() so
> you should access it (if you wish to do so) via something like
>
>      demoDat <- dget("demoDat.txt")
>
> Thanks for any enlightenment.
>
> cheers,
>
> Rolf Turner