[R-sig-ME] General Mispredictions of New Levels in nlme Package Models
biii m@iii@g oii de@@ey@ws
biii m@iii@g oii de@@ey@ws
Sat Aug 14 13:56:47 CEST 2021
Hello,
TLDR: When given the same number of levels of a factor, it appears that all
nlme contrast handling doesn't check the levels themselves-just the number
of levels. So, if levels c("A", "B") are fit, and a prediction occurs on
levels c("C", "D"), the prediction will assign predictions for "C" to "A"
and "D" to "B". I believe that nlme should be checking the levels of
contrasts and not just the number of contrasts.
More info:
In a way that is somewhat related to my question from last week about
handling levels in nlme
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2021q3/029663.html), I
have a bigger concern that came back to mind while working through those
issues. I reported the below issue with contrast handling to Bugzilla a few
years ago (https://bugs.r-project.org/show_bug.cgi?id=17228), but I now
realize that it is more generalized within nlme.
I have illustrated this within gnls() and lme() below. From the r-sig-mixed
thread linked above, it also applies to nlme(). So, I assume that it
applies to all contrast handling within nlme.
I assume that the fix relates to checking the names and order of contrasts,
but as I tried to follow the code within the package for it, I got a bit
lost along the way. What is the best way to cause this to issue an error or
to have it do more detailed contrast level checking?
Thanks,
Bill
``` r
library(nlme)
d.mod <- mtcars
d.mod$gear <- factor(d.mod$gear)
mymod <-
gnls(mpg~e.gear*disp,
data=d.mod,
params=e.gear~gear,
start=rep(0.1, nlevels(d.mod$gear)))
summary(mymod)
#> Generalized nonlinear least squares fit
#> Model: mpg ~ e.gear * disp
#> Data: d.mod
#> AIC BIC logLik
#> 248.5394 254.4023 -120.2697
#>
#> Coefficients:
#> Value Std.Error t-value p-value
#> e.gear.(Intercept) 0.04386917 0.008303409 5.283272 0.0000
#> e.gear.gear4 0.12854341 0.025849086 4.972842 0.0000
#> e.gear.gear5 0.02942924 0.022995432 1.279787 0.2108
#>
#> Correlation:
#> e..(I) e.gr.4
#> e.gear.gear4 -0.321
#> e.gear.gear5 -0.361 0.116
#>
#> Standardized residuals:
#> Min Q1 Med Q3 Max
#> -1.0180679 -0.4677616 0.1612978 0.8554800 2.1495935
#>
#> Residual standard error: 10.89942
#> Degrees of freedom: 32 total; 29 residual
# Fails "contrasts can be applied only to factors with 2 or more levels"
predict(mymod, newdata=data.frame(disp=100, gear=factor("3")))
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]):
contrasts can be applied only to factors with 2 or more levels
# Fails "Error in p %*% beta[pmap[[nm]]] : non-conformable arguments"
predict(mymod, newdata=data.frame(disp=100, gear=factor(c("3", "4"))))
#> Error in p %*% beta[pmap[[nm]]]: non-conformable arguments
# Succeeds
predict(mymod, newdata=data.frame(disp=100, gear=factor(c("3", "4", "5"))))
#> 1 2 3
#> 4.386917 17.241258 7.329842
#> attr(,"label")
#> [1] "Predicted values"
# Should fail but does not!
predict(mymod, newdata=data.frame(disp=100, gear=factor(c("6", "7", "8"))))
#> 1 2 3
#> 4.386917 17.241258 7.329842
#> attr(,"label")
#> [1] "Predicted values"
# Unique levels are "Male" and "Female" for 'Sex'
unique(Orthodont$Sex)
#> [1] Male Female
#> Levels: Male Female
fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
summary(fm2)
#> Linear mixed-effects model fit by REML
#> Data: Orthodont
#> AIC BIC logLik
#> 447.5125 460.7823 -218.7563
#>
#> Random effects:
#> Formula: ~1 | Subject
#> (Intercept) Residual
#> StdDev: 1.807425 1.431592
#>
#> Fixed effects: distance ~ age + Sex
#> Value Std.Error DF t-value p-value
#> (Intercept) 17.706713 0.8339225 80 21.233044 0.0000
#> age 0.660185 0.0616059 80 10.716263 0.0000
#> SexFemale -2.321023 0.7614168 25 -3.048294 0.0054
#> Correlation:
#> (Intr) age
#> age -0.813
#> SexFemale -0.372 0.000
#>
#> Standardized Within-Group Residuals:
#> Min Q1 Med Q3 Max
#> -3.74889609 -0.55034466 -0.02516628 0.45341781 3.65746539
#>
#> Number of Observations: 108
#> Number of Groups: 27
predict(fm2, newdata=data.frame(age=1:10, Sex="non-binary"), level=0)
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]):
contrasts can be applied only to factors with 2 or more levels
# Should fail because the levels of Sex are not the same as the original
predict(fm2, newdata=data.frame(age=1:10, Sex=c("trans", "non-binary")),
level=0)
#> [1] 16.04588 19.02708 17.36625 20.34745 18.68662 21.66782 20.00699
22.98819
#> [9] 21.32736 24.30856
#> attr(,"label")
#> [1] "Predicted values"
```
<sup>Created on 2021-08-14 by the [reprex
package](https://reprex.tidyverse.org) (v2.0.0)</sup>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list