[R-sig-ME] General Mispredictions of New Levels in nlme Package Models

biii m@iii@g oii de@@ey@ws biii m@iii@g oii de@@ey@ws
Sat Aug 14 13:56:47 CEST 2021


Hello,

 

TLDR:  When given the same number of levels of a factor, it appears that all
nlme contrast handling doesn't check the levels themselves-just the number
of levels.  So, if levels c("A", "B") are fit, and a prediction occurs on
levels c("C", "D"), the prediction will assign predictions for "C" to "A"
and "D" to "B".  I believe that nlme should be checking the levels of
contrasts and not just the number of contrasts.

 

More info:

In a way that is somewhat related to my question from last week about
handling levels in nlme
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2021q3/029663.html), I
have a bigger concern that came back to mind while working through those
issues.  I reported the below issue with contrast handling to Bugzilla a few
years ago (https://bugs.r-project.org/show_bug.cgi?id=17228), but I now
realize that it is more generalized within nlme.

 

I have illustrated this within gnls() and lme() below.  From the r-sig-mixed
thread linked above, it also applies to nlme().  So, I assume that it
applies to all contrast handling within nlme.

 

I assume that the fix relates to checking the names and order of contrasts,
but as I tried to follow the code within the package for it, I got a bit
lost along the way.  What is the best way to cause this to issue an error or
to have it do more detailed contrast level checking?

 

Thanks,

 

Bill

 

``` r

library(nlme)

d.mod <- mtcars

d.mod$gear <- factor(d.mod$gear)

 

mymod <-

  gnls(mpg~e.gear*disp,

       data=d.mod,

       params=e.gear~gear,

       start=rep(0.1, nlevels(d.mod$gear)))

summary(mymod)

#> Generalized nonlinear least squares fit

#>   Model: mpg ~ e.gear * disp 

#>   Data: d.mod 

#>        AIC      BIC    logLik

#>   248.5394 254.4023 -120.2697

#> 

#> Coefficients:

#>                         Value   Std.Error  t-value p-value

#> e.gear.(Intercept) 0.04386917 0.008303409 5.283272  0.0000

#> e.gear.gear4       0.12854341 0.025849086 4.972842  0.0000

#> e.gear.gear5       0.02942924 0.022995432 1.279787  0.2108

#> 

#>  Correlation: 

#>              e..(I) e.gr.4

#> e.gear.gear4 -0.321       

#> e.gear.gear5 -0.361  0.116

#> 

#> Standardized residuals:

#>        Min         Q1        Med         Q3        Max 

#> -1.0180679 -0.4677616  0.1612978  0.8554800  2.1495935 

#> 

#> Residual standard error: 10.89942 

#> Degrees of freedom: 32 total; 29 residual

 

# Fails "contrasts can be applied only to factors with 2 or more levels"

predict(mymod, newdata=data.frame(disp=100, gear=factor("3")))

#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]):
contrasts can be applied only to factors with 2 or more levels

# Fails "Error in p %*% beta[pmap[[nm]]] : non-conformable arguments"

predict(mymod, newdata=data.frame(disp=100, gear=factor(c("3", "4"))))

#> Error in p %*% beta[pmap[[nm]]]: non-conformable arguments

# Succeeds

predict(mymod, newdata=data.frame(disp=100, gear=factor(c("3", "4", "5"))))

#>         1         2         3 

#>  4.386917 17.241258  7.329842 

#> attr(,"label")

#> [1] "Predicted values"

# Should fail but does not!

predict(mymod, newdata=data.frame(disp=100, gear=factor(c("6", "7", "8"))))

#>         1         2         3 

#>  4.386917 17.241258  7.329842 

#> attr(,"label")

#> [1] "Predicted values"

 

# Unique levels are "Male" and "Female" for 'Sex'

unique(Orthodont$Sex)

#> [1] Male   Female

#> Levels: Male Female

fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)

summary(fm2)

#> Linear mixed-effects model fit by REML

#>   Data: Orthodont 

#>        AIC      BIC    logLik

#>   447.5125 460.7823 -218.7563

#> 

#> Random effects:

#>  Formula: ~1 | Subject

#>         (Intercept) Residual

#> StdDev:    1.807425 1.431592

#> 

#> Fixed effects:  distance ~ age + Sex 

#>                 Value Std.Error DF   t-value p-value

#> (Intercept) 17.706713 0.8339225 80 21.233044  0.0000

#> age          0.660185 0.0616059 80 10.716263  0.0000

#> SexFemale   -2.321023 0.7614168 25 -3.048294  0.0054

#>  Correlation: 

#>           (Intr) age   

#> age       -0.813       

#> SexFemale -0.372  0.000

#> 

#> Standardized Within-Group Residuals:

#>         Min          Q1         Med          Q3         Max 

#> -3.74889609 -0.55034466 -0.02516628  0.45341781  3.65746539 

#> 

#> Number of Observations: 108

#> Number of Groups: 27

predict(fm2, newdata=data.frame(age=1:10, Sex="non-binary"), level=0)

#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]):
contrasts can be applied only to factors with 2 or more levels

# Should fail because the levels of Sex are not the same as the original 

predict(fm2, newdata=data.frame(age=1:10, Sex=c("trans", "non-binary")),
level=0)

#>  [1] 16.04588 19.02708 17.36625 20.34745 18.68662 21.66782 20.00699
22.98819

#>  [9] 21.32736 24.30856

#> attr(,"label")

#> [1] "Predicted values"

```

 

<sup>Created on 2021-08-14 by the [reprex
package](https://reprex.tidyverse.org) (v2.0.0)</sup>


	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list