[Rd] Apparent bug in behavior of formulas with '-' operator for lm

Mark van der Loo mark.vanderloo at gmail.com
Fri Mar 16 13:09:08 CET 2018


Joris, the point is that 'z' is NOT used as a predictor in the model.
Therefore it should not affect predictions. Also, I find it suspicious that
the error only occurs when the response variable conitains missings and 'z'
is unique (I have tested several other cases to confirm this).

-Mark

Op vr 16 mrt. 2018 om 13:03 schreef Joris Meys <jorismeys at gmail.com>:

> It's not a bug per se. It's the effect of removing all observations linked
> to a certain level in your data frame. So the output of lm() doesn't
> contain a coefficient for level a of z, but your new data contains that
> level a. With a small addition, this works again:
>
> d <- data.frame(x=rnorm(12),y=rnorm(12),z=rep(letters[1:6],2))
>
> d$x[1] <- NA
> m <- lm(x ~ . -z, data=d)
> p <- predict(m, newdata=d)
>
> This is linked to another discussion earlier on stackoverflow :
> https://stackoverflow.com/questions/48461980/prediction-in-r-glmm
> which lead to an update to lme4 : https://github.com/lme4/lme4/issues/452
>
> The point being that factors in your newdata should have the same levels
> as factors in the original data that was used to fit the model. If you add
> levels to these factors, it's impossible to use that model to predict for
> these new data.
>
> Cheers
> Joris
>
> On Fri, Mar 16, 2018 at 10:21 AM, Mark van der Loo <
> mark.vanderloo at gmail.com> wrote:
>
>> Dear R-developers,
>>
>> In the 'lm' documentation, the '-' operator is only specified to be used
>> with -1 (to remove the intercept from the model).
>>
>> However, the documentation also refers to the 'formula' help file, which
>> indicates that it is possible to subtract any term. Indeed, the following
>> works with no problems (the period '.' stands for 'all terms except the
>> lhs'):
>>
>> d <- data.frame(x=rnorm(6), y=rnorm(6), z=letters[1:2])
>> m <- lm(x ~ . -z, data=d)
>> p <- predict(m,newdata=d)
>>
>> Now, if I change 'z' so that it has only unique values, and I introduce an
>> NA in the predicted variable, the following happens:
>>
>> d <- data.frame(x=rnorm(6),y=rnorm(6),z=letters[1:6])
>> d$x[1] <- NA
>> m <- lm(x ~ . -z, data=d)
>> p <- predict(m, newdata=d)
>> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
>> object$xlevels) : factor z has new levels a
>>
>> It seems a bug to me, although one could argue that 'lm's documentation
>> does not allow one to expect that the '-' operator should work generally.
>>
>> If it is a bug I'm happy to report it to bugzilla.
>>
>> Thanks for all your efforts,
>> Mark
>>
>> ps: I was not able to test this on R3.4.4 yet, but the NEWS does not
>> mention fixes related to 'lm' or 'predict'.
>>
>>
>> > sessionInfo()
>> R version 3.4.3 (2017-11-30)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.4 LTS
>>
>> Matrix products: default
>> BLAS: /usr/lib/libblas/libblas.so.3.6.0
>> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
>> LC_PAPER=nl_NL.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>  LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.4.3 tools_3.4.3    yaml_2.1.16
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Department of Data Analysis and Mathematical Modelling
> Ghent University
> Coupure Links 653, B-9000 Gent (Belgium)
>
> <https://maps.google.com/?q=Coupure+links+653,%C2%A0B-9000+Gent,%C2%A0Belgium&entry=gmail&source=g>
>
> -----------
> Biowiskundedagen 2017-2018
> http://www.biowiskundedagen.ugent.be/
>
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list