[Rd] Model formulas with explicit references
Berry, Charles
ccberry @ending from uc@d@edu
Sat Jul 21 19:02:22 CEST 2018
> On Jul 20, 2018, at 3:05 PM, Lenth, Russell V <russell-lenth using uiowa.edu> wrote:
>
> Dear R-Devel,
>
> I seem to no longer be able to access the bug-reporting system, so am doing this by e-mail.
>
> My report concerns models where variables are explicitly referenced (or is it "dereferenced"?), such as:
>
> cars.lm <- lm(mtcars[[1]] ~ factor(mtcars$cyl) + mtcars[["disp"]])
>
> I have found that it is not possible to predict such models with new data. For example:
>
>> predict(cars.lm, newdata = mtcars[1:5, ])
> 1 2 3 4 5 6 7 8 9 10
> 20.37954 20.37954 26.58543 17.70329 14.91157 18.60448 14.91157 25.52859 25.68971 20.17199
> 11 12 13 14 15 16 17 18 19 20
> 20.17199 17.21096 17.21096 17.21096 11.85300 12.18071 12.72688 27.38558 27.46750 27.59312
> 21 22 23 24 25 26 27 28 29 30
> 26.25500 16.05853 16.44085 15.18466 13.81922 27.37738 26.24954 26.93772 15.15735 20.78917
> 31 32
> 16.52278 26.23042
> Warning message:
> 'newdata' had 5 rows but variables found have 32 rows
>
> Instead of returning 5 predictions, it returns the 32 original predicted values. There is a warning message suggesting that something went wrong. This tickled my curiosity, hence this result:
>
>> predict(cars.lm, newdata = data.frame(x = 1:32))
> 1 2 3 4 5 6 7 8 9 10
> 20.37954 20.37954 26.58543 17.70329 14.91157 18.60448 14.91157 25.52859 25.68971 20.17199
> 11 12 13 14 15 16 17 18 19 20
> 20.17199 17.21096 17.21096 17.21096 11.85300 12.18071 12.72688 27.38558 27.46750 27.59312
> 21 22 23 24 25 26 27 28 29 30
> 26.25500 16.05853 16.44085 15.18466 13.81922 27.37738 26.24954 26.93772 15.15735 20.78917
> 31 32
> 16.52278 26.23042
>
> Again, the new data are ignored, but there is no warning message, because the previous warning was based only on a discrepancy with the number of rows and the number of predictions. Indeed, the new data set makes no sense at all in the context of this model.
>
> At the root of this behavior is the fact that the model.frame function ignores its data argument with such models. So instead of constructing a new frame based on the new data, it just returns the original model frame.
>
This produces what I think you intended:
> predict(cars.lm, newdata = list(mtcars=mtcars[1:5,]) )
1 2 3 4 5
20.37954 20.37954 26.58543 17.70329 14.91157
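The reason this works, as far as I can tell, is that the variables in the formula are evaluated in newdata first, so the name mtcars has to be an element of newdata. A minimal sketch to check this (sub, p_sub and p_all are names made up for illustration):

```r
# Same model as in the original report: every term refers to `mtcars` directly
cars.lm <- lm(mtcars[[1]] ~ factor(mtcars$cyl) + mtcars[["disp"]])

# A 5-row subset, wrapped in a list so that the name `mtcars` resolves to it
sub <- mtcars[1:5, ]
p_sub <- predict(cars.lm, newdata = list(mtcars = sub))

# Fitted values from the original fit, for comparison
p_all <- fitted(cars.lm)

length(p_sub)                                # number of predictions returned
all.equal(unname(p_sub), unname(p_all[1:5])) # do they match the first five fits?
```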
>
> I am not really suggesting that you try to make these things work with models when the formula is like this. Instead, I am hoping that it throws an actual error message rather than just a warning, and that you be a little bit more sophisticated than merely checking the number of rows. Both predict() with newdata provided, and model.frame() with a data argument, should return an informative error message that says that model formulas like this are not supported with new data.
As you can see, they are supported, but you have to make sure that the objects named in the formula can be found in the newdata argument. If this is puzzling, try
debugonce(predict.lm)
predict(cars.lm, newdata = mtcars[1:5, ])
and inspect the newdata object and terms(object). You should see why the terms in the formula are not found in newdata.
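The same check can be sketched without the debugger. The names newdata, found_df and found_list below are only illustrative; the point is that a data frame of mtcars rows has columns such as mpg and cyl, but no element called mtcars:

```r
cars.lm <- lm(mtcars[[1]] ~ factor(mtcars$cyl) + mtcars[["disp"]])
newdata <- mtcars[1:5, ]

# The expressions that are evaluated when the model frame is rebuilt
attr(terms(cars.lm), "variables")

# The names that newdata actually offers
names(newdata)

# Is there anything called "mtcars" to be found in newdata?
found_df   <- "mtcars" %in% names(newdata)                   # plain data frame
found_list <- "mtcars" %in% names(list(mtcars = newdata))    # wrapped in a list
```

When the name is not found in newdata, the lookup continues in the environment of the formula, which is where the full 32-row mtcars lives.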
If you think that something like your idiom for the formula is required, maybe you should repost on r-help and say what you are trying to do with it. I expect you'll get some advice on how to reformulate your call.
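For instance, if the aim is simply to model the first column of mtcars (mpg) on cyl and disp, one conventional reformulation is to use bare column names with a data argument. This is only a sketch under that assumption; cars.lm2 and p5 are made-up names:

```r
# Refer to columns by name and let `data` supply them
cars.lm2 <- lm(mpg ~ factor(cyl) + disp, data = mtcars)

# newdata can now be an ordinary data frame containing cyl and disp
p5 <- predict(cars.lm2, newdata = mtcars[1:5, ])

length(p5)   # one prediction per row of newdata
```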
HTH,
Chuck