[Rd] get_all_vars() does not handle rhs matrices in formulae
Thomas J. Leeper
thosjleeper at gmail.com
Thu Mar 30 16:01:42 CEST 2017
Hello again,
It appears that get_all_vars() incorrectly handles model formulae that
use a right-hand side (rhs) matrix. For example, consider these two
substantively identical models:
# model using named variables
mpg <- mtcars$mpg
wt <- mtcars$wt
hp <- mtcars$hp
m1 <- lm(mpg ~ wt + hp)
# model using matrix
y <- mtcars$mpg
x <- cbind(mtcars$wt, mtcars$hp)
m2 <- lm(y ~ x)
For the first, get_all_vars() returns the correct data frame:
str(get_all_vars(m1, .GlobalEnv))
## 'data.frame': 32 obs. of 3 variables:
## $ mpg: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
which could, for example, be passed on to predict() just like the
output from model.frame():
str(predict(m1, model.frame(m1)))
## Named num [1:32] 23.6 22.6 25.3 21.3 18.3 ...
## - attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...
str(predict(m1, get_all_vars(m1)))
## Named num [1:32] 23.6 22.6 25.3 21.3 18.3 ...
## - attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...
For the model specified with a rhs matrix, however, get_all_vars()
returns a three-column data frame with the second matrix column added
as an unnamed third column:
str(get_all_vars(m2, .GlobalEnv))
## 'data.frame': 32 obs. of 3 variables:
## $ y : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ x : num 2.62 2.88 2.32 3.21 3.44 ...
## $ NA: num 110 110 93 110 175 105 245 62 95 123 ...
This means attempts to use this data structure in predict() fail:
str(predict(m2, get_all_vars(m2)))
## Error: variable 'x' was fitted with type "nmatrix.2" but type
## "numeric" was supplied
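The two types named in that error can be reproduced directly with
stats::.MFclass(), the helper that the model-frame class check
(.checkMFClasses()) calls. A minimal sketch; the names fitted_type and
supplied_type are only illustrative:

```r
# Rebuild the rhs matrix used in m2
x <- cbind(mtcars$wt, mtcars$hp)

# Class recorded for a two-column numeric matrix term
fitted_type <- stats::.MFclass(x)

# Class of a single column once the matrix has been split apart
supplied_type <- stats::.MFclass(x[, 1])

print(c(fitted = fitted_type, supplied = supplied_type))
```

So any data frame in which x has been flattened to a plain numeric
column will be rejected by predict(), whatever its column names are.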
The correct structure needs to resemble the following for that to succeed:
newdat <- data.frame(y = y)
newdat$x <- x
str(newdat)
## 'data.frame': 32 obs. of 2 variables:
## $ y: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ x: num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
str(predict(m2, newdat))
## Named num [1:32] 23.6 22.6 25.3 21.3 18.3 ...
## - attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...
The correct structure is basically what is returned by model.frame()
in cases involving a rhs matrix:
all.equal(newdat, model.frame(m2), check.attributes = FALSE)
## [1] TRUE
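Given that equivalence, one possible workaround in the meantime is to
pass the model frame itself to predict(). This is only a sketch, and it
assumes that model.frame() keeps x as a single matrix column, as the
comparison above suggests:

```r
# Refit the matrix model so the example is self-contained
y <- mtcars$mpg
x <- cbind(mtcars$wt, mtcars$hp)
m2 <- lm(y ~ x)

# model.frame() stores x as one two-column matrix column,
# so the class check inside predict() passes
p <- predict(m2, model.frame(m2))

# The predictions should match the fitted values of the model
print(isTRUE(all.equal(unname(p), unname(fitted(m2)))))
```

This does not help code that needs get_all_vars() specifically, but it
shows that the matrix-column layout is what predict() expects.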
The issue seems to be in one of the very last lines of get_all_vars():
x <- setNames(as.data.frame(c(variables, extras), optional = TRUE),
c(varnames, extranames))
This coerces `variables` to the wrong structure (making a three-column
data frame instead of a two-column one) and, as a result, misnames the
resulting columns. Unfortunately, I don't know the most
sensible/general way to solve this; otherwise I would submit a patch.
Does anyone know how to fix this last line?
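To illustrate the difference, here is a rough sketch of one direction a
fix might take: add each element as a single column rather than
passing the whole list through as.data.frame(). It is not a tested
patch for get_all_vars(). The named list `variables` below is an
assumed stand-in for the internal object, build_frame() is a
hypothetical helper, and `extras` are not handled.

```r
# Assumed stand-in for the `variables` list inside get_all_vars()
variables <- list(y = mtcars$mpg, x = cbind(mtcars$wt, mtcars$hp))

# Current approach: as.data.frame() expands the matrix into two columns
flat <- as.data.frame(variables, optional = TRUE)

# Hypothetical helper: assign each element as one column,
# which leaves a matrix element as a single matrix column
build_frame <- function(vars) {
  out <- data.frame(row.names = seq_len(NROW(vars[[1L]])))
  for (nm in names(vars)) out[[nm]] <- vars[[nm]]
  out
}
kept <- build_frame(variables)

print(c(flat = ncol(flat), kept = ncol(kept)))

# The matrix-column frame can be used for fitting and for prediction
m2 <- lm(y ~ x, data = kept)
str(predict(m2, kept))
```

Whether column-wise assignment is general enough for every input that
get_all_vars() accepts is an open question.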
Best,
-Thomas
Thomas J. Leeper
http://www.thomasleeper.com