[Rd] inconsistent handling of factor, character, and logical predictors in lm()
Fox, John
Fri Aug 30 20:11:29 CEST 2019
Dear R-devel list members,
I've discovered an inconsistency in how lm() and similar functions handle logical predictors as opposed to factor or character predictors. An "lm" object for a model with factor or character predictors stores the levels of each factor, or the unique values of each character predictor, in the $xlevels component of the object. It does not store the FALSE/TRUE values of a logical predictor, even though a logical predictor is treated as a factor in the fit.
For example:
------------ snip --------------
> m1 <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris)
> m1$xlevels
$Species
[1] "setosa" "versicolor" "virginica"
> m2 <- lm(Sepal.Length ~ Sepal.Width + as.character(Species), data=iris)
> m2$xlevels
$`as.character(Species)`
[1] "setosa" "versicolor" "virginica"
> m3 <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data=iris)
> m3$xlevels
named list()
> m3

Call:
lm(formula = Sepal.Length ~ Sepal.Width + I(Species == "setosa"),
    data = iris)

Coefficients:
               (Intercept)                 Sepal.Width  I(Species == "setosa")TRUE
                    3.5571                      0.9418                     -1.7797
------------ snip --------------
I believe that the culprit is .getXlevels(), which makes provision for factor and character predictors but not for logical predictors:
------------ snip --------------
> .getXlevels
function (Terms, m)
{
    xvars <- vapply(attr(Terms, "variables"), deparse2,
        "")[-1L]
    if ((yvar <- attr(Terms, "response")) > 0)
        xvars <- xvars[-yvar]
    if (length(xvars)) {
        xlev <- lapply(m[xvars], function(x) if (is.factor(x))
            levels(x)
        else if (is.character(x))
            levels(as.factor(x)))
        xlev[!vapply(xlev, is.null, NA)]
    }
}
------------ snip --------------
It would be simple to modify the last test in .getXlevels to
else if (is.character(x) || is.logical(x))
which would cause .getXlevels() to return c("FALSE", "TRUE") for a logical predictor (assuming that both values are present in the data). I'd find that sufficient, but alternatively there could be a separate test for logical predictors that returns c(FALSE, TRUE).
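A minimal sketch of that one-line change, written as a stand-alone function so that it can be tried without touching the stats package. The names getXlevels_sketch(), deparse_one(), and the is_setosa column are hypothetical and introduced only for illustration; deparse_one() stands in for the internal deparse2(), which is assumed here to do little more than deparse an expression to a single string.

```r
# Hypothetical stand-alone copy of .getXlevels() with the proposed extra test.
getXlevels_sketch <- function(Terms, m) {
    # Assumed stand-in for the internal deparse2(): one string per variable
    deparse_one <- function(e)
        paste(deparse(e, width.cutoff = 500L), collapse = " ")
    xvars <- vapply(attr(Terms, "variables"), deparse_one, "")[-1L]
    if ((yvar <- attr(Terms, "response")) > 0)
        xvars <- xvars[-yvar]
    if (length(xvars)) {
        xlev <- lapply(m[xvars], function(x)
            if (is.factor(x)) levels(x)
            # proposed change: treat logical predictors like character ones
            else if (is.character(x) || is.logical(x)) levels(as.factor(x)))
        # drop the NULL entries left by numeric predictors
        xlev[!vapply(xlev, is.null, NA)]
    }
}

# Illustration with a plain logical column (is_setosa is a made-up name)
d <- transform(iris, is_setosa = Species == "setosa")
m <- lm(Sepal.Length ~ Sepal.Width + is_setosa, data = d)
xl <- getXlevels_sketch(terms(m), model.frame(m))
print(xl)
```

With both FALSE and TRUE present in the data, the logical predictor is reported with the character levels "FALSE" and "TRUE", in that order.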
I discovered this issue when a function in the effects package failed for a model with a logical predictor. Although it's possible to program around the problem, I think that it would be better to handle factors, character predictors, and logical predictors consistently.
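As one possible way of programming around the problem in downstream code, the levels of logical predictors can be recovered from the model frame of the fitted object. This is only a sketch: xlevels_all() and the is_setosa column are hypothetical names, and this is not the code used in the effects package.

```r
# Hypothetical helper: the $xlevels of a fit, supplemented with the
# levels of any logical predictors found in the model frame.
xlevels_all <- function(fit) {
    mf <- model.frame(fit)
    is_lgl <- vapply(mf, is.logical, NA)
    # never report the response, even if it happens to be logical
    resp <- attr(terms(fit), "response")
    if (resp > 0) is_lgl[resp] <- FALSE
    # skip logical predictors that $xlevels already covers
    is_lgl[names(mf) %in% names(fit$xlevels)] <- FALSE
    lgl_lev <- lapply(mf[is_lgl], function(x) levels(as.factor(x)))
    c(fit$xlevels, lgl_lev)
}

d <- transform(iris, is_setosa = Species == "setosa")
m <- lm(Sepal.Length ~ Sepal.Width + is_setosa, data = d)
xla <- xlevels_all(m)
print(xla)
```

The helper leaves the entries for factor and character predictors as they are and only adds entries for logical predictors that are missing.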
Best,
John
--------------------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: socialsciences.mcmaster.ca/jfox/