Formulae with factors that have missing values
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Oct 6 09:51:14 CEST 2000
On Fri, 6 Oct 2000, Rachel Merriman wrote:
> Hi All,
> I have a formula which has a factor with NAs in it. I wish to keep
> these in the model matrix, but the NA information is currently lost (the
> rows are kept but the NA gets converted to 0). Any ideas as to how
> I can keep NAs in?
> e.g.
> junk <-
> factor(c("hi",NA,"low","low","hi","low","hi","hi","low",NA,"hi","low","hi","hi","low","hi"))
> y <- c(1,2,1,2,2,2,1,2,1,1,2,2,1,1,1,2)
> na.keep <- function(X){X}
> myfn <- function (formula,data=sys.parent()){
> mf <- match.call()
> mf[[1]] <- as.name("model.frame")
> if(is.null(mf$na.action)) mf$na.action <- as.name("na.keep")
> mf <- eval(mf, sys.frame(sys.parent()))
> Y <- model.extract(mf,response)
> Terms <- attr(mf,"terms")
> X <- model.matrix(Terms,mf)
> X
> }
> myfn(y~junk)
On my system NA gets converted to 1.200089e-306, not 0.
That looks like a bug, and S gives NA.
The question is, what do you want NA to be represented as in the
model matrix? If you want NA to be another level of the factor, use
junk <- factor(c("hi",NA,"low","low","hi","low","hi","hi","low",NA,
"hi","low","hi","hi", "low","hi"), exclude="")
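To see what that first option gives you, here is a minimal sketch. It assumes a current version of R, where exclude = NULL is the documented way to keep NA as a level (the exclude="" spelling above is the one from this message). The name junk_na is just an illustrative choice.

```r
# Same data as in the question.
junk <- c("hi", NA, "low", "low", "hi", "low", "hi", "hi",
          "low", NA, "hi", "low", "hi", "hi", "low", "hi")

# Assumption: exclude = NULL keeps NA as a level of the factor.
junk_na <- factor(junk, exclude = NULL)
levels(junk_na)

# The NA observations now carry an ordinary integer code, so the default
# na.action does not drop them and they get their own indicator column.
X <- model.matrix(~ junk_na)
dim(X)
```

With treatment contrasts the last column is then an indicator for "junk was missing", which may or may not be what is wanted.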
Alternatively, you might want junk to be coded, with all the columns of the
coding set to NA in the rows where junk is missing. The simplest way to get
that is
contrasts(junk)[junk, , drop=FALSE]
until we fix the bug.
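A short sketch of that workaround on the data from the question. Indexing the contrast matrix by the factor uses its integer codes, so a missing code selects a row of NAs. The names coded and X are illustrative, and binding on an intercept column by hand is my assumption about how one would assemble the matrix, not something model.matrix does for you here.

```r
junk <- factor(c("hi", NA, "low", "low", "hi", "low", "hi", "hi",
                 "low", NA, "hi", "low", "hi", "hi", "low", "hi"))

# One row of the contrast coding per observation; NA where junk is NA.
coded <- contrasts(junk)[junk, , drop = FALSE]

# Hypothetical assembly of a model matrix with an intercept column.
X <- cbind("(Intercept)" = 1, coded)

# Which observations ended up with a missing coding?
which(is.na(coded[, 1]))
```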
As a matter of interest, what are you going to do with a model matrix
with NAs in?
