[R] Formulae with factors that have missing values

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Oct 6 09:51:14 CEST 2000

On Fri, 6 Oct 2000, Rachel Merriman wrote:

> Hi All,
> I have a formula which has a factor with NAs in it.  I wish to keep
> these in the model matrix, but the NA information is currently lost (the
> rows are kept but the NA gets converted to 0).  Any ideas as to how
> I can keep NAs in?
> e.g.
> junk <-
> factor(c("hi",NA,"low","low","hi","low","hi","hi","low",NA,"hi","low","hi","hi","low","hi"))
> y <- c(1,2,1,2,2,2,1,2,1,1,2,2,1,1,1,2)
> na.keep <- function(X){X}
> myfn <- function (formula,data=sys.parent()){
>     mf <- match.call()
>     mf[[1]] <- as.name("model.frame")
>     if(is.null(mf$na.action)) mf$na.action <- as.name("na.keep")
>     mf <- eval(mf, sys.frame(sys.parent()))
>     Y <- model.extract(mf,response)
>     Terms <- attr(mf,"terms")
>     X <- model.matrix(Terms,mf)
>     X
> }
> myfn(y~junk)

On my system the NA gets converted to 1.200089e-306, not 0.
That looks like a bug in R; S gives NA here.

The question is, what do you want NA to be represented as in the
model matrix?  If you want NA to be another level of the factor, use

junk <- factor(c("hi",NA,"low","low","hi","low","hi","hi","low",NA,
"hi","low","hi","hi", "low","hi"), exclude="")
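
A minimal sketch of what that buys you, assuming a current R in which
exclude = NULL is the documented way to keep NA as a level (addNA() should
be an equivalent shortcut).  The name junk2 is made up for illustration and
the default treatment contrasts are assumed.

```r
junk <- c("hi", NA, "low", "low", "hi", "low", "hi", "hi", "low", NA,
          "hi", "low", "hi", "hi", "low", "hi")

# Keep NA as an explicit third level instead of a missing value.
junk2 <- factor(junk, exclude = NULL)
levels(junk2)

# The NA level now gets its own indicator column, so no row is dropped
# by the default na.action and the model matrix has no missing entries.
X <- model.matrix(~ junk2)
dim(X)
anyNA(X)
```

The cost is that "missing" is then treated as a category in its own right,
which may or may not be what the analysis calls for.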

Alternatively, you might want junk to be coded by its contrasts, with all
the columns of the coding set to NA in the rows where junk is NA.  The
simplest way to get that is

contrasts(junk)[junk, , drop=FALSE]

until we fix the bug.
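
A short sketch of that workaround, assuming the default treatment contrasts
(so the single coding column is named "low").  The names coded and X, and
the step that binds on an intercept column, are illustrative additions, not
something model.matrix does for you here.

```r
junk <- factor(c("hi", NA, "low", "low", "hi", "low", "hi", "hi", "low", NA,
                 "hi", "low", "hi", "hi", "low", "hi"))

# contrasts(junk) has one row per level.  Indexing by a factor uses its
# integer codes, so an NA code selects a row filled with NA.
coded <- contrasts(junk)[junk, , drop = FALSE]

# Bind on an intercept column to get something shaped like a model matrix.
X <- cbind("(Intercept)" = 1, coded)

# Positions of the rows whose coding is NA.
unname(which(is.na(X[, "low"])))
```

With several factors the same indexing would have to be repeated for each
one and the pieces bound together by hand.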

As a matter of interest, what are you going to do with a model matrix
with NAs in?

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
